How to create a simple ETL/stored procedure error capturing database (schema model, SQL code and implementation) – Part 1

November 5th, 2014 / 2 Comments » / by admin

Introduction

This is a two-part series (the second part can be found HERE) on how to create and implement an ETL execution error-capturing database (schema model, SQL code, implementation etc.) based on Microsoft SQL Server stored procedures' default error-handling capabilities.

TL;DR Version

My last client engagement required me to create an ETL error tracking solution which would serve as a central repository for all data related to SSIS package execution failures. Given that the initial scope and assumptions pointed to a small project, with the number of ETL routines involved kept at a minimum, my first intention was to build a single table encompassing all key information related to the package metadata and execution state. However, as the project grew, execs started to demand richer, more detailed data and the constricting schema of a single table became too evident, so I expanded it into a proper relational database schema with additional metadata attributes at all common data-grouping hierarchies i.e. instance, database, schema and objects. In addition, I also extracted metadata pertaining to users, logins and packages, which complemented the whole data set with a comprehensive suite of variables for error-related analysis. Please note that this model and the subsequently discussed implementation are based primarily on capturing errors resulting from stored procedure execution, as per my project demands. Since version 2012, SQL Server has included the SSISDB catalog as the central point for working with Integration Services (SSIS) projects deployed to the Integration Services server; it contains a rich set of data for all execution eventualities and should be treated as the default logging option for comprehensive SSIS runtime event capture.

Below is a breakdown of the database schema model, which accounts for most of the database-related objects as well as some handy information on individual packages, logins and users. First up, we have the instance- and database-specific tables – LogSSISErrors_Instance and LogSSISErrors_DB – in a one-to-many relationship i.e. one SQL Server instance to multiple databases.

Error_Capture_DB_Schema_ERD_Partial_1

As each database can have multiple schemas, LogSSISErrors_DB has a one-to-many relationship to the LogSSISErrors_Schema table, which in turn relates to the LogSSISErrors_Object and LogSSISErrors_Process tables, each storing object-level data. The only distinction between the 'Object' and 'Process' tables is that the 'Object' table stores table and view metadata, whereas the 'Process' table is concerned with stored procedures, assemblies, functions and triggers, even though on the database level they are all technically treated as objects.

Error_Capture_DB_Schema_ERD_Partial_2

Both LogSSISErrors_Object and LogSSISErrors_Process have a one-to-many relationship with the LogSSISErrors_Error table – the centrepiece of all the logging activities. The LogSSISErrors_Error table is also linked to the LogSSISErrors_User, LogSSISErrors_Package and LogSSISErrors_Login tables, storing user, local SSIS package and login metadata.

Error_Capture_DB_Schema_ERD_Partial_3

The LogSSISErrors_User table is also connected to the LogSSISErrors_DB table, as one database can have multiple users created against it.

Error_Capture_DB_Schema_ERD_Partial_4

The full entity relationship diagram for AdminDBA is as per the image below.

Error_Capture_DB_Schema_ERD_Full
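To make the relationships above a little more tangible, below is a minimal DDL sketch of the central LogSSISErrors_Error table and its foreign keys. The column names here are purely illustrative assumptions – the actual table definitions ship with the full script available from the OneDrive link further down.

-- Illustrative sketch only; column names are assumed, the real DDL is in the downloadable script
CREATE TABLE LogSSISErrors_Error
    (
      ID INT IDENTITY(1, 1) NOT NULL PRIMARY KEY ,
      ErrorNumber INT ,
      ErrorMessage NVARCHAR(4000) ,
      ErrorDateTime DATETIME ,
      FKObjectID INT NULL FOREIGN KEY REFERENCES LogSSISErrors_Object ( ID ) ,
      FKProcessID INT NULL FOREIGN KEY REFERENCES LogSSISErrors_Process ( ID ) ,
      FKUserID INT NULL FOREIGN KEY REFERENCES LogSSISErrors_User ( ID ) ,
      FKPackageID INT NULL FOREIGN KEY REFERENCES LogSSISErrors_Package ( ID ) ,
      FKLoginID INT NULL FOREIGN KEY REFERENCES LogSSISErrors_Login ( ID )
    );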

All the data represented by the above schema, with the exception of the LogSSISErrors_Package table entries, should be available by querying either catalog views, information schema views or system stored procedures/functions. Most of the table names are self-explanatory and can easily be referred back to the function they play and the relevant metadata information. In the case of LogSSISErrors_Process and LogSSISErrors_Object, both tables' data comes from the same query, with the distinction made on the object type i.e. views and tables go into 'Object', whereas stored procedures are bucketed as processes (even though technically they are objects too) and placed in the LogSSISErrors_Process table. In the case of the LogSSISErrors_Package table, the package metadata is not stored on the server level but can be sourced from a .dtsx file using T-SQL. This part of the solution was made possible thanks to Jamie Thompson's scripts, which you can read about in detail in his post HERE; I simply adopted his technique for my solution.
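As a rough illustration (not the exact query from the script), the split between the two tables boils down to bucketing sys.objects rows by their type code – tables and views on one side, programmable objects on the other:

-- Rough sketch of the Object vs. Process split; the full script runs an equivalent query per database
SELECT  s.name AS SchemaName ,
        o.name AS ObjectName ,
        o.type_desc ,
        CASE WHEN o.type IN ( 'U', 'V' ) THEN 'LogSSISErrors_Object'
             WHEN o.type IN ( 'P', 'PC', 'FN', 'IF', 'TF', 'FS', 'FT', 'TR', 'TA' )
             THEN 'LogSSISErrors_Process'
             ELSE 'Not captured'
        END AS TargetTable
FROM    sys.objects o
        JOIN sys.schemas s ON s.schema_id = o.schema_id
WHERE   o.is_ms_shipped = 0;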

The script used to create all the database objects and populate them with SQL Server instance metadata is close to 1,800 lines long, so in this post I'll only go through sections of it, skipping all the boilerplate SQL; however, a full working version is available for download from my OneDrive folder HERE. The script is divided into a few sections, the main parts being the AdminDBA database metadata collection used for subsequent object recreation, and a second part driven by conditional logic where the count of foreign key constraints and the count of tables from the two table variables created are compared.

The first part creates two table variables storing AdminDBA object metadata and the foreign key constraints referencing individual tables. If those tables have already been created and the relationships exist, then the rest of the script utilizes this metadata in subsequent execution. The following T-SQL generates the table variables and their content.

DECLARE @TempFKs TABLE
    (
      ID INT IDENTITY(1, 1)
             NOT NULL ,
      Foreign_Key_Name VARCHAR(512) ,
      Parent_Object_Name VARCHAR(256) ,
      Referenced_Object_Name VARCHAR(256) ,
      Parent_Column_Name VARCHAR(256) ,
      Referenced_Column_Name VARCHAR(256)
    )				

INSERT  INTO @TempFKs
        ( Foreign_Key_Name ,
          Parent_Object_Name ,
          Referenced_Object_Name ,
          Parent_Column_Name ,
          Referenced_Column_Name
        )
        SELECT  fk.name AS foreign_key_name ,
                po.name AS parent_object_name ,
                ro.name AS referenced_object_name ,
                pc.name AS parent_column_name ,
                rc.name AS referenced_column_name
        FROM    sys.foreign_key_columns fc
                JOIN sys.columns pc ON pc.column_id = parent_column_id
                                       AND parent_object_id = pc.object_id
                JOIN sys.columns rc ON rc.column_id = referenced_column_id
                                       AND referenced_object_id = rc.object_id
                JOIN sys.objects po ON po.object_id = pc.object_id
                JOIN sys.objects ro ON ro.object_id = rc.object_id
                JOIN sys.foreign_keys fk ON fk.object_id = fc.constraint_object_id
        WHERE   ro.type = 'U'
                AND fk.type = 'F'								

DECLARE @TempTbls TABLE
    (
      ID INT IDENTITY(1, 1)
             NOT NULL ,
      Table_Name VARCHAR(128) NOT NULL ,
      Table_Schema_Definition VARCHAR(MAX) NOT NULL
    )				

INSERT  INTO @TempTbls
        ( Table_Name ,
          Table_Schema_Definition
        )
        SELECT  so.name AS Table_Name ,
                'create table [' + so.name + '] (' + STUFF(o.list, LEN(o.list),
                                                           1, '') + ')'
                + CASE WHEN tc.CONSTRAINT_NAME IS NULL THEN ''
                       ELSE 'ALTER TABLE ' + so.name + ' ADD CONSTRAINT '
                            + tc.CONSTRAINT_NAME + ' PRIMARY KEY ' + ' ('
                            + LEFT(j.list, LEN(j.list) - 1) + ')'
                  END AS Table_Schema_Definition
        FROM    sysobjects so
                CROSS APPLY ( SELECT    '  [' + COLUMN_NAME + '] ' + DATA_TYPE
                                        + CASE DATA_TYPE
                                            WHEN 'sql_variant' THEN ''
                                            WHEN 'text' THEN ''
                                            WHEN 'ntext' THEN ''
                                            WHEN 'xml' THEN ''
                                            WHEN 'decimal'
                                            THEN '('
                                                 + CAST(NUMERIC_PRECISION AS VARCHAR)
                                                 + ', '
                                                 + CAST(NUMERIC_SCALE AS VARCHAR)
                                                 + ')'
                                            ELSE COALESCE('('
                                                          + CASE
                                                              WHEN CHARACTER_MAXIMUM_LENGTH = -1
                                                              THEN 'MAX'
                                                              ELSE CAST(CHARACTER_MAXIMUM_LENGTH AS VARCHAR)
                                                            END + ')', '')
                                          END + ' '
                                        + CASE WHEN EXISTS ( SELECT
                                                              id
                                                             FROM
                                                              syscolumns
                                                             WHERE
                                                              OBJECT_NAME(id) = so.name
                                                              AND name = column_name
                                                              AND COLUMNPROPERTY(id,
                                                              name,
                                                              'IsIdentity') = 1 )
                                               THEN 'IDENTITY('
                                                    + CAST(IDENT_SEED(so.name) AS VARCHAR)
                                                    + ','
                                                    + CAST(IDENT_INCR(so.name) AS VARCHAR)
                                                    + ')'
                                               ELSE ''
                                          END + ' '
                                        + ( CASE WHEN IS_NULLABLE = 'No'
                                                 THEN 'NOT '
                                                 ELSE ''
                                            END ) + 'NULL '
                                        + CASE WHEN INFORMATION_SCHEMA.COLUMNS.COLUMN_DEFAULT IS NOT NULL
                                               THEN 'DEFAULT '
                                                    + INFORMATION_SCHEMA.COLUMNS.COLUMN_DEFAULT
                                               ELSE ''
                                          END + ', '
                              FROM      INFORMATION_SCHEMA.COLUMNS
                              WHERE     TABLE_NAME = so.name
                              ORDER BY  ORDINAL_POSITION
                            FOR
                              XML PATH('')
                            ) o ( list )
                LEFT JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc ON tc.TABLE_NAME = so.name
                                                              AND tc.CONSTRAINT_TYPE = 'PRIMARY KEY'
                CROSS APPLY ( SELECT    '[' + COLUMN_NAME + '], '
                              FROM      INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu
                              WHERE     kcu.CONSTRAINT_NAME = tc.CONSTRAINT_NAME
                              ORDER BY  ORDINAL_POSITION
                            FOR
                              XML PATH('')
                            ) j ( list )
        WHERE   xtype = 'U'
                AND name NOT IN ( 'dtproperties' )

By design, once the database and its objects have been (re)created, the counts of foreign key constraints and tables held in the two table variables should differ by exactly one. The below piece of SQL compares the count of foreign key constraints with the count of tables, at which point one of two possible scenarios is triggered – if the counts line up, the objects are re-created using the data encapsulated in the table variables, which already store the DDL code for individual tables as well as the foreign key constraints. If, on the other hand, the counts do not line up, a different part of the stored procedure gets executed, this time responsible for recreating the objects from hard-coded DDL statements and foreign key constraints.

IF NOT EXISTS ( SELECT  COUNT(*) - 1 AS ct
                FROM    @TempFKs a
                EXCEPT
                SELECT  COUNT(*)
                FROM    @TempTbls b )
		AND EXISTS ( SELECT TOP 1 1 FROM @TempTbls )
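The body of that IF block is omitted here; conceptually, the metadata-driven branch simply replays the DDL captured in the table variables. A minimal sketch of the idea (assuming the existing tables have already been dropped – the full script handles this, plus the foreign keys held in @TempFKs, in far more detail):

DECLARE @i INT = 1 ,
    @MaxID INT ,
    @DDL NVARCHAR(MAX);
SELECT  @MaxID = MAX(ID)
FROM    @TempTbls;
WHILE @i <= @MaxID
    BEGIN
        SELECT  @DDL = Table_Schema_Definition
        FROM    @TempTbls
        WHERE   ID = @i;
        EXEC sp_executesql @DDL;    -- re-create the table and its primary key
        SET @i = @i + 1;
    END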

Most of the SQL Server database metadata comes from storing the results of the below query, which extracts database, schema and object information for each database and temporarily stores it in a table variable.

SET @SQL = N'';
SELECT
@SQL = @SQL + ' UNION ALL
SELECT
[database]					=	''' + name	+ ''',
[schema]					=	s.name COLLATE Latin1_General_CI_AI,
[schemaid]					=	s.schema_id,
[schemaowner]				=	dp.name COLLATE Latin1_General_CI_AI,
[processobjectname]			=	ob.name COLLATE Latin1_General_CI_AI,
[processobjectid]			=	ob.object_id,
[processobjectschemaid]		=	ob.schema_id,
[ProcessObjectDescription]	=	NULL,
[processobjecttype]			=	CASE
								WHEN ob.type =	''AF''	THEN	''Aggregate Function''
								WHEN ob.type =	''C''	THEN	''Check Constraint''
								WHEN ob.type =	''D''	THEN	''Default Constraint''
								WHEN ob.type =	''F''	THEN	''Foreign Key Constraint''
								WHEN ob.type =	''FN''	THEN	''SQL Scalar Function''
								WHEN ob.type =	''FS''	THEN	''Assembly Scalar Function''
								WHEN ob.type =	''FT''	THEN	''Assembly Table Valued Function''
								WHEN ob.type =	''IF''	THEN	''SQL Inline Table-valued Function''
								WHEN ob.type =	''IT''	THEN	''Internal Table''
								WHEN ob.type =	''P''	THEN	''SQL Stored Procedure''
								WHEN ob.type =	''PC''	THEN	''Assembly Stored Procedure''
								WHEN ob.type =	''PG''	THEN	''Plan Guide''
								WHEN ob.type =	''PK''	THEN	''Primary Key Constraint''
								WHEN ob.type =	''R''	THEN	''Rule''
								WHEN ob.type =	''RF''	THEN	''Replication Filter Procedure''
								WHEN ob.type =	''S''	THEN	''System Base Table''
								WHEN ob.type =	''SN''	THEN	''Synonym''
								WHEN ob.type =	''SO''	THEN	''Sequence Object''
								WHEN ob.type =	''SQ''	THEN	''Service Queue''
								WHEN ob.type =	''TA''	THEN	''Assembly DML Trigger''
								WHEN ob.type =	''TF''	THEN	''SQL Table Table Valued Function''
								WHEN ob.type =	''TR''	THEN	''SQL DML Trigger''
								WHEN ob.type =	''TT''	THEN	''Table Type''
								WHEN ob.type =	''U''	THEN	''User Defined Table''
								WHEN ob.type =	''UQ''	THEN	''Unique Constraint''
								WHEN ob.type =	''V''	THEN	''View''
								WHEN ob.type =	''X''	THEN	''Extended Stored Procedure''
								ELSE							''Unknown Object Type''
								END,
[createdate]				=	CAST(ob.create_date as date),
[modifieddate]				=	CAST(ob.modify_date as date)
FROM ' + QUOTENAME(name) + '.sys.schemas AS s
INNER JOIN ' + QUOTENAME(name)
+ '.sys.database_principals dp
on s.principal_id = dp.principal_id
INNER JOIN ' + QUOTENAME(name) + '.sys.objects AS ob
ON s.[schema_id] = ob.[schema_id]
where s.name <> ''sys'''
FROM    sys.databases
WHERE   database_id > 4
        AND name NOT IN ( 'ReportServer$SQL2014_MULTIDIM',
                          'ReportServer$SQL2014_MULTIDIMTempDB' );
SET @SQL = @SQL + N' ORDER BY [database],[schema];';
SET @SQL = STUFF(@SQL, 1, 11, '');

DECLARE @EnumDBMeta TABLE
    (
      ID INT NOT NULL
             IDENTITY(1, 1) ,
      DBName NVARCHAR(128) ,
      SchemaName NVARCHAR(128) ,
      SchemaID INT ,
      SchemaOwner NVARCHAR(128) ,
      ProcessObjectName NVARCHAR(128) ,
      ProcessObjectID INT ,
      ProcessObjectSchemaID INT ,
      ProcessObjectDescription NVARCHAR(2048) ,
      ProcessObjectType NVARCHAR(128) ,
      ProcessObjectCreatedDate DATE ,
      ProcessObjectModifiedDate DATE
    )
INSERT  INTO @EnumDBMeta
        ( DBName ,
          SchemaName ,
          SchemaID ,
          SchemaOwner ,
          ProcessObjectName ,
          ProcessObjectID ,
          ProcessObjectSchemaID ,
          ProcessObjectDescription ,
          ProcessObjectType ,
          ProcessObjectCreatedDate ,
          ProcessObjectModifiedDate
        )
EXEC sp_executesql @SQL

Instance-level metadata comes mainly from passing different arguments to the SERVERPROPERTY function to return instance-specific information. Database user metadata comes from sys.database_principals, whereas login details are returned from the sys.server_principals catalog view.
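For reference, the kind of queries involved look roughly like the ones below (the exact property list and filters used in the full script may differ slightly):

-- Instance-level metadata
SELECT  SERVERPROPERTY('MachineName') AS MachineName ,
        SERVERPROPERTY('ServerName') AS ServerName ,
        SERVERPROPERTY('Edition') AS Edition ,
        SERVERPROPERTY('ProductVersion') AS ProductVersion ,
        SERVERPROPERTY('ProductLevel') AS ProductLevel ,
        SERVERPROPERTY('Collation') AS Collation ,
        SERVERPROPERTY('IsClustered') AS IsClustered;

-- Database users (run in the context of each database)
SELECT  name ,
        type_desc ,
        create_date
FROM    sys.database_principals
WHERE   type IN ( 'S', 'U', 'G' );

-- Server logins
SELECT  name ,
        type_desc ,
        create_date
FROM    sys.server_principals
WHERE   type IN ( 'S', 'U', 'G' );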

SSIS package metadata is a little more complex to interrogate. In the case of the LogSSISErrors_Package table, the package metadata is not stored on the server level but can be sourced from a .dtsx file using T-SQL. As mentioned before, this part of the solution was made possible thanks to Jamie Thompson's scripts, which you can read about in detail in his post HERE. This part of the script accesses the file system and reads the .dtsx XML content, storing relevant attributes in the LogSSISErrors_Package table.

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'xp_cmdshell', 1;
RECONFIGURE;

DECLARE @Path VARCHAR(2000);
SET @Path = 'C:\*.dtsx';
		--Must be of form [drive letter]\...\*.dtsx

DECLARE @MyFiles TABLE
    (
      MyID INT IDENTITY(1, 1)
               PRIMARY KEY ,
      FullPath VARCHAR(2000)
    );
DECLARE @CommandLine VARCHAR(4000);

SELECT  @CommandLine = LEFT('dir "' + @Path + '" /A-D /B /S ', 4000);
INSERT  INTO @MyFiles
        ( FullPath )
        EXECUTE xp_cmdshell @CommandLine;
DELETE  FROM @MyFiles
WHERE   FullPath IS NULL
        OR FullPath = 'File Not Found'
        OR FullPath = 'The system cannot find the path specified.'
        OR FullPath = 'The system cannot find the file specified.'; 

IF EXISTS ( SELECT  *
            FROM    sys.tables
            WHERE   name = N'pkgStats' )
    DROP	TABLE pkgStats;
CREATE	 TABLE pkgStats
    (
      PackagePath VARCHAR(900) NOT NULL
                               PRIMARY KEY ,
      PackageXML XML NOT NULL
    );

DECLARE @FullPath VARCHAR(2000);
DECLARE file_cursor CURSOR
FOR
    SELECT  FullPath
    FROM    @MyFiles;
OPEN	file_cursor
FETCH NEXT FROM file_cursor INTO @FullPath;
WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @SQL = '
					INSERT	pkgStats (PackagePath,PackageXML)
		select  ''@FullPath'' as PackagePath
		,		cast(BulkColumn as XML) as PackageXML
		from    openrowset(bulk ''@FullPath'',
								single_blob) as pkgColumn';
        SELECT  @SQL = REPLACE(@SQL, '@FullPath', @FullPath);
        EXEC sp_executesql @SQL;

        FETCH NEXT FROM file_cursor INTO @FullPath;
    END
CLOSE	file_cursor;
DEALLOCATE file_cursor;

DECLARE @pkgStatsBase TABLE
    (
      PackagePath VARCHAR(900) ,
      PackageId UNIQUEIDENTIFIER ,
      PackageCreatorsName NVARCHAR(500) ,
      PackageFormatVersion SMALLINT ,
      PackageType NVARCHAR(50) ,
      PackageDescription NVARCHAR(2000) ,
      PackageVersionMajor SMALLINT ,
      PackageVersionMinor SMALLINT ,
      PackageVersionBuild SMALLINT ,
      PackageVersionGUID UNIQUEIDENTIFIER ,
      PackageXML XML
    );

INSERT  INTO @pkgStatsBase
        SELECT  PackagePath ,
                CAST(PackageXML.value('declare namespace DTS="www.microsoft.com/SqlServer/Dts";
							/DTS:Executable[1]/DTS:Property[@DTS:Name=''DTSID''][1]',
                                      'nvarchar(500)') AS UNIQUEIDENTIFIER) AS PackageID ,
                PackageXML.value('declare namespace DTS="www.microsoft.com/SqlServer/Dts";
							/DTS:Executable[1]/DTS:Property[@DTS:Name=''CreatorName''][1]',
                                 'nvarchar(500)') AS PackageCreatorsName ,
                CAST(PackageXML.value('declare namespace DTS="www.microsoft.com/SqlServer/Dts";
							/DTS:Executable[1]/DTS:Property[@DTS:Name=''PackageFormatVersion''][1]',
                                      'varchar(3)') AS SMALLINT) AS PackageFormatVersion ,
                CAST(PackageXML.value('declare namespace DTS="www.microsoft.com/SqlServer/Dts";
							DTS:Executable[1]/@DTS:ExecutableType[1]',
                                      'nvarchar(50)') AS NVARCHAR(50)) AS PackageType ,
                PackageXML.value('declare namespace DTS="www.microsoft.com/SqlServer/Dts";
							/DTS:Executable[1]/DTS:Property[@DTS:Name=''Description''][1]',
                                 'nvarchar(2000)') AS PackageDescription ,
                CAST(PackageXML.value('declare namespace DTS="www.microsoft.com/SqlServer/Dts";
							/DTS:Executable[1]/DTS:Property[@DTS:Name=''VersionMajor''][1]',
                                      'varchar(3)') AS SMALLINT) AS PackageVersionMajor ,
                CAST(PackageXML.value('declare namespace DTS="www.microsoft.com/SqlServer/Dts";
							/DTS:Executable[1]/DTS:Property[@DTS:Name=''VersionMinor''][1]',
                                      'varchar(3)') AS SMALLINT) AS PackageVersionMinor ,
                CAST(PackageXML.value('declare namespace DTS="www.microsoft.com/SqlServer/Dts";
							/DTS:Executable[1]/DTS:Property[@DTS:Name=''VersionBuild''][1]',
                                      'varchar(3)') AS SMALLINT) AS PackageVersionBuild ,
                CAST(PackageXML.value('declare namespace DTS="www.microsoft.com/SqlServer/Dts";
							/DTS:Executable[1]/DTS:Property[@DTS:Name=''VersionGUID''][1]',
                                      'char(38)') AS UNIQUEIDENTIFIER) AS PackageVersionGUID ,
                PackageXML
        FROM    pkgStats

DECLARE @AllpkgStats TABLE
    (
      PackageId UNIQUEIDENTIFIER NULL ,
      PackagePath VARCHAR(900) PRIMARY KEY
                               NOT NULL ,
      PackageName VARCHAR(900) NULL ,
      PackageCreatorsName NVARCHAR(500) NULL ,
      PackageFormatVersion SMALLINT NULL ,
      PackageType NVARCHAR(50) NULL ,
      PackageDescription NVARCHAR(2000) NULL ,
      PackageVersionMajor SMALLINT NULL ,
      PackageVersionMinor SMALLINT NULL ,
      PackageVersionBuild SMALLINT NULL ,
      PackageVersionGUID UNIQUEIDENTIFIER NULL ,
      PackageTasksNumber INT NULL ,
      PackageContainersNumber INT NULL ,
      PackageDataFlowsNumber INT NULL ,
      PackageConnectionManagersNumber INT NULL ,
      PackageVariablesInEntirePackageNumber INT NULL ,
      PackageXML XML NULL
    ); 

INSERT  INTO @AllpkgStats
        ( PackageId ,
          PackagePath ,
          PackageName ,
          PackageCreatorsName ,
          PackageFormatVersion ,
          PackageType ,
          PackageDescription ,
          PackageVersionMajor ,
          PackageVersionMinor ,
          PackageVersionBuild ,
          PackageVersionGUID ,
          PackageXML
        )
        SELECT  p.PackageId ,
                p.PackagePath ,
                SUBSTRING(PackagePath,
                          LEN(PackagePath) - CHARINDEX('\',
                                                       REVERSE(PackagePath), 0)
                          + 2, LEN(PackagePath)) AS PackageName ,
                p.PackageCreatorsName ,
                p.PackageFormatVersion ,
                p.PackageType ,
                p.PackageDescription ,
                p.PackageVersionMajor ,
                p.PackageVersionMinor ,
                p.PackageVersionBuild ,
                p.PackageVersionGUID ,
                p.PackageXML
        FROM    @pkgStatsBase p;
		--Number of tasks
MERGE INTO @AllpkgStats AS t
USING
    ( SELECT    PackagePath ,
                COUNT(*) AS PackageTasksNumber
      FROM      @pkgStatsBase p
                CROSS    APPLY p.PackageXML.nodes('declare namespace DTS="www.microsoft.com/SqlServer/Dts";
									//DTS:Executable[@DTS:ExecutableType!=''STOCK:SEQUENCE''
								and    @DTS:ExecutableType!=''STOCK:FORLOOP''
								and    @DTS:ExecutableType!=''STOCK:FOREACHLOOP''
								and not(contains(@DTS:ExecutableType,''.Package.''))]') Pkg ( props )
      GROUP BY  PackagePath
    ) s
ON ( t.PackagePath = s.PackagePath )
WHEN MATCHED THEN
    UPDATE SET PackageTasksNumber = s.PackageTasksNumber;

		--Number of containers
MERGE INTO @AllpkgStats AS t
USING
    ( SELECT    PackagePath ,
                COUNT(*) AS PackageContainersNumber
      FROM      @pkgStatsBase p
                CROSS    APPLY p.PackageXML.nodes('declare namespace DTS="www.microsoft.com/SqlServer/Dts";
									//DTS:Executable[@DTS:ExecutableType=''STOCK:SEQUENCE''
									or    @DTS:ExecutableType=''STOCK:FORLOOP''
									or    @DTS:ExecutableType=''STOCK:FOREACHLOOP'']') Pkg ( props )
      GROUP BY  PackagePath
    ) s
ON ( t.PackagePath = s.PackagePath )
WHEN MATCHED THEN
    UPDATE SET PackageContainersNumber = s.PackageContainersNumber
WHEN NOT MATCHED BY SOURCE THEN
    UPDATE SET PackageContainersNumber = 0;

		--Number of data flows
MERGE INTO @AllpkgStats AS t
USING
    ( SELECT    PackagePath ,
                COUNT(*) AS PackageDataFlowsNumber
      FROM      @pkgStatsBase p
                CROSS    APPLY p.PackageXML.nodes('declare namespace DTS="www.microsoft.com/SqlServer/Dts";
									//DTS:Executable[contains(@DTS:ExecutableType,''.Pipeline.'')]') Pkg ( props )
      GROUP BY  PackagePath
    ) s
ON ( t.PackagePath = s.PackagePath )
WHEN MATCHED THEN
    UPDATE SET PackageDataFlowsNumber = s.PackageDataFlowsNumber
WHEN NOT MATCHED BY SOURCE THEN
    UPDATE SET PackageDataFlowsNumber = 0;

		--Number of connection managers
MERGE INTO @AllpkgStats AS t
USING
    ( SELECT    PackagePath ,
                COUNT(*) AS PackageConnectionManagersNumber
      FROM      @pkgStatsBase p
                CROSS    APPLY p.PackageXML.nodes('declare namespace DTS="www.microsoft.com/SqlServer/Dts";
									//DTS:ConnectionManager') Pkg ( props )
      GROUP BY  PackagePath
    ) s
ON ( t.PackagePath = s.PackagePath )
WHEN MATCHED THEN
    UPDATE SET PackageConnectionManagersNumber = s.PackageConnectionManagersNumber
WHEN NOT MATCHED BY SOURCE THEN
    UPDATE SET PackageConnectionManagersNumber = 0;

		--Number of variables in entire package
MERGE INTO @AllpkgStats AS t
USING
    ( SELECT    PackagePath ,
                COUNT(*) AS PackageVariablesInEntirePackageNumber
      FROM      @pkgStatsBase p
                CROSS    APPLY p.PackageXML.nodes('declare namespace DTS="www.microsoft.com/SqlServer/Dts";
									//DTS:Variable') Pkg ( props )
      GROUP BY  PackagePath
    ) s
ON ( t.PackagePath = s.PackagePath )
WHEN MATCHED THEN
    UPDATE SET PackageVariablesInEntirePackageNumber = s.PackageVariablesInEntirePackageNumber
WHEN NOT MATCHED BY SOURCE THEN
    UPDATE SET PackageVariablesInEntirePackageNumber = 0;

INSERT  INTO AdminDBA.dbo.LogSSISErrors_Package
        ( PackageID ,
          PackageName ,
          PackagePath ,
          PackageCreatorsName ,
          PackageFormatVersion ,
          PackageType ,
          PackageDescription ,
          PackageVersionMajor ,
          PackageVersionMinor ,
          PackageVersionBuild ,
          PackageVersionGUID ,
          PackageTasksNumber ,
          PackageXML ,
          PackageContainersNumber ,
          PackageDataFlowsNumber ,
          PackageConnectionManagersNumber ,
          PackageVariablesInEntirePackageNumber ,
          CurrentlyUsed
        )
        SELECT  PackageId ,
                PackageName ,
                PackagePath ,
                PackageCreatorsName ,
                PackageFormatVersion ,
                PackageType ,
                PackageDescription ,
                PackageVersionMajor ,
                PackageVersionMinor ,
                PackageVersionBuild ,
                PackageVersionGUID ,
                PackageTasksNumber ,
                PackageXML ,
                PackageContainersNumber ,
                PackageDataFlowsNumber ,
                PackageConnectionManagersNumber ,
                PackageVariablesInEntirePackageNumber ,
                1 AS CurrentlyUsed
        FROM    @AllpkgStats

IF EXISTS ( SELECT  *
            FROM    sys.tables
            WHERE   name = N'pkgStats' )
    DROP	TABLE pkgStats;

EXEC sp_configure 'xp_cmdshell', 0;
RECONFIGURE;
EXEC sp_configure 'show advanced options', 0;
RECONFIGURE;

Upon completion of execution, this code should create a small set of tables prefixed with LogSSISErrors, with metadata entries in some or all of them depending on your local SQL Server instance, as per the image below.

Error_Capture_DB_AdminDBA_Objects_OnStart
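A quick way to confirm the outcome is to list the freshly created tables and their row counts; a simple check along these lines (not part of the original script) will do:

SELECT  t.name AS TableName ,
        SUM(p.rows) AS RowsLoaded
FROM    AdminDBA.sys.tables t
        JOIN AdminDBA.sys.partitions p ON p.object_id = t.object_id
                                          AND p.index_id IN ( 0, 1 )
WHERE   t.name LIKE 'LogSSISErrors%'
GROUP BY t.name
ORDER BY t.name;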

In the NEXT POST I will go through the process of creating a sample test environment, an SSIS package running an error-generating process and another variation of the above stored procedure used to update the database as environment changes occur. Also, all the code used in this solution can be downloaded from my OneDrive folder HERE.


Microsoft SQL Server Hardware Evaluation and Sizing – Best Practices Guide (Part 2)

October 2nd, 2014 / 1 Comment » / by admin

Continuing on from the FIRST post in this series, this installment focuses on arguably the most important aspects of SQL Server hardware sizing – CPU, RAM and storage selection. Let's start with processor selection.

CPU

The central processing unit is the heart of every system and that is particularly evident when dealing with database workloads, as CPU selection drives other component choices e.g. how many CPU sockets you can have in the server, physical memory capacity, I/O capacity, upgrade path etc. SQL Server can work with both Intel and AMD processors; however, there are some distinct differences between the two options. Intel has been dominating the single-threaded performance race for a long time now, whereas most modern AMD CPUs have a high physical core count which sometimes makes them more suitable, compared to their Intel counterparts, for certain types of deployments e.g. SQL Server Business Intelligence edition (licensed per server + CALs rather than per core). Starting with SQL Server 2012, Microsoft introduced the SQL Server Core Factor table which allows for up to a 25 percent discount when licensing a server with AMD CPUs; however, AMD CPUs are still difficult to recommend given their lower overall single-threaded performance and increased licensing cost. As a result, I will only be discussing Intel CPU selection for SQL Server deployments.
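Whichever CPU you settle on, it is worth confirming what the operating system actually presents to SQL Server, since that is what per-core licensing counts against. A minimal example using a standard DMV (available since SQL Server 2005):

-- Logical CPU layout as seen by SQL Server
SELECT  cpu_count AS LogicalCPUs ,
        hyperthread_ratio AS HyperthreadRatio ,
        cpu_count / hyperthread_ratio AS PhysicalCPUSockets
FROM    sys.dm_os_sys_info;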

Based on the current range of CPUs from Intel (Q1, 2015), the following processor models constitute my recommended selection for both OLTP and DW/DSS workloads, grouped by socket count.

SQL_Hardware_Eval_2_sockets_OLTP

SQL_Hardware_Eval_2_sockets_DW

SQL_Hardware_Eval_4_sockets_OLTP

SQL_Hardware_Eval_4_sockets_DW

SQL_Hardware_Eval_8_sockets_OLTP

SQL_Hardware_Eval_8_sockets_DW

As new and improved processors are introduced the above list will gradually become less applicable, so make sure you check the Intel website for new or upcoming releases. Also, rather than just taking a blind shot at a CPU model, it is always good practice to analyse the CPU's performance using benchmarking tools, or at least to look at the scores obtained by systems utilizing those chips if available online. The two most commonly used benchmarks are TPC-E (OLTP focused) and TPC-H (Decision Support focused).

TPC Benchmark™ E (TPC-E) is an On-Line Transaction Processing (OLTP) workload developed by the TPC. The TPC-E benchmark uses a database to model a brokerage firm with customers who generate transactions related to trades, account inquiries, and market research. The brokerage firm in turn interacts with financial markets to execute orders on behalf of the customers and updates relevant account information. The benchmark is “scalable,” meaning that the number of customers defined for the brokerage firm can be varied to represent the workloads of different-size businesses. The benchmark defines the required mix of transactions the benchmark must maintain. The TPC-E metric is given in transactions per second (tps). It specifically refers to the number of Trade-Result transactions the server can sustain over a period of time. Although the underlying business model of TPC-E is a brokerage firm, the database schema, data population, transactions, and implementation rules have been designed to be broadly representative of modern OLTP systems.

Current (18 March, 2015) top TPC-E performers are as per the screenshot below.

SQL_Hardware_Eval_TPCE_Top_Perfomers

The TPC Benchmark™H (TPC-H) is a decision support benchmark. It consists of a suite of business oriented ad-hoc queries and concurrent data modifications. The queries and the data populating the database have been chosen to have broad industry-wide relevance. This benchmark illustrates decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions. The performance metric reported by TPC-H is called the TPC-H Composite Query-per-Hour Performance Metric (QphH@Size), and reflects multiple aspects of the capability of the system to process queries. These aspects include the selected database size against which the queries are executed, the query processing power when queries are submitted by a single stream, and the query throughput when queries are submitted by multiple concurrent users. The TPC-H Price/Performance metric is expressed as $/QphH@Size.

Current (18 March, 2015) top TPC-H performers are as per the screenshot below.

SQL_Hardware_Eval_TPCH_Top_Perfomers

Storage

Properly configured storage, another key component of deploying a robust SQL Server infrastructure, can have a drastic impact on how your database performs. There is a multitude of factors to consider e.g. budget, server usage, your existing hardware, performance expectations etc., but the first one to consider should be the expected workload type. As discussed in PART 1 of this series, SQL Server can perform a few different types of workloads e.g. OLTP, DW/DSS, OLAP or a mixture of those. These workloads have different I/O access patterns and the storage choices should reflect them. Here are the most common storage access characteristics for each workload type.

  • OLTP workloads – frequent writes to data files and log files, as well as frequent random reads from data files if the database does not fit into memory.
  • DW/DSS workloads – large sequential reads from data files.
  • OLAP workloads – frequent, random reads from cubes.

SQL Server can typically be deployed onto five different types of storage which, as was the case with the workload types, will affect performance levels. These are:

  • Internal drives – your traditional magnetic drives or more modern solid state drives (SSDs)
  • PCI-E storage cards
  • Storage Area Network (SAN)
  • Direct-attached storage
  • Server message block (SMB) file shares (Windows Server 2012 and higher)

Nevertheless, storage type is not the only cause of poor storage-related performance. Issues can also come from the RAID level the disks have been provisioned with, the number of disks, disk speed e.g. 10K vs. 15K drives, the bandwidth of the RAID controller and many more, so it's important to look not only at the storage medium type but also at the extra features it carries and how it has been provisioned/configured. The following '10 commandments' typically apply to a holistic storage configuration for SQL Server.

SQL_Hardware_Eval_Storage_Config_Recom

Even though the above applies to storage configuration in general, regardless of whether the workload targets OLTP or DW/DSS environments, data warehouse systems rely heavily on storage access speed as the copious volumes of data cannot be cached in memory. Selecting the fastest storage medium and a configuration with the best sequential read speed e.g. PCI-E cards in a RAID1 array will provide a significant performance boost. If the cost of PCI-E card(s) is considered prohibitive, solid state drives can deliver much better performance than magnetic disk drives on a budget, whilst also offering simplified management, lower operational costs and predictable maintenance. The following advantages typically characterize SSD storage and its choice over magnetic disk drives.

  • Simplified management – Solid state storage does not require fragmentation management. The SQL Server startup option -E should still be used, but no further optimization or management of page allocation is required. This simplification makes long-term management of FTDW environments significantly easier. In addition, larger disk groups and lower volume/LUN counts can be used with no negative performance implications. This change simplifies filegroup creation and maintenance.
  • I/O resiliency – Solid state storage has minimal performance degradation under high concurrency or page fragmentation. In addition, mixed random read (seek) workload does not negatively impact large request (scan) I/O patterns.
  • Predictable maintenance – Many solid-state storage options provide software based write-life monitoring with lower frequency of difficult-to-predict physical failures.
  • Lower operational cost – While more expensive at list price, solid-state storage offers a more efficient balance of I/O throughput to capacity per unit. Effective FTDW workload I/O rates for a 300 GB 10k SAS HDD average 50 MB/s. Enterprise MLC SSD delivers between 150 and 200 MB/s at 600-GB capacity. Additionally, solid-state storage draws significantly less power, generates less heat, and often supports higher density solutions.

In addition to the storage configuration guidelines outlined above, the following adjustments can be employed to maximise DW/DSS performance when coupled with solid state drives for primary data storage.

  • If mirroring is required, RAID1+0 or RAID5 can be used. RAID5 provides the best capacity at no performance penalty for FTDW workloads on solid state.
  • LUN and volume count can be reduced to as low as one PRI volume per storage unit. It is helpful in some cases to have the PRI volume count be a multiple of the CPU core count. The minimum PRI volume count is two.
  • The transaction log can be placed on solid state as well, but FTDW workloads are not typically log-bound. Cost can be reduced by placing the log on traditional HDD. The same is true of local storage for Windows Server and SQL Server installation.
  • Recommendations for page fragmentation management and clustered index parallel loading can be ignored, because logical database fragmentation does not impact solid state I/O performance.

In order to evaluate existing storage performance, or to approximate the requirements for a storage system that would resolve the most common storage-related issues i.e. low IOPS and throughput and high latency, there are a number of tools that can facilitate this analysis. The most common ones are SQLIO from Microsoft and CrystalDiskMark, both free. The latter can be downloaded as a portable edition (no installation required) from HERE and the rest is straightforward as the interface is simple and intuitive. CrystalDiskMark throws sequential and random, read and write loads at your storage system in 512KB and 4KB chunks. This gets you quick answers, but not necessarily SQL Server-related ones – after all, SQL Server doesn't read or write in 512KB or 4KB chunks. Nevertheless, this tool is a quick and straightforward way to get a glimpse into your storage throughput capabilities, as per the images below (samples from my Samsung M.2 SATA/600 SSD and Patriot Rage XT USB 3.0 flash drive for comparison).

SQL_Hardware_Eval_CrystalDiskMark_Sample_Perf_Results

SQLIO, on the other hand, is a little more comprehensive; even though it has nothing to do with SQL Server (despite what the name may imply), it supports a large number of options and provides a more complete view of the storage subsystem. Brent Ozar's blog and THIS POST have a good overview of getting started with SQLIO so I will not go into details here, but even though the tool has no GUI (most of the testing is done on the command line), don't let that scare you off – with a little bit of notepad magic and Google the implementation is very straightforward. Below is the output of my laptop's storage (single Samsung M.2 SSD in a Lenovo T440s) performance test using SQLIO, with the core performance measures highlighted.

SQL_Hardware_Eval_SQLIO_Sample_Output
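Alongside these external tools, SQL Server itself keeps cumulative I/O statistics per database file, which is a handy sanity check for latency on an instance that is already running a workload. A minimal example:

-- Average read/write latency per database file since the last restart
SELECT  DB_NAME(vfs.database_id) AS DatabaseName ,
        mf.physical_name ,
        vfs.num_of_reads ,
        vfs.num_of_writes ,
        CASE WHEN vfs.num_of_reads = 0 THEN 0
             ELSE vfs.io_stall_read_ms / vfs.num_of_reads
        END AS AvgReadLatencyMs ,
        CASE WHEN vfs.num_of_writes = 0 THEN 0
             ELSE vfs.io_stall_write_ms / vfs.num_of_writes
        END AS AvgWriteLatencyMs
FROM    sys.dm_io_virtual_file_stats(NULL, NULL) vfs
        JOIN sys.master_files mf ON mf.database_id = vfs.database_id
                                    AND mf.file_id = vfs.file_id
ORDER BY AvgReadLatencyMs DESC;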

Memory

SQL Server memory upgrade/selection is probably the cheapest way to boost performance e.g. at the time of writing this post Amazon had 128GB of DDR3 memory available for around 1,500 dollars, which compared to the price of a SQL Server licence or changing application code seems like a rounding error. For DW/DSS systems, in their Fast Track Data Warehouse Reference Guide (SQL Server 2012), Microsoft recommends a minimum of 64GB for a single-socket server, growing linearly with the socket count available in the system e.g. two-socket server – 128GB, four-socket server – 256GB etc. For the maximum memory recommendation, doubling the minimum is a good estimate, as per the table below.

SQL_Hardware_Eval_MS_Memory_Recom

The following considerations are also important to bear in mind when evaluating system memory requirements:

  • Query from cache – Workloads that service a large percentage of queries from cache may see an overall benefit from increased RAM allocations as the workload grows.
  • Hash joins and sorts – Queries that rely on large-scale hash joins or perform large-scale sorting operations will benefit from large amounts of physical memory. With smaller memory, these operations spill to disk and heavily utilise tempdb, which introduces a random I/O pattern across the data drives on the server.
  • Loads – Bulk inserts can also introduce sorting operations that utilise tempdb if they cannot be processed in available memory.
  • xVelocity memory-optimized columnstore index – Workloads that heavily favor columnstore index query plans run more efficiently with memory pools at the higher end of the ranges listed in the table above.

Naturally these recommendations will need to be reconciled against the SQL Server edition you plan on running, as different versions have different limitations on the amount of memory used by the database engine or other SQL Server components (see PART 1 for specifics) e.g. SQL Server Standard Edition (2014 version) allows up to 128GB for the database engine and an additional 64GB for SSAS and SSRS.

When it comes to OLTP workloads, the general rule of thumb is that as long as the database (including indexes) fits into memory, performance should be adequate; however, when sizing memory for a new server more is always better. If using Standard or BI Edition (SQL Server 2014), allocating as much memory as the licence permits i.e. 128GB for the database engine is good practice. When running SSAS and/or SSRS on the same server, Microsoft allows an additional 64GB for Standard Edition and as much as the OS can take for the BI equivalent. Things get a little tricky when dealing with Enterprise Edition where, in theory, you can allocate as much as 4TB of RAM to a single instance if running on Windows Server 2012 or higher. If unsure what the adequate size should be, buy as much as possible; otherwise, running a number of tests and evaluating system performance in view of memory usage should shed some light on the actual requirements and on the point at which your memory comes under strain.
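Whatever amount you end up with, remember to cap SQL Server's share of it so the operating system (and anything else on the box) keeps some headroom. The figure below is just an illustrative example for a 128GB server, not a universal recommendation:

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
-- Example only: leave roughly 16GB for the OS and other services on a 128GB server
EXEC sp_configure 'max server memory (MB)', 112640;
RECONFIGURE;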

One of the best places to start capacity planning by measuring performance statistics is Performance Monitor, otherwise known as Perfmon. Perfmon allows you to measure various performance statistics at a regular interval and save the results to a file for further analysis. The database administrator picks the time interval, the file format, and which statistics are monitored. SQL Server provides numerous counters for monitoring and evaluating the performance of your system. Although there are no hard and fast rules for performance monitoring, you should audit your system periodically to ensure that it lives up to user expectations. The first step in any performance monitoring effort should be collecting baseline information: a baseline is a collection of performance counter values captured while the system isn't heavily utilised and performs well. Next, you should take benchmarks at various times of day and on different days of the week to figure out acceptable ranges for the performance counters you intend to monitor. For example, your tests could reveal that buffer cache hit ratio values of 93.5% and higher are acceptable, but any value below 93.5% indicates sub-optimal performance. Then set up continuous monitoring and define alerts which notify you if any performance counter value falls below (or rises above) a predefined threshold. Although we try to provide some guidelines for performance counter values, the adequate range for each counter will vary from one environment to the next; this is why it's important to create a baseline and determine acceptable values for each counter. You should watch several memory-related counters very closely as they present a clear indication of acceptable or sub-optimal performance.

A large majority of these memory performance counters are available through the Memory Manager, Plan Cache and Buffer Manager objects and their associated counters, as per the table below.

SQL_Hardware_Eval_Memory_Counters
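Most of these counters can also be read straight from T-SQL via the sys.dm_os_performance_counters DMV, which makes it easy to script a lightweight baseline collector; for example:

-- Key memory counters; note that 'Buffer cache hit ratio' must be divided by its 'base' counter
SELECT  RTRIM(object_name) AS CounterObject ,
        RTRIM(counter_name) AS CounterName ,
        cntr_value
FROM    sys.dm_os_performance_counters
WHERE   counter_name IN ( 'Page life expectancy', 'Buffer cache hit ratio',
                          'Buffer cache hit ratio base', 'Total Server Memory (KB)',
                          'Target Server Memory (KB)' );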

Quest Software also released a handy poster which provides descriptions of key Windows/SQL Perfmon counters along with their recommended target values (you can download it from HERE).

Another factor to consider is the effect of different memory configurations on memory bandwidth. As you add more DIMMs to a system to populate more of the available memory slots, the bandwidth of the memory can decrease, which can hurt memory access performance somewhat. This effect varies based on the processor and the associated integrated memory controller being used; the latest Intel processors are much less susceptible to this problem. Regardless, I still favor using large amounts of RAM, because even 'slower' RAM is an order of magnitude faster than your storage subsystem. If you are concerned about this, you should consult the documentation for your server for recommendations on how to configure your memory.

Conclusion

This concludes the second post in this series. The information outlined in these two installments provides a rudimentary 'check list' of the core functions that typically require investigating when deciding on provisioning/procuring new hardware for a SQL Server installation; usually a much more comprehensive approach needs to be taken when evaluating future needs. When faced with an infrastructure upgrade, an experienced DBA will carefully establish performance baselines and determine the production capacity needed by an organisation/application to meet current and future demands. Some tools which may assist in capacity planning have already been listed above and typically provide a single point of focus on various pressure points when dealing with different realms of hardware upgrade/update. Sometimes it may be beneficial to use more comprehensive tooling which gives the administrator a more holistic view of the system's performance e.g. SQL Sentry Performance Advisor or Database Performance Analyzer. A full-featured trial version of any of these applications can usually be downloaded, installed and evaluated for a period of time, so having the ability to view, troubleshoot and report on the performance of the system's core components can be a great asset and speed up the analysis phase. Below is a screenshot of the SQL Sentry Performance Advisor activity dashboard for one of my SQL Server instances (click on image to expand).

SQL_Hardware_Eval_SQLSentry_Dashboard

Regardless of which tool you use to ascertain your performance needs, every SQL Server hardware selection decision, particularly when deploying a new server, should be preceded by a detailed analysis of all the core components making up the platform, not only from the hardware point of view but also factoring in the licensing model and projected workload type. There is a lot to take into consideration, but a little bit of elbow grease, backed up by some investigative work and consideration of best-practice guidelines, should yield a great-performing system with no unexpected or costly surprises.
