Asynchronous SQL Execution via SQL Server Agent Jobs

May 27th, 2015 / No Comments » / by admin

Recently, a client tasked me with creating a data acquisition routine for an externally hosted MySQL database, part of a larger ETL solution merging dispersed data sources into a staging area for further data warehouse processing. The database was small, just over a few gigabytes distributed across around 35 tables, with the largest object holding over 15 million records. A Microsoft SQL Server 2014 instance served as the local data storage engine, all code had to be written in Microsoft's SQL dialect, i.e. Transact-SQL, and data flow and execution were governed by Integration Services (SSIS) packages. MySQL data was queried through a linked server set up on the SQL Server instance using Oracle's MySQL ODBC driver, over a 100Mbps internet link. The requirements seemed easy enough to accommodate: create an SSIS package to manage the data acquisition pipelines, either through a collection of SSIS-specific transformations or T-SQL code. A few preliminary tests also suggested that the connection would be robust enough to copy or even merge the data across source and target without much hassle.

Once development commenced, I quickly learned that Integration Services may not have been the best tool for the job (more due to the ODBC driver's stability than to SSIS functionality). Regardless of how much reconfiguration and tweaking went into optimising the SSIS package to move the data across the linked server connection, it could not cope with a few of the large tables, continually failing on execution with cryptic messages and confusing error logs. Likewise, running plain SQL MERGE statements often resulted in connection time-outs or buffer overflows, indicating that the ODBC driver was too flaky and unstable to handle long-running queries. The only approach that did not throw the connection out of balance was to abandon the more convenient 'upserts' in favour of truncations and inserts, broken down into multiple statements to spread the data flow across the network. The less data that went through the network as a single transaction, the more reliable the transfer was, with one caveat: the largest MySQL table (15 million records) took nearly an hour to read from and insert into the local SQL Server instance. Regardless of how many batches the acquisition job was broken into, performance hovered around the 55-minute mark, an unacceptable result by both my and the client's standards, even though it solved the reliability issues. Since failures became more frequent as source table record counts grew (when executed as a single transaction), and breaking the load into multiple batches provided the needed stability at the expense of prolonged execution, the silver bullet for this conundrum was to find a halfway point between the two approaches. That compromise turned out to be batch processing (stability) combined with asynchronous, concurrent execution (speed).

Microsoft SQL Server does not offer built-in asynchronous SQL processing: all statements within a query batch execute in a sequential, non-parallel fashion. Even when looping through a process using a WHILE statement or a cursor, each SQL statement is invoked synchronously. There are different approaches that can help achieve concurrent SQL execution, but they require either programming knowledge or a fair amount of inelegant hacking to circumvent this limitation. One simple solution that worked really well for my project was to create a number of SQL Server Agent jobs (using T-SQL only) and initialise them in parallel, effectively 'multi-threading' the whole process and shaving a considerable amount of time off the final execution duration.
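Stripped to its essentials, the technique relies on four documented msdb procedures; the job name and command below are placeholders:

```sql
-- Minimal 'fire and forget' pattern: create an Agent job, give it one T-SQL
-- step, target it at the local server and start it. sp_start_job returns
-- immediately, so the calling batch carries on while the job runs.
DECLARE @job_name SYSNAME = N'Temp_AsyncDemoJob';   -- placeholder name

EXEC msdb.dbo.sp_add_job
     @job_name     = @job_name,
     @delete_level = 3;                 -- auto-delete the job on completion
EXEC msdb.dbo.sp_add_jobstep
     @job_name      = @job_name,
     @step_name     = N'RunQuery',
     @command       = N'SELECT 1;',     -- any T-SQL batch to run asynchronously
     @database_name = N'master';
EXEC msdb.dbo.sp_add_jobserver @job_name = @job_name;  -- register on local server
EXEC msdb.dbo.sp_start_job     @job_name = @job_name;  -- queue it and return
```

Launching several such jobs back to back is what gives the 'multi-threaded' effect; each job runs on its own Agent session.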

Let’s look at an example simulating this scenario with two tables placed in two different databases: one containing ten million records of mock data and a second, empty one which will become the target of the parallel INSERT operation. The following code creates the two sample databases with their corresponding objects and populates the source table with dummy data.

--CREATE 'Source' AND 'Target' DATABASES
USE master;
GO
IF DB_ID('SourceDB') IS NOT NULL
    BEGIN
        ALTER DATABASE SourceDB SET SINGLE_USER
        WITH ROLLBACK IMMEDIATE;
        DROP DATABASE SourceDB;
    END;
GO
CREATE DATABASE SourceDB;
GO
ALTER DATABASE SourceDB SET RECOVERY SIMPLE;
GO
ALTER DATABASE SourceDB
MODIFY FILE
    (NAME = SourceDB,
    SIZE = 10240MB);
GO

USE master;
GO
IF DB_ID('TargetDB') IS NOT NULL
    BEGIN
        ALTER DATABASE TargetDB SET SINGLE_USER
        WITH ROLLBACK IMMEDIATE;
        DROP DATABASE TargetDB;
    END;
GO
CREATE DATABASE TargetDB;
GO
ALTER DATABASE TargetDB SET RECOVERY SIMPLE;
GO
ALTER DATABASE TargetDB
MODIFY FILE
    (NAME = TargetDB,
    SIZE = 10240MB);
GO

--CREATE 'Source' AND 'Target' TABLES
USE SourceDB;
CREATE TABLE dbo.SourceTable
    (
      ID INT IDENTITY(1, 1) NOT NULL ,
      Value1 INT NOT NULL ,
      Value2 INT NOT NULL ,
      Value3 DECIMAL(10, 2) NOT NULL ,
      Value4 DATETIME NOT NULL ,
      Value5 DATETIME NOT NULL ,
	  Value6 NVARCHAR (512) NOT NULL,
	  Value7 NVARCHAR (512) NOT NULL,
	  Value8 UNIQUEIDENTIFIER,
      CONSTRAINT PK_ID PRIMARY KEY CLUSTERED ( ID ASC )
    );

USE TargetDB;
CREATE TABLE dbo.TargetTable
    (
      ID INT IDENTITY(1, 1) NOT NULL ,
      Value1 INT NOT NULL ,
      Value2 INT NOT NULL ,
      Value3 DECIMAL(10, 2) NOT NULL ,
      Value4 DATETIME NOT NULL ,
      Value5 DATETIME NOT NULL ,
	  Value6 NVARCHAR (512) NOT NULL,
	  Value7 NVARCHAR (512) NOT NULL,
	  Value8 UNIQUEIDENTIFIER,
      CONSTRAINT PK_ID PRIMARY KEY CLUSTERED ( ID ASC )
    );

--POPULATE 'SourceTable' WITH TEST DATA
USE SourceDB;
SET NOCOUNT ON;
DECLARE @count INT = 0;
DECLARE @records_to_insert INT = 10000000;
WHILE @count < @records_to_insert
    BEGIN
        INSERT  INTO dbo.SourceTable
                ( Value1 ,
                  Value2 ,
                  Value3 ,
                  Value4 ,
                  Value5 ,
				  Value6 ,
				  Value7 ,
				  Value8
                )
                SELECT  10 * RAND() ,
                        20 * RAND() ,
                        10000 * RAND() / 100 ,
                        DATEADD(ss, @count, SYSDATETIME()) ,
                        CURRENT_TIMESTAMP ,
						REPLICATE(CAST(NEWID() AS NVARCHAR(MAX)),10) ,
						REPLICATE(CAST(RAND() AS NVARCHAR(MAX)),10) ,
						NEWID();
        SET @count = @count + 1;
    END;
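As an aside, the WHILE loop above is easy to read but inserts one row per iteration, which is slow for ten million rows. A set-based alternative (sketched below, with CHECKSUM(NEWID()) standing in for RAND() so each row gets its own random values, since RAND() is evaluated once per statement) populates the same table in a single statement:

```sql
-- Set-based population sketch: build a 10M-row tally from catalog cross joins
-- and insert in one statement. Column expressions are illustrative stand-ins
-- for the per-row randomness of the original loop.
;WITH n AS (
    SELECT TOP (10000000)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rn
    FROM sys.all_objects a
    CROSS JOIN sys.all_objects b
    CROSS JOIN sys.all_objects c
)
INSERT INTO dbo.SourceTable
    (Value1, Value2, Value3, Value4, Value5, Value6, Value7, Value8)
SELECT  ABS(CHECKSUM(NEWID())) % 10 ,
        ABS(CHECKSUM(NEWID())) % 20 ,
        (ABS(CHECKSUM(NEWID())) % 10000) / 100.0 ,
        DATEADD(ss, CAST(rn % 86400 AS INT), SYSDATETIME()) ,
        CURRENT_TIMESTAMP ,
        REPLICATE(CAST(NEWID() AS NVARCHAR(36)), 10) ,
        REPLICATE(CAST(rn AS NVARCHAR(20)), 10) ,
        NEWID()
FROM n;
```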

Next, let’s test sequential INSERT performance by inserting all records from SourceDB.dbo.SourceTable into TargetDB.dbo.TargetTable. For this purpose I simply ran an INSERT statement, copying all the data across with no alterations to the target object, which generated the following execution plan.

SET STATISTICS TIME ON;
GO
INSERT  INTO TargetDB.dbo.TargetTable
        ( Value1 ,
          Value2 ,
          Value3 ,
          Value4 ,
          Value5 ,
		  Value6 ,
		  Value7 ,
		  Value8
        )
        SELECT  Value1 ,
                Value2 ,
                Value3 ,
                Value4 ,
                Value5 ,
				Value6 ,
				Value7 ,
				Value8
        FROM    SourceDB.dbo.SourceTable;
GO
SET STATISTICS TIME OFF;
GO

[Image: Async_SQL_Exec_INSERT_Statement]

Given that SQL Server 2014 offers some enhancements to SELECT…INTO execution, it’s worthwhile to also try it out as an alternative. If there is provision to create the target table from scratch and the developer has the liberty to execute SELECT…INTO in place of an INSERT statement, Microsoft claims that SQL Server 2014 will run it in parallel mode (as confirmed by the execution plan below), speeding up execution considerably.

IF OBJECT_ID('TargetDB.dbo.TargetTable', 'U') IS NOT NULL
  BEGIN
	DROP TABLE TargetDB.dbo.TargetTable
  END

SET STATISTICS TIME ON;
GO
SELECT			Value1 ,
                Value2 ,
                Value3 ,
                Value4 ,
                Value5 ,
				Value6 ,
				Value7 ,
				Value8
		INTO	TargetDB.dbo.TargetTable
        FROM    SourceDB.dbo.SourceTable;
GO
SET STATISTICS TIME OFF;
GO

[Image: Async_SQL_Exec_SELECT_INTO_Statement]

Finally, let’s look at the solution that enforces parallel query execution, in this case also an INSERT statement, by ‘spinning up’ multiple SQL Server Agent jobs. The following code creates a stored procedure which ‘dissects’ all source table records into multiple batches of comparable record counts (the exact number controlled by the parameter value passed) and generates SQL Server Agent jobs which execute in parallel.

USE [SourceDB]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [dbo].[usp_asyncTableInsert]
(@Source_DB_Name			VARCHAR				(128) ,
@Source_DB_Schema_Name		VARCHAR				(128) ,
@Source_DB_Object_Name		VARCHAR				(256) ,
@Target_DB_Name				VARCHAR				(128) ,
@Target_DB_Schema_Name		VARCHAR				(128) ,
@Target_DB_Object_Name		VARCHAR				(256) ,
@SQL_Exec_No				VARCHAR				(10))
AS
BEGIN
	SET NOCOUNT ON;

	/*
	Declare additional variables:
	(1) @SQL			- stores the dynamic SQL string
	(2) @Error_Message		- stores the error message string
	(3) @Is_Debug			- binary flag for allowing/disallowing messages during procedure execution
	*/
    DECLARE @SQL				NVARCHAR(MAX);
	DECLARE @Error_Message		VARCHAR (4000);
	DECLARE @Is_Debug			INT = 1;

	/*
	Create temporary table to hold the ranges of values for dissecting
	the 'source' table into semi-equal chunks of data for further processing
	*/
    IF OBJECT_ID('tempdb..#Ids_Range') IS NOT NULL
		BEGIN
			DROP TABLE #Ids_Range;
        END;
    CREATE TABLE #Ids_Range
    (
    id SMALLINT IDENTITY(1, 1) ,
    range_FROM BIGINT ,
    range_TO BIGINT
    );				

	/*
	Create and populate temporary variables and #Ids_Range temporary table with value ranges
	This table contains value ranges i.e. pairs of values which will be used to build the WHERE
	clause of the INSERT statement, breaking down all the 'source' records into smaller chunks
	*/
	SET @SQL =			'DECLARE @R1 INT = (SELECT MIN(ID) AS ID FROM '									+CHAR(13)
	SET @SQL = @SQL +	''+@Source_DB_Name+'.'+@Source_DB_Schema_Name+'.'+@Source_DB_Object_Name+')'	+CHAR(13)
	SET @SQL = @SQL +	'DECLARE @R2 BIGINT = (SELECT (MAX(ID)-MIN(ID)+1)/'+@SQL_Exec_No+' AS ID FROM'	+CHAR(13)
	SET @SQL = @SQL +	''+@Source_DB_Name+'.'+@Source_DB_Schema_Name+'.'+@Source_DB_Object_Name+')'	+CHAR(13)
	SET @SQL = @SQL +	'DECLARE @R3 BIGINT = (SELECT MAX(ID) AS ID FROM '								+CHAR(13)
	SET @SQL = @SQL +	''+@Source_DB_Name+'.'+@Source_DB_Schema_Name+'.'+@Source_DB_Object_Name+')'	+CHAR(13)
	SET @SQL = @SQL +	'DECLARE @t BIGINT = @r2+@r2+2 '												+CHAR(13)
	SET @SQL = @SQL +	'INSERT INTO #Ids_Range '														+CHAR(13)
	SET @SQL = @SQL +	'(range_FROM, range_to) '														+CHAR(13)
	SET @SQL = @SQL +	'SELECT @R1, @R2 '																+CHAR(13)
	SET @SQL = @SQL +	'UNION ALL '																	+CHAR(13)
	SET @SQL = @SQL +	'SELECT @R2+1, @R2+@R2+1 '														+CHAR(13)
	SET @SQL = @SQL +	'WHILE @t<=@r3 '																+CHAR(13)
	SET @SQL = @SQL +	'BEGIN '																		+CHAR(13)
	SET @SQL = @SQL +	'INSERT INTO #Ids_Range '														+CHAR(13)
	SET @SQL = @SQL +	'(range_FROM, range_to) '														+CHAR(13)
	SET @SQL = @SQL +	'SELECT @t, CASE WHEN (@t+@r2)>=@r3 THEN @r3 ELSE @t+@r2 END '					+CHAR(13)
	SET @SQL = @SQL +	'SET @t = @t+@r2+1 '															+CHAR(13)
	SET @SQL = @SQL +	'END'																			+CHAR(13)

	EXEC(@SQL)
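To make the range-splitting logic easier to follow, here is roughly what @SQL expands to for a source table with IDs 1 to 10,000,000 and @SQL_Exec_No = '5' (the literal values are derived from the formulas above, not captured from a trace):

```sql
-- Worked expansion of the dynamic range-splitting SQL for
-- MIN(ID) = 1, MAX(ID) = 10,000,000, @SQL_Exec_No = '5'
DECLARE @R1 INT    = 1;          -- MIN(ID)
DECLARE @R2 BIGINT = 2000000;    -- (MAX(ID)-MIN(ID)+1)/5, the batch size
DECLARE @R3 BIGINT = 10000000;   -- MAX(ID)
DECLARE @t  BIGINT = @R2 + @R2 + 2;

INSERT INTO #Ids_Range (range_FROM, range_TO)
SELECT @R1, @R2                   -- batch 1:         1 ..  2,000,000
UNION ALL
SELECT @R2 + 1, @R2 + @R2 + 1;    -- batch 2: 2,000,001 ..  4,000,001

WHILE @t <= @R3                   -- batches 3..5:
BEGIN                             --   4,000,002 ..  6,000,002
    INSERT INTO #Ids_Range        --   6,000,003 ..  8,000,003
        (range_FROM, range_TO)    --   8,000,004 .. 10,000,000 (capped at @R3)
    SELECT @t,
           CASE WHEN (@t + @R2) >= @R3 THEN @R3 ELSE @t + @R2 END;
    SET @t = @t + @R2 + 1;
END;
```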

	/*
	Truncate Target table if any data exists
	*/
	SET @SQL =			'IF EXISTS (SELECT TOP 1 1 FROM '														+CHAR(13)
	SET @SQL = @SQL +	''+@Target_DB_Name+'.'+@Target_DB_Schema_Name+'.'+@Target_DB_Object_Name+''				+CHAR(13)
	SET @SQL = @SQL +	'WHERE ID IS NOT NULL) BEGIN '															+CHAR(13)
	SET @SQL = @SQL +	'TRUNCATE TABLE '																		+CHAR(13)
	SET @SQL = @SQL +	''+@Target_DB_Name+'.'+@Target_DB_Schema_Name+'.'+@Target_DB_Object_Name+' END'			+CHAR(13)

	EXEC(@SQL)

	/*
	Create temporary #Temp_Tbl_AgentJob_Stats table to store SQL Server Agent job
	*/
	IF OBJECT_ID('tempdb..#Temp_Tbl_AgentJob_Stats') IS NOT NULL
    BEGIN
		DROP TABLE #Temp_Tbl_AgentJob_Stats;
    END;
    CREATE TABLE #Temp_Tbl_AgentJob_Stats
    (
    ID SMALLINT IDENTITY(1, 1) ,
    Job_Name VARCHAR(256) ,
    Job_Exec_Start_Date DATETIME
    );		

	/*
	Create a cursor and a range of variables to define the SQL execution string
	used to create and manage SQL Server Agent jobs (exact count defined by the @SQL_Exec_No
	variable). This SQL splits the INSERT statement into multiple, concurrently-executing
	batches which run in asynchronous mode. This part of the code also manages stopping
	and deleting already created jobs in case a re-run is required, as well as deleting
	the SQL Agent jobs created as part of this process on successful completion
	*/
	IF CURSOR_STATUS('global', 'sp_cursor') >= -1
		BEGIN
			DEALLOCATE sp_cursor
		END
		DECLARE @z INT
		DECLARE @err INT
		DECLARE sp_cursor CURSOR
		FOR
		SELECT id FROM #ids_range
		SELECT  @err = @@error
        IF @err <> 0
			BEGIN
				DEALLOCATE sp_cursor
				RETURN @err
            END
		OPEN sp_cursor
		FETCH NEXT
		FROM sp_cursor INTO @z
		WHILE @@FETCH_STATUS = 0
			BEGIN
				DECLARE
				@range_from		VARCHAR(10)		= (SELECT CAST(range_FROM AS VARCHAR(10)) FROM #ids_range where id = @z),
				@range_to		VARCHAR(10)		= (SELECT CAST(range_TO AS VARCHAR(10)) FROM #ids_range where id = @z),
				@job_name		VARCHAR (256)	= 'Temp_'+UPPER(LEFT(@Target_DB_Object_Name,1))+LOWER(SUBSTRING(@Target_DB_Object_Name,2,LEN(@Target_DB_Object_Name)))+'_TableSync_'+'AsyncJob'+'_'+CAST(@z AS VARCHAR (20)),
				@job_owner		VARCHAR (256)	= 'sa'
				DECLARE
				@sql_job_delete VARCHAR (400)	= 'EXEC msdb..sp_delete_job @job_name='''''+@job_name+''''''
				DECLARE
				@sql_job		NVARCHAR(MAX)	=
				'INSERT INTO '+@Target_DB_Name+'.'+@Target_DB_Schema_Name+'.'+@Target_DB_Object_Name+'
				(Value1, Value2, Value3, Value4, Value5, Value6, Value7, Value8)
				SELECT Value1, Value2, Value3, Value4, Value5, Value6, Value7, Value8
				FROM '+@Source_DB_Name+'.'+@Source_DB_Schema_Name+'.'+@Source_DB_Object_Name+'
				WHERE ID >= '+@range_from+' AND ID <= '+@range_to+''

				SET @SQL =			'IF EXISTS'
				SET @SQL = @SQL +	'(SELECT TOP 1 1 FROM msdb..sysjobs_view job JOIN msdb.dbo.sysjobactivity activity'					+CHAR(13)
				SET @SQL = @SQL +	'ON job.job_id = activity.job_id WHERE job.name = N'''+@job_name+''''								+CHAR(13)
				SET @SQL = @SQL +	'AND activity.start_execution_date IS NOT NULL AND activity.stop_execution_date IS NULL)'			+CHAR(13)
				SET @SQL = @SQL +	'BEGIN'																								+CHAR(13)
				SET @SQL = @SQL +	'EXEC msdb..sp_stop_job @job_name=N'''+@job_name+''';'												+CHAR(13)
				SET @SQL = @SQL +	'EXEC msdb..sp_delete_job @job_name=N'''+@job_name+''', @delete_unused_schedule=1'					+CHAR(13)
				SET @SQL = @SQL +	'END'																								+CHAR(13)
				SET @SQL = @SQL +	'IF EXISTS'																							+CHAR(13)
				SET @SQL = @SQL +	'(SELECT TOP 1 1 FROM msdb..sysjobs_view job JOIN msdb.dbo.sysjobactivity activity'					+CHAR(13)
				SET @SQL = @SQL +	'ON job.job_id = activity.job_id WHERE job.name = N'''+@job_name+''''								+CHAR(13)
				SET @SQL = @SQL +	'AND activity.start_execution_date IS NULL AND activity.stop_execution_date IS NOT NULL)'			+CHAR(13)
				SET @SQL = @SQL +	'BEGIN'																								+CHAR(13)
				SET @SQL = @SQL +	'EXEC msdb..sp_delete_job @job_name=N'''+@job_name+''', @delete_unused_schedule=1'					+CHAR(13)
				SET @SQL = @SQL +	'END'																								+CHAR(13)
				SET @SQL = @SQL +	'EXEC msdb..sp_add_job '''+@job_name+''', @owner_login_name= '''+@job_owner+''';'					+CHAR(13)
				SET @SQL = @SQL +	'EXEC msdb..sp_add_jobserver @job_name= '''+@job_name+''';'											+CHAR(13)
				SET @SQL = @SQL +	'EXEC msdb..sp_add_jobstep @job_name='''+@job_name+''', @step_name= ''Step1'', '					+CHAR(13)
				SET @SQL = @SQL +	'@command = '''+@sql_job+''', @database_name = '''+@Target_DB_Name+''', @on_success_action = 3;'	+CHAR(13)
				SET @SQL = @SQL +	'EXEC msdb..sp_add_jobstep @job_name = '''+@job_name+''', @step_name= ''Step2'','					+CHAR(13)
				SET @SQL = @SQL +   '@command = '''+@sql_job_delete+''''																+CHAR(13)
				SET @SQL = @SQL +	'EXEC msdb..sp_start_job @job_name= '''+@job_name+''''												+CHAR(13)
																																		+REPLICATE(CHAR(13),4)
				EXEC (@SQL)

				WAITFOR DELAY '00:00:01'					

				INSERT INTO #Temp_Tbl_AgentJob_Stats
				(Job_Name, Job_Exec_Start_Date)
				SELECT job.Name, activity.start_execution_date
				FROM msdb.dbo.sysjobs_view job
				INNER JOIN msdb.dbo.sysjobactivity activity
				ON job.job_id = activity.job_id
				WHERE job.name = @job_name    

                FETCH NEXT
				FROM sp_cursor INTO @z
			END
			CLOSE sp_cursor
			DEALLOCATE sp_cursor
END
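For reference, a call matching the SourceTable/TargetTable scenario above, splitting the load into five concurrent Agent jobs, would look like this:

```sql
-- Kick off the parallel copy: five Agent jobs, each inserting one ID range
EXEC SourceDB.dbo.usp_asyncTableInsert
    @Source_DB_Name        = 'SourceDB',
    @Source_DB_Schema_Name = 'dbo',
    @Source_DB_Object_Name = 'SourceTable',
    @Target_DB_Name        = 'TargetDB',
    @Target_DB_Schema_Name = 'dbo',
    @Target_DB_Object_Name = 'TargetTable',
    @SQL_Exec_No           = '5';   -- number of concurrent SQL Server Agent jobs
```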

Running the stored procedure (preceded by truncation of the target table) will generate as many INSERT statements as specified by the @SQL_Exec_No parameter. When executed, we can observe multiple instances of the SQL statement inserting records falling between certain ID ranges, shown here as output from Adam Machanic’s sp_whoisactive stored procedure as well as the SQL Server Agent jobs view. Given that @SQL_Exec_No was set to 5, five instances of this statement ran in parallel, as per the image below.

[Image: Async_SQL_Exec_StoredProc_Agent_Parallel_View]
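One operational caveat worth noting: sp_start_job returns as soon as a job is queued, so the stored procedure completes while the INSERT batches are still running. A caller (e.g. a parent SSIS package) that needs to know when the copy has finished can poll msdb, along these lines:

```sql
-- Sketch: block until every running job matching the generated name pattern
-- ('Temp_<Table>_TableSync_AsyncJob_<n>') has finished.
WHILE EXISTS (
    SELECT 1
    FROM msdb.dbo.sysjobs_view job
    JOIN msdb.dbo.sysjobactivity activity
      ON job.job_id = activity.job_id
    WHERE job.name LIKE N'Temp[_]%TableSync[_]AsyncJob[_]%'
      AND activity.start_execution_date IS NOT NULL
      AND activity.stop_execution_date  IS NULL )   -- started but not yet stopped
BEGIN
    WAITFOR DELAY '00:00:05';   -- re-check every five seconds
END;
```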

Looking at the execution times and the differences between the individual approaches, it is evident that the SQL Server 2014 enhancements around parallelising the SELECT…INTO statement are profound; however, with the optimizer taking advantage of multiple cores, I also noticed a large spike in CPU utilisation, i.e. around 20% for INSERT versus 90% for SELECT…INTO. If, on the other hand, SELECT…INTO is not an option and a substantial amount of data is to be copied as part of a long-running transaction, asynchronous execution via SQL Server Agent jobs is a good solution.

[Image: Async_SQL_Exec_Speed_Comparison]

I have only scratched the surface of how SQL Server Agent jobs can facilitate workload distribution across the server to speed up data processing and take advantage of SQL Server’s available resource pool. There are other ways to achieve the same result, but given the simplicity and ease of development and provisioning, this technique can provide a quick and robust workaround for parallel SQL statement execution within SQL Server.


In-Memory Natively Compiled vs Interpreted Stored Procedures Performance Comparison in SQL Server 2014

January 12th, 2015 / No Comments » / by admin

Introduction

Whilst the past couple of pre-2014 SQL Server releases had a strong focus on business intelligence (BI), the 2014 edition focuses largely on relational database features and support. Among the few compelling new additions in the SQL Server 2014 release is the In-Memory OLTP engine, which promises big performance improvements for OLTP applications.

In the majority of relational databases, query runtime cost is typically comprised of latching/locking, disk I/O, and the overhead of the engine interpreting the code, which increases the number of CPU cycles required to produce the desired output. As most of these roadblocks are mitigated by storing the data in memory and bypassing the SQL Server interpretive engine in favour of compiled code, In-Memory OLTP is claimed to improve performance by a factor of 20 or more (depending on the workload, data, transaction type and hardware used).

Performance Testing

In this short post I’d like to perform a rudimentary analysis of the performance improvements achieved through utilizing In-Memory OLTP with compiled stored procedure versus disk-based/interpreted code implementation. My test bed is my old trusty Lenovo laptop (X230 series) with quad-core Intel i5-3320M CPU (2.60GHz), 16 gigabytes of RAM and Samsung 840 PRO SSD running Windows 8.1 and SQL Server 2014 Developers Edition installed. I will be comparing execution times for INSERT and UPDATE statements in in-memory database for both: natively compiled stored procedures and standard, ‘interpreted’ T-SQL code.

Let’s start by creating a sample database, a memory-optimized table and the four stored procedures used in this demonstration. Notice that the only difference between the first CREATE PROCEDURE statement and the second is that ‘usp_insert_data_compiled’ creates a functionally equivalent construct to its non-compiled counterpart, with the distinction of compiling the code into processor instructions which can be executed directly by the CPU, without the need for further compilation or interpretation, thus invoking fewer CPU instructions than the equivalent interpreted T-SQL.

--CREATE InMemoryDemo SAMPLE DATABASE
USE master
GO
IF DB_ID('InMemoryDBSample') IS NOT NULL
    BEGIN
        DROP DATABASE InMemoryDBSample
    END
CREATE DATABASE InMemoryDBSample ON PRIMARY
(NAME = N'InMemoryDemo_Data',
FILENAME = N'c:\DBSample\InMemoryDemo.mdf'), FILEGROUP InMemoryDemo_fg
    CONTAINS MEMORY_OPTIMIZED_DATA
(NAME = N'InMemoryDemo_Dir',
FILENAME = N'c:\DBSample\InMemoryDemo_Mod') LOG ON  (NAME = 'InMemoryDemo_Log',
FILENAME = N'c:\DBSample\InMemoryDemo.ldf')
GO

--CREATE In-Memory TABLE WITH PRIMARY KEY ON IDENTITY COLUMN
USE InMemoryDBSample
CREATE TABLE SampleTable
    (
      ID INT IDENTITY(1, 1)
             NOT NULL ,
      Value1 INT NOT NULL ,
      Value2 INT NOT NULL ,
      Value3 DECIMAL(10, 2) NOT NULL ,
      Value4 DATETIME NOT NULL ,
      Value5 DATETIME NOT NULL ,
      CONSTRAINT PK_ID PRIMARY KEY NONCLUSTERED ( ID ASC )
    )
    WITH ( MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY );
GO
--CREATE NON-COMPILED 'INSERT' STORED PROCEDURE
CREATE PROCEDURE usp_insert_data_noncompiled
    (
      @records_to_insert int
    )
AS
    BEGIN
        SET NOCOUNT ON;
        DECLARE @var INT = 0
        WHILE @var < @records_to_insert
            BEGIN
                INSERT  INTO dbo.SampleTable
                        ( Value1 ,
                          Value2 ,
                          Value3 ,
                          Value4 ,
                          Value5
                        )
                        SELECT  10 * RAND() ,
                                20 * RAND() ,
                                10000 * RAND() / 100 ,
                                DATEADD(ss, @var, SYSDATETIME()) ,
                                CURRENT_TIMESTAMP;
                SET @var = @var + 1
            END
    END
GO
--CREATE NATIVELY COMPILED 'INSERT' STORED PROCEDURE
CREATE PROCEDURE usp_insert_data_compiled
    (
      @records_to_insert INT
    )
    WITH NATIVE_COMPILATION,
         SCHEMABINDING,
         EXECUTE AS OWNER
AS
    BEGIN ATOMIC WITH ( TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE =
        N'us_english' )
        DECLARE @var INT = 0
        WHILE @var < @records_to_insert
            BEGIN
                INSERT  INTO dbo.SampleTable
                        ( Value1 ,
                          Value2 ,
                          Value3 ,
                          Value4 ,
                          Value5
                        )
                        SELECT  10 * RAND() ,
                                20 * RAND() ,
                                10000 * RAND() / 100 ,
                                DATEADD(ss, @var, SYSDATETIME()) ,
                                CURRENT_TIMESTAMP;
                SET @var = @var + 1
            END
    END
GO
--CREATE NON-COMPILED 'UPDATE' STORED PROCEDURE
CREATE PROCEDURE usp_update_data_noncompiled
AS
    BEGIN
        SET NOCOUNT ON;
        DECLARE @var INT = 0
        DECLARE @record_ct INT
        SELECT  @record_ct = MAX(ID)
        FROM    dbo.SampleTable
        WHILE @var < @record_ct
            BEGIN
                UPDATE  dbo.SampleTable
                SET     Value1 = 20 * RAND() ,
                        Value2 = 10 * RAND() ,
                        Value3 = 10000 * RAND() / 1000 ,
                        Value4 = DATEADD(ss, -@var, Value4) ,
                        Value5 = DATEADD(ss, -@var, Value4)
                WHERE   ID = @var;
                SET @var = @var + 1
            END
    END
GO
--CREATE NATIVELY COMPILED 'UPDATE' STORED PROCEDURE
CREATE PROCEDURE usp_update_data_compiled
    WITH NATIVE_COMPILATION,
         SCHEMABINDING,
         EXECUTE AS OWNER
AS
    BEGIN ATOMIC WITH ( TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE =
        N'us_english' )
        DECLARE @var INT = 0
        DECLARE @record_ct INT
        SELECT  @record_ct = MAX(ID)
        FROM    dbo.SampleTable

        WHILE @var < @record_ct
            BEGIN
                UPDATE  dbo.SampleTable
                SET     Value1 = 20 * RAND() ,
                        Value2 = 10 * RAND() ,
                        Value3 = 10000 * RAND() / 1000 ,
                        Value4 = DATEADD(ss, -@var, Value4) ,
                        Value5 = DATEADD(ss, -@var, Value4)
                WHERE   ID = @var;
                SET @var = @var + 1
            END
    END

Before I run the code to put In-Memory OLTP performance to the test, there are a few other things in the above code worth noting. The SCHEMA_ONLY clause indicates that SQL Server will log table creation, so the table schema is durable, but will not log any data manipulation on the table, so the data is not. These types of tables require no I/O operations during transaction processing and the data is only available in memory while SQL Server is running. In the event of a SQL Server shutdown the data stored in these tables is purged and will not be persisted on restart; operations on this data are still atomic, consistent and isolated, it is only durability of the data that is traded away. Another important functional aspect to highlight is that apart from the NATIVE_COMPILATION clause (which, as the name implies, compiles the SQL code for faster execution), the WITH SCHEMABINDING and BEGIN ATOMIC clauses are a requirement.
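For completeness, if the data itself needs to survive a restart, the table can be declared with SCHEMA_AND_DATA durability instead; a minimal sketch (the table name is illustrative):

```sql
-- Durable variant: SCHEMA_AND_DATA logs data changes, so rows survive a
-- restart, at the cost of transaction log I/O during processing.
CREATE TABLE dbo.SampleTableDurable
    (
      ID INT IDENTITY(1, 1) NOT NULL ,
      Value1 INT NOT NULL ,
      CONSTRAINT PK_ID_Durable PRIMARY KEY NONCLUSTERED ( ID ASC )
    )
    WITH ( MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA );
```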

Let’s look at the execution results for both, non-compiled and compiled ‘Insert’ stored procedures.

--EXECUTE COMPILED 'Insert' STORED PROCEDURE
EXEC dbo.usp_insert_data_compiled @records_to_insert = 1000000
GO
--EXECUTE NON-COMPILED 'Insert' STORED PROCEDURE
EXEC dbo.usp_insert_data_noncompiled @records_to_insert = 1000000
GO
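The timings below were read off the Messages tab; if you prefer to capture them programmatically for a side-by-side comparison, a simple DATEDIFF harness around each call works as well, along these lines:

```sql
-- Elapsed-time harness: wrap each call in SYSDATETIME() stamps and report
-- the difference in milliseconds.
DECLARE @start DATETIME2 = SYSDATETIME();
EXEC dbo.usp_insert_data_compiled @records_to_insert = 1000000;
SELECT DATEDIFF(MILLISECOND, @start, SYSDATETIME()) AS compiled_ms;

SET @start = SYSDATETIME();
EXEC dbo.usp_insert_data_noncompiled @records_to_insert = 1000000;
SELECT DATEDIFF(MILLISECOND, @start, SYSDATETIME()) AS interpreted_ms;
```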

[Image: In-Memory_Comparison_Insert_Stored_Proc_Exec_Times]

As we can see, the ‘Insert’ operation in the natively compiled stored procedure yielded a performance increase of over 800 percent, which is a substantial improvement. Likewise, the results from the ‘Update’ statement show similar gains, as per the image below.

[Image: In-Memory_Comparison_Update_Stored_Proc_Exec_Times]

Conclusion

As we can see from this rudimentary exercise, natively compiled stored procedure execution is a lot more efficient, generating far fewer CPU instructions for the SQL Server engine and thus improving performance considerably. Compiling the stored procedure’s code creates a highly efficient data access path, allowing much faster data operations which, in most cases, outperform a non-compiled stored procedure even when both use an In-Memory table.


How to create a simple ETL/stored procedure error capturing database (schema model, SQL code and implementation) – Part 2

November 7th, 2014 / No Comments » / by admin

In PART 1 of this series I analysed the execution error tracking database schema as well as the code for ‘scraping’ SQL Server metadata in order to populate the database tables with instance, database, schema and object-specific information. In this part I will briefly describe the SQL code modifications applied to the original stored procedure (see PART 1) to customise it to handle metadata changes, as well as the actual implementation and error-logging mechanism during package/stored procedure execution. All the code and any additional files presented in this and the previous part can be downloaded from my OneDrive folder HERE.

Before I get into the nuts and bolts of how to log into the LogSSISErrors_Error table, which in this case will form the base for all error entries, let’s create a sample database, schema, table and a stored procedure containing error-invoking code, i.e. a division by zero. This stored procedure, which will later be appended with additional error-handling code in order to capture and log execution error(s) through T-SQL TRY/CATCH, will be used as the basis for demonstrating the error capturing mechanism.

USE master
IF EXISTS ( SELECT TOP 1
                    *
            FROM    sys.databases
            WHERE   name = 'TestDB' )
    DROP DATABASE TestDB
CREATE DATABASE TestDB
GO
USE TestDB
EXEC sp_executesql N'CREATE SCHEMA dummyschema;';
GO
CREATE TABLE dummyschema.DummyTable
    (
      ID INT IDENTITY(1, 1) ,
      Column1 VARCHAR(56) ,
      Column2 VARCHAR(56) ,
      Column3 VARCHAR(56)
    )
INSERT  INTO dummyschema.DummyTable
        ( Column1 ,
          Column2 ,
          Column3
        )
        SELECT  'dummy value1' ,
                'dummy value1' ,
                'dummy value1'
        UNION ALL
        SELECT  'dummy value1' ,
                'dummy value1' ,
                'dummy value1'
        UNION ALL
        SELECT  'dummy value1' ,
                'dummy value1' ,
                'dummy value1' 
GO
CREATE PROCEDURE dummyschema.usp_DummySP
    (
      @ExecutionInstanceGUID UNIQUEIDENTIFIER ,
      @PackageName NVARCHAR(256)
    )
AS
    BEGIN
        INSERT  INTO TestDB.dummyschema.DummyTable
                    ( Column1 )
                    SELECT  1 / 0
    END
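To preview where this is heading, the error-handling wrapper added later follows the standard TRY/CATCH shape. The sketch below is illustrative only; in particular, the LogSSISErrors_Error column names are placeholders rather than the actual schema from PART 1:

```sql
-- Sketch of the TRY/CATCH wrapper (placeholder error-table columns)
ALTER PROCEDURE dummyschema.usp_DummySP
    (
      @ExecutionInstanceGUID UNIQUEIDENTIFIER ,
      @PackageName NVARCHAR(256)
    )
AS
BEGIN
    BEGIN TRY
        INSERT INTO TestDB.dummyschema.DummyTable ( Column1 )
        SELECT 1 / 0;   -- deliberately raises a divide-by-zero error
    END TRY
    BEGIN CATCH
        -- Column list is hypothetical; map to the real LogSSISErrors_Error schema
        INSERT INTO AdminDBA.dbo.LogSSISErrors_Error
            ( ErrorNumber, ErrorMessage, ErrorLine, ErrorProcedure,
              ExecutionInstanceGUID, PackageName, ErrorDateTime )
        SELECT ERROR_NUMBER(), ERROR_MESSAGE(), ERROR_LINE(), ERROR_PROCEDURE(),
               @ExecutionInstanceGUID, @PackageName, SYSDATETIME();
        THROW;   -- re-raise so the calling SSIS package also sees the failure
    END CATCH;
END;
```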

As this database is intended to work mainly in conjunction with an SSIS package (capturing errors from standalone stored procedure execution is also fine, but in that case the SSIS component would have to be omitted and replaced with a placeholder value), I have also created a sample package used primarily to trigger the stored procedure execution, with a couple of system variables capturing the package execution instance GUID and the package name metadata. This package needs to be saved somewhere on the C:\ drive for the metadata-updating stored procedure to find the relevant info and populate the LogSSISErrors_Package table accordingly.

[Image: Error_Capture_DB_Package_SQL_Statement]

[Image: Error_Capture_DB_Package_Params_Mapping]

Most of the code from PART 1 is also applicable to maintaining the error tracking database; the INSERT statements are modified into MERGE statements, allowing changes to be applied to already existing data while also inserting new records. The core difference is that in PART 1 all AdminDBA database objects had to be (re)created before we could proceed with SQL Server instance metadata analysis and capture, making its execution only applicable to a new environment where the database needed to be rebuilt from scratch. Once the AdminDBA database has been created and initially populated with the code from PART 1, any subsequent execution should account for updates and new data inserts only, implemented through subtle changes to the stored procedure code, mainly replacing INSERTs with MERGE T-SQL. Below are a couple of examples where the code from the PART 1 stored procedure was replaced with MERGE statements; the complete code is available for download from HERE.

-------------------------------------------------------------------------------------------------------
--Update LogSSISErrors_Schema table
-------------------------------------------------------------------------------------------------------	
MERGE INTO AdminDBA.dbo.LogSSISErrors_Schema AS t
USING
    ( SELECT  DISTINCT
                db.ID AS ID ,
                meta.SchemaID AS SchemaID ,
                meta.SchemaName AS SchemaName ,
                meta.SchemaOwner AS SchemaOwner ,
                1 AS CurrentlyUsed
      FROM      @EnumDBMeta meta
                JOIN AdminDBA.dbo.LogSSISErrors_DB db ON meta.DBName = db.DBName
      WHERE     db.CurrentlyUsed = 1
    ) s
ON ( t.SchemaName = s.SchemaName
     AND s.ID = t.FKDBID
   )
WHEN MATCHED THEN
    UPDATE SET [SchemaID] = s.[SchemaID] ,
               [SchemaName] = s.[SchemaName] ,
               [SchemaOwner] = s.[SchemaOwner] ,
               [CurrentlyUsed] = s.[CurrentlyUsed]
WHEN NOT MATCHED THEN
    INSERT ( [FKDBID] ,
             [SchemaID] ,
             [SchemaName] ,
             [SchemaOwner] ,
             [CurrentlyUsed]
           )
    VALUES ( s.[ID] ,
             s.[SchemaID] ,
             s.[SchemaName] ,
             s.[SchemaOwner] ,
             s.[CurrentlyUsed]
           )
WHEN NOT MATCHED BY SOURCE THEN
    UPDATE SET [CurrentlyUsed] = 0;


-------------------------------------------------------------------------------------------------------
--Update LogSSISErrors_Process table
-------------------------------------------------------------------------------------------------------	
MERGE INTO AdminDBA.dbo.LogSSISErrors_Process AS t
USING
    ( SELECT DISTINCT
                meta.ProcessObjectID ,
                meta.ProcessObjectName ,
                sch.ID ,
                1 AS CurrentlyUsed
      FROM      @EnumDBMeta meta
                JOIN AdminDBA.dbo.LogSSISErrors_Schema AS sch ON sch.SchemaName = meta.SchemaName
                                                              AND meta.ProcessObjectSchemaID = sch.SchemaID
                JOIN AdminDBA.dbo.LogSSISErrors_DB db ON meta.DBName = db.DBName
                                                         AND sch.FKDBID = db.ID
      WHERE     sch.CurrentlyUsed = 1
                AND db.CurrentlyUsed = 1
                AND ProcessObjectType IN ( 'SQL Stored Procedure',
                                           'Aggregate Function',
                                           'SQL DML Trigger',
                                           'Assembly DML Trigger',
                                           'Extended Stored Procedure',
                                           'Assembly Stored Procedure',
                                           'Replication Filter Procedure',
                                           'Assembly Scalar Function',
                                           'SQL Scalar Function' )
    ) s
ON ( t.ProcessName = s.ProcessObjectName
     AND s.ID = t.FKSchemaID
   )
WHEN MATCHED THEN
    UPDATE SET [ProcessID] = s.ProcessObjectID ,
               [ProcessName] = s.ProcessObjectName ,
               [CurrentlyUsed] = s.[CurrentlyUsed]
WHEN NOT MATCHED THEN
    INSERT ( [ProcessID] ,
             [ProcessName] ,
             [FKSchemaID] ,
             [CurrentlyUsed]
           )
    VALUES ( s.ProcessObjectID ,
             s.ProcessObjectName ,
             s.ID ,
             s.[CurrentlyUsed]
           )
WHEN NOT MATCHED BY SOURCE THEN
    UPDATE SET [CurrentlyUsed] = 0;

Upon execution of the modified stored procedure from PART 1, the code should pick up our new database, schema, table and stored procedure, which in turn should be reflected in the AdminDBA database table entries, i.e. the LogSSISErrors_Package, LogSSISErrors_DB, LogSSISErrors_Schema, LogSSISErrors_Object and LogSSISErrors_Process tables, as per the image below.

Error_Capture_DB_AdminDBA_Refreshed_Data
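A quick way to verify the refresh picked up the dummy objects is to query the CurrentlyUsed flags directly (a sketch using the table and column names from the MERGE statements above):

```sql
-- Confirm the dummy database, schema and stored procedure were captured
SELECT  db.DBName ,
        sch.SchemaName ,
        prc.ProcessName
FROM    AdminDBA.dbo.LogSSISErrors_DB db
        JOIN AdminDBA.dbo.LogSSISErrors_Schema sch ON sch.FKDBID = db.ID
        JOIN AdminDBA.dbo.LogSSISErrors_Process prc ON prc.FKSchemaID = sch.ID
WHERE   db.CurrentlyUsed = 1
        AND sch.CurrentlyUsed = 1
        AND prc.CurrentlyUsed = 1
        AND db.DBName = 'TestDB';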

When the package is executed, an error message is thrown because the T-SQL in the usp_DummySP stored procedure attempts to divide by zero, which is in accordance with our expectations. The stored procedure does not contain any error handling code, so the division-by-zero operation halts execution and returns the error message to the Execution Results view, as per below. Note also that the package invoking the stored procedure needs to be saved (to the C:\ drive location mentioned earlier) before it is executed.

Error_Capture_DB_SSIS_Exec_Failure

Now let’s alter the dummyschema.usp_DummySP stored procedure’s code to add error capture and handling capabilities. SQL Server provides a number of built-in system functions for troubleshooting execution flow and exposing error details. Combined with the SSIS package system parameters, i.e. the execution instance GUID and the package name, we can capture and log those details in the LogSSISErrors_Error table, referencing the other tables for additional metadata and richer analysis.

The following ALTER statement implements TRY/CATCH error handling and allows execution issues, e.g. the division by zero in the SELECT statement, to be logged.

USE [TestDB]
GO
ALTER PROCEDURE [dummyschema].[usp_DummySP]
    (
      @ExecutionInstanceGUID UNIQUEIDENTIFIER ,
      @PackageName NVARCHAR(256)
    )
AS
    BEGIN 
        BEGIN TRY
            BEGIN TRANSACTION		
            INSERT  INTO TestDB.dummyschema.DummyTable
                    ( Column1 )
                    SELECT  1 / 0
            COMMIT TRANSACTION
        END TRY
        BEGIN CATCH
            --Only roll back if a transaction is actually open
            IF @@TRANCOUNT > 0
                ROLLBACK TRANSACTION;
            WITH    TempErr ( [ErrorNumber], [ErrorSeverity], [ErrorState], [ErrorLine], [ErrorMessage], [ErrorDateTime], 
							[LoginName], [UserName], PackageName, [ObjectID], [ProcessID], [ExecutionInstanceGUID], [DBName] )
                      AS ( SELECT   ERROR_NUMBER()		AS ErrorNumber ,
                                    ERROR_SEVERITY()	AS ErrorSeverity ,
                                    ERROR_STATE()		AS ErrorState ,
                                    ERROR_LINE()		AS ErrorLine ,
                                    ERROR_MESSAGE()		AS ErrorMessage ,
                                    SYSDATETIME()		AS ErrorDateTime ,
                                    SYSTEM_USER			AS LoginName ,
                                    USER_NAME()			AS UserName ,
                                    @PackageName ,
                                    OBJECT_ID('TestDB.dummyschema.DummyTable') AS ObjectID ,
                                    ( SELECT    a.objectid
                                      FROM      sys.dm_exec_requests r
                                                CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) a
                                      WHERE     r.session_id = @@SPID
                                    ) AS ProcessID ,
                                    @ExecutionInstanceGUID AS ExecutionInstanceGUID ,
									DB_NAME()			AS DatabaseName
                         )
                INSERT  INTO AdminDBA.dbo.LogSSISErrors_Error
                        ( [ErrorNumber] ,
                          [ErrorSeverity] ,
                          [ErrorState] ,
                          [ErrorLine] ,
                          [ErrorMessage] ,
                          [ErrorDateTime] ,
                          [FKLoginID] ,
                          [FKUserID] ,
                          [FKPackageID] ,
                          [FKObjectID] ,
                          [FKProcessID] ,
                          [ExecutionInstanceGUID]
                        )
                        SELECT  ErrorNumber				= COALESCE(err.ErrorNumber, -1) ,
                                ErrorSeverity			= COALESCE(err.[ErrorSeverity], -1) ,
                                ErrorState				= COALESCE(err.[ErrorState], -1) ,
                                ErrorLine				= COALESCE(err.[ErrorLine], -1) ,
                                ErrorMessage			= COALESCE(err.[ErrorMessage], 'Unknown') ,
                                ErrorDateTime			= err.[ErrorDateTime] ,
                                FKLoginID				= src_login.ID ,
                                FKUserID				= src_user.ID ,
                                [FKPackageID]			= src_package.ID ,
                                [FKObjectID]			= src_object.ID ,
                                [FKProcessID]			= src_process.ID ,
                                [ExecutionInstanceGUID] = err.ExecutionInstanceGUID		
                        FROM    TempErr err
                                LEFT JOIN AdminDBA.dbo.LogSSISErrors_Login src_login
								ON err.LoginName = src_login.LoginName
                                LEFT JOIN AdminDBA.dbo.LogSSISErrors_User src_user 
								ON err.UserName = src_user.UserName	
								AND src_user.FKDBID = 
									(SELECT ID 
									FROM AdminDBA.dbo.LogSSISErrors_DB db 
									WHERE db.DBName = err.DBName)						
                                LEFT JOIN AdminDBA.dbo.LogSSISErrors_Package src_package 
								ON err.PackageName = (LEFT(src_package.PackageName, CHARINDEX('.', src_package.PackageName) - 1)) 
                                LEFT JOIN AdminDBA.dbo.LogSSISErrors_Object src_object 
								ON err.ObjectID = src_object.ObjectID
                                LEFT JOIN AdminDBA.dbo.LogSSISErrors_Process src_process 
								ON err.ProcessID = src_process.ProcessID
								--WHERE 
								--src_login.CurrentlyUsed	= 1 AND
								--src_user.CurrentlyUsed	= 1 AND 
								--src_package.CurrentlyUsed	= 1 AND
								--src_object.CurrentlyUsed	= 1 AND
								--src_process.CurrentlyUsed	= 1															
        END CATCH
    END

Once altered, the stored procedure executed through the package will write into the LogSSISErrors_Error table whenever the BEGIN CATCH/END CATCH block is invoked, as per the image below.

Error_Capture_DB_Package_Exec_OK_With_Error_Metadata

Now that the error details have been inserted into our AdminDBA database table, we can query the data across all the related tables when a more comprehensive view is required, e.g. including schema or database details.
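Such a query might look along these lines (a sketch assuming, as the foreign key columns in the INSERT above suggest, that each lookup table uses an `ID` primary key):

```sql
-- Logged errors with their package, login and process context, most recent first
SELECT  err.ErrorDateTime ,
        err.ErrorNumber ,
        err.ErrorMessage ,
        pkg.PackageName ,
        lgn.LoginName ,
        prc.ProcessName
FROM    AdminDBA.dbo.LogSSISErrors_Error err
        LEFT JOIN AdminDBA.dbo.LogSSISErrors_Package pkg ON err.FKPackageID = pkg.ID
        LEFT JOIN AdminDBA.dbo.LogSSISErrors_Login lgn ON err.FKLoginID = lgn.ID
        LEFT JOIN AdminDBA.dbo.LogSSISErrors_Process prc ON err.FKProcessID = prc.ID
ORDER BY err.ErrorDateTime DESC;
```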

You can also download all the code samples, SSIS package solution files and database schema model (saved in a format compatible with Navicat Data Modeler database design tool) from my OneDrive folder HERE.
