January 12th, 2015
Introduction
Whilst the past couple of pre-2014 SQL Server releases had a strong focus on business intelligence (BI), the 2014 edition focuses largely on relational database features and support. Among the most compelling new additions in the SQL Server 2014 release is the In-Memory OLTP engine, which promises some big performance improvements for OLTP applications.
In the majority of relational databases, query runtime cost typically comprises latching/locking, disk I/O and the overhead of the engine interpreting the code, which increases the number of CPU cycles required to produce the desired output. As most of these roadblocks are mitigated by storing the data in memory and bypassing the SQL Server interpretive engine in favor of compiled code, In-Memory OLTP is claimed to be able to improve performance by a factor of 20 or more (depending on the workload, data, transaction type and hardware used).
Performance Testing
In this short post I’d like to perform a rudimentary analysis of the performance improvements achieved by utilizing In-Memory OLTP with natively compiled stored procedures versus a disk-based/interpreted code implementation. My test bed is my old trusty Lenovo laptop (X230 series) with a dual-core Intel i5-3320M CPU (2.60GHz), 16 gigabytes of RAM and a Samsung 840 PRO SSD, running Windows 8.1 with SQL Server 2014 Developer Edition installed. I will be comparing execution times for INSERT and UPDATE statements in an in-memory database for both natively compiled stored procedures and standard, ‘interpreted’ T-SQL code.
Let’s start by creating a sample database, a memory-optimized table and the four stored procedures used for this demonstration. Notice that the only difference between the first CREATE PROCEDURE statement and the second one is that ‘usp_insert_data_compiled’ creates a functionally equivalent construct to its non-compiled counterpart, with the distinction that its code is compiled into processor instructions which can be executed directly by the CPU, without the need for further compilation or interpretation, thus invoking fewer CPU instructions than the equivalent interpreted T-SQL.
--CREATE InMemoryDBSample SAMPLE DATABASE
USE master
GO
IF DB_ID('InMemoryDBSample') IS NOT NULL
BEGIN
DROP DATABASE InMemoryDBSample
END
CREATE DATABASE InMemoryDBSample ON PRIMARY
(NAME = N'InMemoryDemo_Data',
 FILENAME = N'c:\DBSample\InMemoryDemo.mdf'),
FILEGROUP InMemoryDemo_fg CONTAINS MEMORY_OPTIMIZED_DATA
(NAME = N'InMemoryDemo_Dir',
 FILENAME = N'c:\DBSample\InMemoryDemo_Mod')
LOG ON
(NAME = 'InMemoryDemo_Log',
 FILENAME = N'c:\DBSample\InMemoryDemo.ldf')
GO
--CREATE In-Memory TABLE WITH PRIMARY KEY ON IDENTITY COLUMN
USE InMemoryDBSample
CREATE TABLE SampleTable
(
ID INT IDENTITY(1, 1) NOT NULL ,
Value1 INT NOT NULL ,
Value2 INT NOT NULL ,
Value3 DECIMAL(10, 2) NOT NULL ,
Value4 DATETIME NOT NULL ,
Value5 DATETIME NOT NULL ,
CONSTRAINT PK_ID PRIMARY KEY NONCLUSTERED ( ID ASC )
)
WITH ( MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY )
--CREATE NON-COMPILED 'INSERT' STORED PROCEDURE
CREATE PROCEDURE usp_insert_data_noncompiled
(
@records_to_insert int
)
AS
BEGIN
SET NOCOUNT ON;
DECLARE @var INT = 0
WHILE @var < @records_to_insert
BEGIN
INSERT INTO dbo.SampleTable
( Value1 ,
Value2 ,
Value3 ,
Value4 ,
Value5
)
SELECT 10 * RAND() ,
20 * RAND() ,
10000 * RAND() / 100 ,
DATEADD(ss, @var, SYSDATETIME()) ,
CURRENT_TIMESTAMP;
SET @var = @var + 1
END
END
--CREATE NATIVELY COMPILED 'INSERT' STORED PROCEDURE
CREATE PROCEDURE usp_insert_data_compiled
(
@records_to_insert INT
)
WITH NATIVE_COMPILATION,
SCHEMABINDING,
EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH ( TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE =
N'us_english' )
DECLARE @var INT = 0
WHILE @var < @records_to_insert
BEGIN
INSERT INTO dbo.SampleTable
( Value1 ,
Value2 ,
Value3 ,
Value4 ,
Value5
)
SELECT 10 * RAND() ,
20 * RAND() ,
10000 * RAND() / 100 ,
DATEADD(ss, @var, SYSDATETIME()) ,
CURRENT_TIMESTAMP;
SET @var = @var + 1
END
END
--CREATE NON-COMPILED 'UPDATE' STORED PROCEDURE
CREATE PROCEDURE usp_update_data_noncompiled
AS
BEGIN
SET NOCOUNT ON;
DECLARE @var INT = 1 --IDs are IDENTITY(1,1), so start at 1
DECLARE @record_ct INT
SELECT @record_ct = MAX(ID)
FROM dbo.SampleTable
WHILE @var <= @record_ct --include the last ID
BEGIN
UPDATE dbo.SampleTable
SET Value1 = 20 * RAND() ,
Value2 = 10 * RAND() ,
Value3 = 10000 * RAND() / 1000 ,
Value4 = DATEADD(ss, -@var, Value4) ,
Value5 = DATEADD(ss, -@var, Value4)
WHERE ID = @var;
SET @var = @var + 1
END
END
--CREATE NATIVELY COMPILED 'UPDATE' STORED PROCEDURE
CREATE PROCEDURE usp_update_data_compiled
WITH NATIVE_COMPILATION,
SCHEMABINDING,
EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH ( TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE =
N'us_english' )
DECLARE @var INT = 1 --IDs are IDENTITY(1,1), so start at 1
DECLARE @record_ct INT
SELECT @record_ct = MAX(ID)
FROM dbo.SampleTable
WHILE @var <= @record_ct --include the last ID
BEGIN
UPDATE dbo.SampleTable
SET Value1 = 20 * RAND() ,
Value2 = 10 * RAND() ,
Value3 = 10000 * RAND() / 1000 ,
Value4 = DATEADD(ss, -@var, Value4) ,
Value5 = DATEADD(ss, -@var, Value4)
WHERE ID = @var;
SET @var = @var + 1
END
END
Before I run the code to put the performance of In-Memory OLTP to the test, there are a few other things in the above code worth noting. The SCHEMA_ONLY option (in the table definition) indicates that SQL Server will log table creation, so the table schema will be durable, but it will not log any data manipulation on the table, so the data itself will not be durable. These types of tables do not require any I/O operations during transaction processing and the data is only available in memory while SQL Server is running. In the event of a SQL Server shutdown the data stored in these tables is purged and will not be persisted on restart; operations on this data are still atomic, consistent and isolated, but not durable. Another important functional aspect to highlight is that, apart from the NATIVE_COMPILATION option (which, as the name implies, compiles the SQL code for faster execution), the use of the SCHEMABINDING and BEGIN ATOMIC clauses is a requirement for natively compiled stored procedures.
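As a side note, a durable variant of the same table would simply swap the durability option; the minimal sketch below (table and constraint names are illustrative and were not used in the tests) shows the SCHEMA_AND_DATA alternative, where both the schema and the data survive a restart at the cost of extra log I/O.
--ILLUSTRATIVE ONLY: durable memory-optimized variant of the sample table
CREATE TABLE SampleTableDurable
(
ID INT IDENTITY(1, 1) NOT NULL ,
Value1 INT NOT NULL ,
Value2 INT NOT NULL ,
Value3 DECIMAL(10, 2) NOT NULL ,
Value4 DATETIME NOT NULL ,
Value5 DATETIME NOT NULL ,
CONSTRAINT PK_ID_Durable PRIMARY KEY NONCLUSTERED ( ID ASC )
)
WITH ( MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA )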
Let’s look at the execution results for both the non-compiled and the natively compiled ‘Insert’ stored procedures.
--EXECUTE COMPILED 'Insert' STORED PROCEDURE
EXEC dbo.usp_insert_data_compiled @records_to_insert = 1000000
GO
--EXECUTE NON-COMPILED 'Insert' STORED PROCEDURE
EXEC dbo.usp_insert_data_noncompiled @records_to_insert = 1000000
GO
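One simple way to capture the elapsed time of each call is to bracket it with SYSDATETIME(), as in the minimal sketch below (variable and column names are illustrative); the same pattern applies to the two ‘Update’ stored procedures.
--ILLUSTRATIVE ONLY: capture elapsed time for each variant
DECLARE @start DATETIME2 = SYSDATETIME();
EXEC dbo.usp_insert_data_compiled @records_to_insert = 1000000;
SELECT DATEDIFF(ms, @start, SYSDATETIME()) AS compiled_insert_ms;
GO
DECLARE @start DATETIME2 = SYSDATETIME();
EXEC dbo.usp_insert_data_noncompiled @records_to_insert = 1000000;
SELECT DATEDIFF(ms, @start, SYSDATETIME()) AS noncompiled_insert_ms;
GO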
As we can see, the ‘Insert’ operation in the natively compiled stored procedure yielded a performance increase of over 800 percent, which is a substantial improvement. Likewise, the results for the ‘Update’ statement show similar improvements, as per the image below.
Conclusion
As we can see from this rudimentary exercise, natively compiled stored procedure execution is a lot more efficient, generating far fewer CPU instructions for the SQL Server engine and thus improving performance by a considerable margin. Compiling the stored procedure’s code creates a highly efficient data access path, allowing for much faster data operations which, in most cases, outperform a non-compiled stored procedure that uses an in-memory table.
November 7th, 2014
In PART 1 of this series I analysed the execution error tracking database schema as well as the code for ‘scraping’ SQL Server metadata in order to populate the database tables with instance, database, schema and object-specific information. In this part I will briefly describe the SQL code modifications applied to the original stored procedure (see PART 1) to customize it to handle metadata changes, as well as the actual implementation and error-logging mechanism during package/stored procedure execution. All the code and any additional files presented in this and the previous part can be downloaded from my OneDrive folder HERE.
Before I get into the nuts and bolts of how to log into the LogSSISErrors_Error table, which in this case will form the base for all error entries, let’s create a sample database, schema, table and stored procedure containing error-invoking code, i.e. executing a division by zero. This stored procedure, which will later be appended with additional error-handling code in order to capture and log execution error(s) through T-SQL TRY/CATCH, will be used as the basis for demonstrating the error-capturing mechanism.
USE master
IF EXISTS ( SELECT TOP 1
*
FROM sys.databases
WHERE name = 'TestDB' )
DROP DATABASE TestDB
CREATE DATABASE TestDB
GO
USE TestDB
EXEC sp_executesql N'CREATE SCHEMA dummyschema;';
GO
CREATE TABLE dummyschema.DummyTable
(
ID INT IDENTITY(1, 1) ,
Column1 VARCHAR(56) ,
Column2 VARCHAR(56) ,
Column3 VARCHAR(56)
)
INSERT INTO dummyschema.DummyTable
( Column1 ,
Column2 ,
Column3
)
SELECT 'dummy value1' ,
'dummy value1' ,
'dummy value1'
UNION ALL
SELECT 'dummy value1' ,
'dummy value1' ,
'dummy value1'
UNION ALL
SELECT 'dummy value1' ,
'dummy value1' ,
'dummy value1'
GO
CREATE PROCEDURE dummyschema.usp_DummySP
(
@ExecutionInstanceGUID UNIQUEIDENTIFIER ,
@PackageName NVARCHAR(256)
)
AS
BEGIN
INSERT INTO TestDB.dummyschema.DummyTable
( Column1 )
SELECT 1 / 0
END
As this database is intended to work mainly in conjunction with an SSIS package (capturing errors from standalone stored procedure execution is also fine, but in that case the SSIS component would have to be omitted and replaced with a placeholder value), I have also created a sample package used primarily to trigger the stored procedure execution, with a couple of system variables capturing the package execution instance GUID and the package name. This package needs to be saved somewhere on the C:\ drive for the metadata-updating stored procedure to find the relevant info and populate the LogSSISErrors_Package table accordingly; the stored procedure call is sketched below.
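As a rough sketch (the statement below assumes an Execute SQL Task with an OLE DB connection; names are illustrative), the package calls the stored procedure with two input parameters mapped to the System::ExecutionInstanceGUID and System::PackageName variables:
--SQLStatement property of the Execute SQL Task (OLE DB connection)
EXEC dummyschema.usp_DummySP ?, ?
--Parameter Mapping tab (illustrative):
--  System::ExecutionInstanceGUID -> parameter 0 (GUID input)
--  System::PackageName           -> parameter 1 (NVARCHAR input)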
Most of the code from PART 1 also applies to maintaining the error tracking database; the INSERT statements are modified into MERGE statements, which allows changes to be applied to already existing data as well as new records to be inserted. The core difference is that in PART 1 all AdminDBA database objects had to be (re)created before we could proceed with SQL Server instance metadata analysis and capture, making its execution applicable only to a new environment where the database needed to be rebuilt from scratch. Once the AdminDBA database has been created and initially populated with the code from PART 1, any subsequent execution should only apply updates and insert new data, which is achieved through subtle changes to the stored procedure code, mainly replacing the INSERT statements with MERGE T-SQL. Below are a couple of examples where the code from the PART 1 stored procedure was replaced with MERGE SQL statements, with the complete code available for download from HERE.
-------------------------------------------------------------------------------------------------------
--Update LogSSISErrors_Schema table
-------------------------------------------------------------------------------------------------------
MERGE INTO AdminDBA.dbo.LogSSISErrors_Schema AS t
USING
( SELECT DISTINCT
db.ID AS ID ,
meta.SchemaID AS SchemaID ,
meta.SchemaName AS SchemaName ,
meta.SchemaOwner AS SchemaOwner ,
1 AS CurrentlyUsed
FROM @EnumDBMeta meta
JOIN AdminDBA.dbo.LogSSISErrors_DB db ON meta.DBName = db.DBName
WHERE db.CurrentlyUsed = 1
) s
ON ( t.SchemaName = s.SchemaName
AND s.ID = t.FKDBID
)
WHEN MATCHED THEN
UPDATE SET [SchemaID] = s.[SchemaID] ,
[SchemaName] = s.[SchemaName] ,
[SchemaOwner] = s.[SchemaOwner] ,
[CurrentlyUsed] = s.[CurrentlyUsed]
WHEN NOT MATCHED THEN
INSERT ( [FKDBID] ,
[SchemaID] ,
[SchemaName] ,
[SchemaOwner] ,
[CurrentlyUsed]
)
VALUES ( s.[ID] ,
s.[SchemaID] ,
s.[SchemaName] ,
s.[SchemaOwner] ,
s.[CurrentlyUsed]
)
WHEN NOT MATCHED BY SOURCE THEN
UPDATE SET [CurrentlyUsed] = 0;
-------------------------------------------------------------------------------------------------------
--Update LogSSISErrors_Process table
-------------------------------------------------------------------------------------------------------
MERGE INTO AdminDBA.dbo.LogSSISErrors_Process AS t
USING
( SELECT DISTINCT
meta.ProcessObjectID ,
meta.ProcessObjectName ,
sch.ID ,
1 AS CurrentlyUsed
FROM @EnumDBMeta meta
JOIN AdminDBA.dbo.LogSSISErrors_Schema AS sch ON sch.SchemaName = meta.SchemaName
AND meta.ProcessObjectSchemaID = sch.SchemaID
JOIN AdminDBA.dbo.LogSSISErrors_DB db ON meta.DBName = db.DBName
AND sch.FKDBID = db.ID
WHERE sch.CurrentlyUsed = 1
AND db.CurrentlyUsed = 1
AND ProcessObjectType IN ( 'SQL Stored Procedure',
'Aggregate Function',
'SQL DML Trigger',
'Assembly DML Trigger',
'Extended Stored Procedure',
'Assembly Stored Procedure',
'Replication Filter Procedure',
'Assembly Scalar Function',
'SQL Scalar Function' )
) s
ON ( t.ProcessName = s.ProcessObjectName
AND s.ID = t.FKSchemaID
)
WHEN MATCHED THEN
UPDATE SET [ProcessID] = s.ProcessObjectID ,
[ProcessName] = s.ProcessObjectName ,
[CurrentlyUsed] = s.[CurrentlyUsed]
WHEN NOT MATCHED THEN
INSERT ( [ProcessID] ,
[ProcessName] ,
[FKSchemaID] ,
[CurrentlyUsed]
)
VALUES ( s.ProcessObjectID ,
s.ProcessObjectName ,
s.ID ,
s.[CurrentlyUsed]
)
WHEN NOT MATCHED BY SOURCE THEN
UPDATE SET [CurrentlyUsed] = 0;
Upon execution of the modified stored procedure from PART 1, the code should pick up our new database, schema, table and stored procedure, which in turn should be reflected in the AdminDBA database tables, i.e. LogSSISErrors_Package, LogSSISErrors_DB, LogSSISErrors_Schema, LogSSISErrors_Object and LogSSISErrors_Process, as per the image below.
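A quick sanity check, assuming the PART 1 table layout, is to confirm that the new schema and stored procedure have been registered and flagged as currently used, for example:
--ILLUSTRATIVE check that the metadata refresh picked up the new objects
SELECT sch.SchemaName ,
prc.ProcessName
FROM AdminDBA.dbo.LogSSISErrors_Schema sch
JOIN AdminDBA.dbo.LogSSISErrors_Process prc ON prc.FKSchemaID = sch.ID
WHERE sch.SchemaName = 'dummyschema'
AND sch.CurrentlyUsed = 1
AND prc.CurrentlyUsed = 1;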
When trying to execute the package, an error message is thrown due to the T-SQL in the usp_DummySP stored procedure attempting to divide by zero, which is in accordance with our expectations. The stored procedure does not yet contain error handling code, therefore the division-by-zero operation is handled by halting execution and returning the error message back to the Execution Results view as per below. Also, remember that the package invoking the stored procedure needs to be saved to disk beforehand so that the metadata-updating stored procedure can register it in the LogSSISErrors_Package table.
Now let’s alter the dummyschema.usp_DummySP stored procedure’s code to add error capture and handling capabilities. SQL Server provides a number of system functions for troubleshooting execution flow and exposing error details. In combination with the SSIS package system parameters, i.e. the execution instance GUID and the package name, we can capture and log those details in the LogSSISErrors_Error table, referencing the other tables which provide further metadata for richer analysis.
The following ALTER statement implements TRY/CATCH error handling and allows execution issues, e.g. the division by zero in the SELECT statement, to be logged.
USE [TestDB]
GO
ALTER PROCEDURE [dummyschema].[usp_DummySP]
(
@ExecutionInstanceGUID UNIQUEIDENTIFIER ,
@PackageName NVARCHAR(256)
)
AS
BEGIN
BEGIN TRY
BEGIN TRANSACTION
INSERT INTO TestDB.dummyschema.DummyTable
( Column1 )
SELECT 1 / 0
COMMIT TRANSACTION
END TRY
BEGIN CATCH
IF @@TRANCOUNT > 0
ROLLBACK TRANSACTION;
WITH TempErr ( [ErrorNumber], [ErrorSeverity], [ErrorState], [ErrorLine], [ErrorMessage], [ErrorDateTime],
[LoginName], [UserName], PackageName, [ObjectID], [ProcessID], [ExecutionInstanceGUID], [DBName] )
AS ( SELECT ERROR_NUMBER() AS ErrorNumber ,
ERROR_SEVERITY() AS ErrorSeverity ,
ERROR_STATE() AS ErrorState ,
ERROR_LINE() AS ErrorLine ,
ERROR_MESSAGE() AS ErrorMessage ,
SYSDATETIME() AS ErrorDateTime ,
SYSTEM_USER AS LoginName ,
USER_NAME() AS UserName ,
@PackageName ,
OBJECT_ID('TestDB.dummyschema.DummyTable') AS ObjectID ,
( SELECT a.objectid
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) a
WHERE session_id = @@spid
) AS ProcessID ,
@ExecutionInstanceGUID AS ExecutionInstanceGUID ,
DB_NAME() AS DatabaseName
)
INSERT INTO AdminDBA.dbo.LogSSISErrors_Error
( [ErrorNumber] ,
[ErrorSeverity] ,
[ErrorState] ,
[ErrorLine] ,
[ErrorMessage] ,
[ErrorDateTime] ,
[FKLoginID] ,
[FKUserID] ,
[FKPackageID] ,
[FKObjectID] ,
[FKProcessID] ,
[ExecutionInstanceGUID]
)
SELECT ErrorNumber = COALESCE(err.ErrorNumber, -1) ,
ErrorSeverity = COALESCE(err.[ErrorSeverity], -1) ,
ErrorState = COALESCE(err.[ErrorState], -1) ,
ErrorLine = COALESCE(err.[ErrorLine], -1) ,
ErrorMessage = COALESCE(err.[ErrorMessage], 'Unknown') ,
ErrorDateTime = err.ErrorDateTime ,
FKLoginID = src_login.ID ,
FKUserID = src_user.ID ,
[FKPackageID] = src_package.ID ,
[FKObjectID] = src_object.ID ,
[FKProcessID] = src_process.ID ,
[ExecutionInstanceGUID] = err.ExecutionInstanceGUID
FROM TempErr err
LEFT JOIN AdminDBA.dbo.LogSSISErrors_Login src_login
ON err.LoginName = src_login.LoginName
LEFT JOIN AdminDBA.dbo.LogSSISErrors_User src_user
ON err.UserName = src_user.UserName
AND src_user.FKDBID =
(SELECT ID
FROM AdminDBA.dbo.LogSSISErrors_DB db
WHERE db.DBName = err.DBName)
LEFT JOIN AdminDBA.dbo.LogSSISErrors_Package src_package
ON err.PackageName = (LEFT(src_package.PackageName, CHARINDEX('.', src_package.PackageName) - 1))
LEFT JOIN AdminDBA.dbo.LogSSISErrors_Object src_object
ON err.ObjectID = src_object.ObjectID
LEFT JOIN AdminDBA.dbo.LogSSISErrors_Process src_process
ON err.ProcessID = src_process.ProcessID
--WHERE
--src_login.CurrentlyUsed = 1 AND
--src_user.CurrentlyUsed = 1 AND
--src_package.CurrentlyUsed = 1 AND
--src_object.CurrentlyUsed = 1 AND
--src_process.CurrentlyUsed = 1
END CATCH
END
Once altered, the stored procedure executed through the package will write a row into the LogSSISErrors_Error table whenever the CATCH block is invoked, as per the image below.
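For a quick standalone test outside of SSIS (using placeholder values in lieu of the package system variables, as mentioned earlier), something along the lines of the sketch below will exercise the CATCH block and the logging INSERT. Note that for the FKPackageID lookup to resolve, the placeholder package name needs to match an entry in LogSSISErrors_Package (minus the .dtsx extension).
--ILLUSTRATIVE standalone test: placeholder values stand in for the SSIS system variables
DECLARE @guid UNIQUEIDENTIFIER = NEWID();
EXEC dummyschema.usp_DummySP
@ExecutionInstanceGUID = @guid ,
@PackageName = N'DummyPackage'; --hypothetical package name

--inspect the most recent log entries
SELECT TOP 10 *
FROM AdminDBA.dbo.LogSSISErrors_Error
ORDER BY ErrorDateTime DESC;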
Now that error details have been inserted into our AdminDBA database table, we can query the data across all related tables if a more comprehensive view is required, e.g. including schema, database etc. The sample query below sets up a view which can be used through a reporting platform of choice to quickly identify transactional SSIS execution issues or to perform a quick analysis of errors and their related objects, schemas, processes, packages etc.
CREATE VIEW [dbo].[vw_LogSSISErrors_CoreData] AS
SELECT dbo.LogSSISErrors_Error.ID AS Error_ID ,
dbo.LogSSISErrors_Error.ErrorNumber ,
dbo.LogSSISErrors_Error.ErrorSeverity ,
dbo.LogSSISErrors_Error.ErrorState ,
dbo.LogSSISErrors_Error.ErrorLine ,
dbo.LogSSISErrors_Error.ErrorMessage ,
dbo.LogSSISErrors_Error.ErrorDateTime ,
dbo.LogSSISErrors_Object.ObjectName ,
dbo.LogSSISErrors_Object.ObjectType ,
dbo.LogSSISErrors_Object.ObjectDescription ,
dbo.LogSSISErrors_Schema.SchemaName AS ObjectSchemaName ,
dbo.LogSSISErrors_Process.ProcessName ,
LogSSISErrors_Schema_1.SchemaName AS ProcessSchemaName ,
dbo.LogSSISErrors_Package.PackageName ,
dbo.LogSSISErrors_User.UserName ,
dbo.LogSSISErrors_Login.LoginName ,
dbo.LogSSISErrors_DB.DBName ,
dbo.LogSSISErrors_Instance.InstanceName
FROM dbo.LogSSISErrors_Error
INNER JOIN dbo.LogSSISErrors_Object ON dbo.LogSSISErrors_Error.FKObjectID = dbo.LogSSISErrors_Object.ID
INNER JOIN dbo.LogSSISErrors_Schema ON dbo.LogSSISErrors_Object.FKSchemaID = dbo.LogSSISErrors_Schema.ID
INNER JOIN dbo.LogSSISErrors_DB ON dbo.LogSSISErrors_Schema.FKDBID = dbo.LogSSISErrors_DB.ID
INNER JOIN dbo.LogSSISErrors_Instance ON dbo.LogSSISErrors_DB.FKInstanceID = dbo.LogSSISErrors_Instance.ID
INNER JOIN dbo.LogSSISErrors_Package ON dbo.LogSSISErrors_Error.FKPackageID = dbo.LogSSISErrors_Package.ID
INNER JOIN dbo.LogSSISErrors_Process ON dbo.LogSSISErrors_Error.FKProcessID = dbo.LogSSISErrors_Process.ID
INNER JOIN dbo.LogSSISErrors_User ON dbo.LogSSISErrors_Error.FKUserID = dbo.LogSSISErrors_User.ID
AND dbo.LogSSISErrors_DB.ID = dbo.LogSSISErrors_User.FKDBID
INNER JOIN dbo.LogSSISErrors_Schema AS LogSSISErrors_Schema_1 ON dbo.LogSSISErrors_DB.ID = LogSSISErrors_Schema_1.FKDBID
AND dbo.LogSSISErrors_Process.FKSchemaID = LogSSISErrors_Schema_1.ID
INNER JOIN dbo.LogSSISErrors_Login ON dbo.LogSSISErrors_Error.FKLoginID = dbo.LogSSISErrors_Login.ID;
Based on the output of this code, we can set up a simple dashboard (Tableau in this case) to provide us with some core metrics on the ETL issues that require addressing at a glance.
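Any aggregate query over the view can feed such a dashboard; as a simple illustrative example, daily error counts per package could be pulled with:
--ILLUSTRATIVE metric query: daily error counts per package
SELECT PackageName ,
CAST(ErrorDateTime AS DATE) AS ErrorDate ,
COUNT(*) AS ErrorCount
FROM dbo.vw_LogSSISErrors_CoreData
GROUP BY PackageName ,
CAST(ErrorDateTime AS DATE)
ORDER BY ErrorDate DESC ,
ErrorCount DESC;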
You can also download all the code samples, SSIS package solution files and database schema model (saved in a format compatible with Navicat Data Modeler database design tool) from my OneDrive folder HERE.