How To Synchronise Data Across Two SQL Server Databases – Part 2. SQL Code, SSIS Package And Application For Multiple Objects Processing
In the FIRST POST to this series I outlined how to synchronise data across two different databases using a dynamic MERGE SQL statement. The idea was that the code built the MERGE statement on the fly based on database objects’ metadata and, as long as the table had a primary key constraint present, automatically handled INSERT and UPDATE based on its content. In this post I would like to expand on this approach and show you how to add looping functionality, by means of another stored procedure or an SSIS package, which picks up all relevant objects and executes the merge as many times as there are tables to synchronise, without listing object names individually. All the code and solution files for this series can be downloaded from HERE.
Using SQL Stored Procedure With Cursor
The simplest way to loop through a collection of tables which qualify for synchronisation is to create a simple stored procedure with a cursor. Before we get to the nuts and bolts of this solution, however, let’s first create sample databases, objects and dummy data for this demonstration. The SQL code below creates two databases, each containing three tables. Each table in the Source_DB database has 1000 records in it. The destination database has the same structure; however, from the data point of view, there are only 500 records in each of its tables. Also, the rows with IDs below 10 (that is, 0 to 9) in the source Tbl1 table hold different values from the matching rows in the target database. This creates a good foundation for inserting and updating source data based on those discrepancies using the MERGE SQL statement. Let’s go ahead and create all necessary databases, objects and dummy data.
USE [master]
GO

-- Drop Source_DB if it already exists, closing any open connections first
IF EXISTS (SELECT name FROM sys.databases WHERE name = N'Source_DB')
BEGIN
    ALTER DATABASE [Source_DB] SET SINGLE_USER WITH ROLLBACK IMMEDIATE
    DROP DATABASE [Source_DB]
END
GO
CREATE DATABASE [Source_DB]
GO

-- Drop Target_DB if it already exists, closing any open connections first
IF EXISTS (SELECT name FROM sys.databases WHERE name = N'Target_DB')
BEGIN
    ALTER DATABASE [Target_DB] SET SINGLE_USER WITH ROLLBACK IMMEDIATE
    DROP DATABASE [Target_DB]
END
GO
CREATE DATABASE [Target_DB]
GO

-- Create the base table in both databases
USE Source_DB
GO
CREATE TABLE Tbl1 (
    ID int NOT NULL,
    Sample_Data_Col1 varchar (50) NOT NULL,
    Sample_Data_Col2 varchar (50) NOT NULL,
    Sample_Data_Col3 varchar (50) NOT NULL)
GO

USE Target_DB
GO
CREATE TABLE Tbl1 (
    ID int NOT NULL,
    Sample_Data_Col1 varchar (50) NOT NULL,
    Sample_Data_Col2 varchar (50) NOT NULL,
    Sample_Data_Col3 varchar (50) NOT NULL)
GO

-- Populate the source table with 1000 rows (IDs 0 to 999) and copy it to Tbl2 and Tbl3
USE Source_DB
GO
SET NOCOUNT ON
DECLARE @rowcount int = 0
WHILE @rowcount < 1000
BEGIN
    INSERT INTO Tbl1 (ID, Sample_Data_Col1, Sample_Data_Col2, Sample_Data_Col3)
    SELECT @rowcount,
           'Sample_Data' + CAST(@rowcount as varchar(10)),
           'Sample_Data' + CAST(@rowcount as varchar(10)),
           'Sample_Data' + CAST(@rowcount as varchar(10))
    SET @rowcount = @rowcount + 1
END
GO
SELECT * INTO Tbl2 FROM Tbl1
SELECT * INTO Tbl3 FROM Tbl1
GO

-- Populate the target table with the same 1000 rows
USE Target_DB
GO
SET NOCOUNT ON
DECLARE @rowcount int = 0
WHILE @rowcount < 1000
BEGIN
    INSERT INTO Tbl1 (ID, Sample_Data_Col1, Sample_Data_Col2, Sample_Data_Col3)
    SELECT @rowcount,
           'Sample_Data' + CAST(@rowcount as varchar(10)),
           'Sample_Data' + CAST(@rowcount as varchar(10)),
           'Sample_Data' + CAST(@rowcount as varchar(10))
    SET @rowcount = @rowcount + 1
END
GO

-- Introduce the discrepancies: the target keeps only IDs 0 to 499,
-- and the source Tbl1 rows with IDs 0 to 9 get changed values
DELETE FROM Target_DB.dbo.Tbl1 WHERE ID >= 500
UPDATE Source_DB.dbo.Tbl1
SET Sample_Data_Col1 = 'Changed_Data',
    Sample_Data_Col2 = 'Changed_Data',
    Sample_Data_Col3 = 'Changed_Data'
WHERE ID < 10

-- Still in the Target_DB context: copy the trimmed target table to Tbl2 and Tbl3
SELECT * INTO Tbl2 FROM Tbl1
SELECT * INTO Tbl3 FROM Tbl1
GO

-- Unique clustered indexes on ID for every table in both databases
CREATE UNIQUE CLUSTERED INDEX [Clustered_Idx_Id] ON Source_DB.dbo.Tbl1 ([ID] ASC)
GO
CREATE UNIQUE CLUSTERED INDEX [Clustered_Idx_Id] ON Source_DB.dbo.Tbl2 ([ID] ASC)
GO
CREATE UNIQUE CLUSTERED INDEX [Clustered_Idx_Id] ON Source_DB.dbo.Tbl3 ([ID] ASC)
GO
CREATE UNIQUE CLUSTERED INDEX [Clustered_Idx_Id] ON Target_DB.dbo.Tbl1 ([ID] ASC)
GO
CREATE UNIQUE CLUSTERED INDEX [Clustered_Idx_Id] ON Target_DB.dbo.Tbl2 ([ID] ASC)
GO
CREATE UNIQUE CLUSTERED INDEX [Clustered_Idx_Id] ON Target_DB.dbo.Tbl3 ([ID] ASC)
GO
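Before moving on, it can be worth confirming that the setup script produced the starting state described above. The query below is a minimal sketch that only uses the database and table names from the setup script; the column aliases (DBName, TableName, RowCnt, ChangedRows) are made up for illustration.

```sql
-- Row counts per table: the source tables should hold 1000 rows each,
-- the target tables 500 rows each.
SELECT 'Source_DB' AS DBName, 'Tbl1' AS TableName, COUNT(*) AS RowCnt FROM Source_DB.dbo.Tbl1
UNION ALL
SELECT 'Source_DB', 'Tbl2', COUNT(*) FROM Source_DB.dbo.Tbl2
UNION ALL
SELECT 'Source_DB', 'Tbl3', COUNT(*) FROM Source_DB.dbo.Tbl3
UNION ALL
SELECT 'Target_DB', 'Tbl1', COUNT(*) FROM Target_DB.dbo.Tbl1
UNION ALL
SELECT 'Target_DB', 'Tbl2', COUNT(*) FROM Target_DB.dbo.Tbl2
UNION ALL
SELECT 'Target_DB', 'Tbl3', COUNT(*) FROM Target_DB.dbo.Tbl3;

-- Rows present on both sides of Tbl1 whose first data column differs
-- (the rows touched by the UPDATE in the setup script).
SELECT COUNT(*) AS ChangedRows
FROM Source_DB.dbo.Tbl1 AS s
JOIN Target_DB.dbo.Tbl1 AS t ON s.ID = t.ID
WHERE s.Sample_Data_Col1 <> t.Sample_Data_Col1;
```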
Next, let’s recreate the usp_DBSync stored procedure from the PREVIOUS POST. The SQL code can be found by going back to the start of this series (POST 1) or downloaded from HERE. Without the usp_DBSync stored procedure on the server the rest of the solution will not work, so make sure that you re-create it first. Now that we have all necessary objects, we are ready to create the construct which will provide our looping functionality based on metadata and allow multiple objects to be processed without the need to specify their names. In order to do this, let’s create a ‘wrapper’ stored procedure around the usp_DBSync procedure by executing the following code.
USE [Source_DB]
GO
IF EXISTS (SELECT * FROM sys.objects
           WHERE object_id = OBJECT_ID(N'[dbo].[usp_SyncMultipleTables]') AND type IN (N'P', N'PC'))
    DROP PROCEDURE [dbo].[usp_SyncMultipleTables]
GO

CREATE PROCEDURE [dbo].[usp_SyncMultipleTables]
    @SourceDBName varchar (256),
    @SourceSchemaName varchar (50),
    @TargetDBName varchar (256),
    @TargetSchemaName varchar (50)
AS
BEGIN
    SET NOCOUNT ON
    DECLARE @Err_Msg varchar (max)
    DECLARE @IsDebugMode bit = 1

    -- Dynamic SQL listing the user tables of the given schema in the source database
    DECLARE @SQLSource nvarchar (max) =
        'INSERT INTO #TempTbl (ObjectName, SchemaName, DBName, Source_vs_Target)
         SELECT DISTINCT o.name, ''' + @SourceSchemaName + ''', ''' + @SourceDBName + ''', ''S''
         FROM ' + @SourceDBName + '.sys.tables t
         JOIN ' + @SourceDBName + '.sys.schemas s ON t.schema_id = s.schema_id
         JOIN ' + @SourceDBName + '.sys.objects o ON t.object_id = o.object_id
         WHERE s.name = ''' + @SourceSchemaName + ''' AND o.type = ''U'''

    -- The same lookup against the target database
    DECLARE @SQLTarget nvarchar (max) =
        'INSERT INTO #TempTbl (ObjectName, SchemaName, DBName, Source_vs_Target)
         SELECT DISTINCT o.name, ''' + @TargetSchemaName + ''', ''' + @TargetDBName + ''', ''T''
         FROM ' + @TargetDBName + '.sys.tables t
         JOIN ' + @TargetDBName + '.sys.schemas s ON t.schema_id = s.schema_id
         JOIN ' + @TargetDBName + '.sys.objects o ON t.object_id = o.object_id
         WHERE s.name = ''' + @TargetSchemaName + ''' AND o.type = ''U'''

    CREATE TABLE #TempTbl (
        ObjectName varchar (256),
        SchemaName varchar (50),
        DBName varchar (256),
        Source_vs_Target char(1))

    EXEC sp_executesql @SQLSource
    EXEC sp_executesql @SQLTarget

    IF @IsDebugMode = 1
        SELECT * FROM #TempTbl

    -- Keep only the table names which exist on both sides
    CREATE TABLE #TempFinalTbl (ID int IDENTITY (1,1), ObjectName varchar (256))

    INSERT INTO #TempFinalTbl (ObjectName)
    SELECT ObjectName FROM #TempTbl a WHERE Source_vs_Target = 'S'
    INTERSECT
    SELECT ObjectName FROM #TempTbl a WHERE Source_vs_Target = 'T'

    IF @IsDebugMode = 1
        SELECT * FROM #TempFinalTbl

    IF @IsDebugMode = 1
        PRINT 'The following tables will be merged between the source and target databases...'

    -- First cursor: print the numbered list of tables to be merged
    DECLARE @ID int
    DECLARE @TblName varchar (256)
    DECLARE cur CURSOR FOR
        SELECT ID, ObjectName FROM #TempFinalTbl
    OPEN cur
    FETCH NEXT FROM cur INTO @ID, @TblName
    WHILE @@FETCH_STATUS = 0
    BEGIN
        PRINT CAST(@ID as varchar (20)) + '. ' + @TblName
        FETCH NEXT FROM cur INTO @ID, @TblName
    END
    CLOSE cur
    DEALLOCATE cur

    -- Second cursor: call usp_DBSync once per table
    DECLARE @ObjectName varchar (256)
    DECLARE db_cursor CURSOR FOR
        SELECT ObjectName FROM #TempFinalTbl
    OPEN db_cursor
    FETCH NEXT FROM db_cursor INTO @ObjectName
    WHILE @@FETCH_STATUS = 0
    BEGIN
        PRINT char(10)
        PRINT 'Starting merging process...'
        PRINT 'Merging ' + @SourceDBName + '.' + @SourceSchemaName + '.' + @ObjectName
            + ' with ' + @TargetDBName + '.' + @TargetSchemaName + '.' + @ObjectName
        EXEC [dbo].[usp_DBSync] @SourceDBName, @TargetDBName, @SourceSchemaName, @TargetSchemaName, @ObjectName, @ObjectName
        FETCH NEXT FROM db_cursor INTO @ObjectName
    END
    CLOSE db_cursor
    DEALLOCATE db_cursor
END
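The metadata lookup at the heart of the wrapper can also be tried on its own, without dynamic SQL, to preview which tables would be picked up. The following is a simplified sketch that hard-codes the sample database names and the dbo schema instead of taking them as parameters; it is not part of the procedure itself.

```sql
-- Names of user tables in the dbo schema that exist in BOTH sample databases.
SELECT t.name
FROM Source_DB.sys.tables AS t
JOIN Source_DB.sys.schemas AS s ON t.schema_id = s.schema_id
WHERE s.name = N'dbo'
INTERSECT
SELECT t.name
FROM Target_DB.sys.tables AS t
JOIN Target_DB.sys.schemas AS s ON t.schema_id = s.schema_id
WHERE s.name = N'dbo'
ORDER BY name;
```

With the sample environment from the setup script this should list Tbl1, Tbl2 and Tbl3. A table which exists on one side only is dropped by INTERSECT and so is never passed to usp_DBSync.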
This code simply iterates through the tables which share the same name in both databases, passing the necessary metadata to our MERGE stored procedure (usp_DBSync). Provided our environment is set up correctly, i.e. we executed the first code snippet to prep our databases and objects and the usp_DBSync stored procedure exists in the Source_DB database, we can run the usp_SyncMultipleTables procedure to see whether it correctly accounts for the database objects and the data they hold, and whether the data is synchronised successfully. Let’s execute our stored procedure and observe the output using the following SQL.
USE [Source_DB]
GO
DECLARE @return_value int
EXEC @return_value = [dbo].[usp_SyncMultipleTables]
    @SourceDBName = N'Source_DB',
    @SourceSchemaName = N'dbo',
    @TargetDBName = N'Target_DB',
    @TargetSchemaName = N'dbo'
SELECT 'Return Value' = @return_value
GO
Finally, when comparing the data between the two databases, we should see that all tables, i.e. Tbl1, Tbl2 and Tbl3, have been synchronised and contain the same data. Please note that only tables with the same name on both sides will be synchronised. If you need to merge data between tables with different names, you will have to provide additional functionality to map source objects to target objects.
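One way to run the comparison mentioned above is with EXCEPT, which returns rows present in one table but not in the other. This is a minimal sketch for Tbl1 only; the Side column and the x and y aliases are illustrative, and the same pattern would need repeating for Tbl2 and Tbl3.

```sql
-- Rows that exist in only one of the two copies of Tbl1.
-- An empty result means the two tables hold identical data.
SELECT 'Source only' AS Side, x.*
FROM (SELECT * FROM Source_DB.dbo.Tbl1
      EXCEPT
      SELECT * FROM Target_DB.dbo.Tbl1) AS x
UNION ALL
SELECT 'Target only', y.*
FROM (SELECT * FROM Target_DB.dbo.Tbl1
      EXCEPT
      SELECT * FROM Source_DB.dbo.Tbl1) AS y;
```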
Using SQL Stored Procedure and SQL Server Integration Services
If you are familiar with SQL Server Integration Services, you can achieve the same result by building a simple solution in BIDS or SQL Server Data Tools. Let’s start with a clean slate by running the first SQL code snippet again to re-create the environment. We also have to re-create the usp_DBSync stored procedure as per the PREVIOUS POST SQL code, as without it the rest of the solution will not work. Next, we will create a simple stored procedure which our SSIS package will use to retrieve the object names before the looping functionality is initiated.
USE [Source_DB]
GO
IF EXISTS (SELECT * FROM sys.objects
           WHERE object_id = OBJECT_ID(N'[dbo].[usp_ReturnObjectsMetadata]') AND type IN (N'P', N'PC'))
    DROP PROCEDURE [dbo].[usp_ReturnObjectsMetadata]
GO

CREATE PROCEDURE [dbo].[usp_ReturnObjectsMetadata]
    (@SourceSchemaName varchar (50),
     @SourceDBName varchar (256),
     @TargetSchemaName varchar (50),
     @TargetDBName varchar (256))
AS
BEGIN
    SET NOCOUNT ON

    -- Dynamic SQL listing the user tables of the given schema in the source database
    DECLARE @SQLSource nvarchar (max) =
        'INSERT INTO #TempTbl (ObjectName, SchemaName, DBName, Source_vs_Target)
         SELECT DISTINCT o.name, ''' + @SourceSchemaName + ''', ''' + @SourceDBName + ''', ''S''
         FROM ' + @SourceDBName + '.sys.tables t
         JOIN ' + @SourceDBName + '.sys.schemas s ON t.schema_id = s.schema_id
         JOIN ' + @SourceDBName + '.sys.objects o ON t.object_id = o.object_id
         WHERE s.name = ''' + @SourceSchemaName + ''' AND o.type = ''U'''

    -- The same lookup against the target database
    DECLARE @SQLTarget nvarchar (max) =
        'INSERT INTO #TempTbl (ObjectName, SchemaName, DBName, Source_vs_Target)
         SELECT DISTINCT o.name, ''' + @TargetSchemaName + ''', ''' + @TargetDBName + ''', ''T''
         FROM ' + @TargetDBName + '.sys.tables t
         JOIN ' + @TargetDBName + '.sys.schemas s ON t.schema_id = s.schema_id
         JOIN ' + @TargetDBName + '.sys.objects o ON t.object_id = o.object_id
         WHERE s.name = ''' + @TargetSchemaName + ''' AND o.type = ''U'''

    CREATE TABLE #TempTbl (
        ObjectName varchar (256),
        SchemaName varchar (50),
        DBName varchar (256),
        Source_vs_Target char(1))

    EXEC sp_executesql @SQLSource
    EXEC sp_executesql @SQLTarget

    -- Keep only the table names which exist on both sides
    CREATE TABLE #TempFinalTbl (ID int IDENTITY (1,1), ObjectName varchar (256))

    INSERT INTO #TempFinalTbl (ObjectName)
    SELECT DISTINCT ObjectName FROM #TempTbl a WHERE Source_vs_Target = 'S'
    INTERSECT
    SELECT DISTINCT ObjectName FROM #TempTbl a WHERE Source_vs_Target = 'T'

    -- Result set consumed by the SSIS package
    SELECT DISTINCT ObjectName FROM #TempFinalTbl
END
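Before wiring the procedure into a package, it can be run by hand to check what it returns. The call below simply uses the sample database and schema names from this post; against the sample environment it should return the three table names Tbl1, Tbl2 and Tbl3 in a single ObjectName column.

```sql
USE [Source_DB]
GO
-- Returns one row per table name found in both databases.
EXEC [dbo].[usp_ReturnObjectsMetadata]
    @SourceSchemaName = N'dbo',
    @SourceDBName = N'Source_DB',
    @TargetSchemaName = N'dbo',
    @TargetDBName = N'Target_DB';
```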
Finally, we are ready to build a simple SSIS package which will handle iterating through object names as merging occurs (all files for this package can be downloaded from HERE). Let’s create a simple SSIS solution, starting with setting up a database connection (the name will differ depending on the environment you’re developing on) and the following list of variables.
Continuing on, let’s place Execute SQL Task component on the Control Flow pane and adjust its properties under General settings to the following SQL statement and Result Set option.
Next, let’s map the parameter names to our variables and adjust the Result Set properties as per the images below.
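As a rough sketch of what the first Execute SQL Task could contain, assuming an OLE DB connection manager (which uses ? as its parameter marker and numbers parameters from 0), the statement might look like the one below. The variable names in the comments, other than User::TableNames, are hypothetical and should be replaced with whatever you defined in the package.

```sql
-- SQLStatement property of the first Execute SQL Task (OLE DB connection assumed).
-- Parameter Mapping, in the order the procedure declares its parameters:
--   0 -> source schema name variable (hypothetical, e.g. User::SourceSchemaName)
--   1 -> source database name variable (hypothetical, e.g. User::SourceDBName)
--   2 -> target schema name variable (hypothetical, e.g. User::TargetSchemaName)
--   3 -> target database name variable (hypothetical, e.g. User::TargetDBName)
-- ResultSet: Full result set, Result Name 0 -> User::TableNames (an Object variable)
EXEC dbo.usp_ReturnObjectsMetadata ?, ?, ?, ?
```

The Foreach ADO Enumerator used later reads its rows from an Object variable, which is why the full result set is stored in User::TableNames rather than in a string variable.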
This part of the package is responsible for populating our TableNames variable with the names of the objects we will be looping through. In order to iterate through the table names, we will place a For Each Loop container from the Toolbox on the development pane, join it to the first Execute SQL Task with the default precedence constraint and place another Execute SQL Task inside the For Each Loop container. Next, let’s adjust the second Execute SQL Task’s properties as per below.
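For orientation, the statement inside the loop presumably mirrors the call made by the cursor-based wrapper earlier in this post. The sketch below assumes an OLE DB connection and reuses the argument order from that wrapper; the variable names in the comments are hypothetical, including User::TableName, which stands for whichever variable receives the current table name on each iteration.

```sql
-- SQLStatement property of the Execute SQL Task inside the For Each Loop container.
-- Parameter Mapping (same argument order as the EXEC in usp_SyncMultipleTables):
--   0 -> source database name variable (hypothetical)
--   1 -> target database name variable (hypothetical)
--   2 -> source schema name variable (hypothetical)
--   3 -> target schema name variable (hypothetical)
--   4 -> current table name, used as the source object (hypothetical User::TableName)
--   5 -> current table name again, used as the target object
EXEC dbo.usp_DBSync ?, ?, ?, ?, ?, ?
```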
Lastly, let’s go through a similar exercise with the For Each Loop container, making sure that the Enumerator in the Collection property pane is set to Foreach ADO Enumerator, that ADO Object Source Variable is set to the User::TableNames variable and that the Variable Mappings property is adjusted to match our User::TableNames variable as per the images below.
That should be just enough development to give the package the basic, rudimentary functionality it needs to serve its intended purpose. Let’s test it out. Hopefully, when you run the package, it will synchronise all the database objects (this can be confirmed with a simple SELECT * FROM <table_name> SQL statement) and the development pane output will be as per the image below.
This concludes this mini-series. If you happen to stumble upon this blog and find it somewhat useful, please don’t hesitate to leave me a comment. Any feedback is appreciated, good or bad! Again, the first post to this series can be viewed HERE and all the SQL code as well as the solution files can be downloaded from HERE.
This entry was posted on Wednesday, September 18th, 2013 at 12:29 am and is filed under How To's, SQL.
admin November 18th, 2014 at 10:09 am
Hi, the solution would be the same with the exception of adding an additional qualifier to the code, variables etc., i.e. instead of ‘database.schema.object’ you would use ‘server.database.schema.object’, where ‘server’ would most likely be your linked server. Also, you would need to alter the stored procedure from PART 1, perhaps breaking it into two separate stored procedures or statements (one for INSERT and one for UPDATE) as, from memory, a MERGE SQL statement cannot be executed against a target table over a linked server connection. Hope that helps, Martin
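To illustrate the suggestion in the reply above, here is a minimal sketch of the two-statement approach for a single table, written against the sample Tbl1 from this post. [LinkedSrv] is a hypothetical linked server name pointing at the instance which hosts Target_DB; this is not the code from PART 1, and a dynamic version would still need to build the column lists from metadata.

```sql
-- Step 1: update target rows whose values differ from the source.
-- [LinkedSrv] is a hypothetical linked server name.
UPDATE t
SET t.Sample_Data_Col1 = s.Sample_Data_Col1,
    t.Sample_Data_Col2 = s.Sample_Data_Col2,
    t.Sample_Data_Col3 = s.Sample_Data_Col3
FROM [LinkedSrv].Target_DB.dbo.Tbl1 AS t
JOIN Source_DB.dbo.Tbl1 AS s ON s.ID = t.ID
WHERE t.Sample_Data_Col1 <> s.Sample_Data_Col1
   OR t.Sample_Data_Col2 <> s.Sample_Data_Col2
   OR t.Sample_Data_Col3 <> s.Sample_Data_Col3;

-- Step 2: insert source rows which are missing from the target.
INSERT INTO [LinkedSrv].Target_DB.dbo.Tbl1
    (ID, Sample_Data_Col1, Sample_Data_Col2, Sample_Data_Col3)
SELECT s.ID, s.Sample_Data_Col1, s.Sample_Data_Col2, s.Sample_Data_Col3
FROM Source_DB.dbo.Tbl1 AS s
WHERE NOT EXISTS (SELECT 1
                  FROM [LinkedSrv].Target_DB.dbo.Tbl1 AS t
                  WHERE t.ID = s.ID);
```

The plain inequality checks are sufficient here only because all three sample columns are declared NOT NULL; nullable columns would need explicit NULL handling.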