How To Synchronise Data Across Two SQL Server Databases - Part 2. SQL Code, SSIS Package And Application For Multiple Objects Processing

In the FIRST POST of this series I outlined how to synchronise data across two different databases using a dynamic MERGE SQL statement. The idea was that the code built the MERGE statement on the fly from the database objects’ metadata and, as long as the table had a primary key constraint, automatically handled INSERTs and UPDATEs based on its content. In this post I would like to expand on this approach and show you how to add looping functionality, using either another stored procedure or an SSIS package, so that all relevant objects are picked up and the synchronisation runs once for every table to be merged, without having to list the object names individually. All the code and solution files for this series can be downloaded from HERE.
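For readers who have not seen Part 1, the statement below is a hand-written illustration of the kind of MERGE that a metadata-driven procedure such as usp_DBSync would generate for Tbl1, once the sample tables from the next section exist. It is a sketch, not the Part 1 code: it assumes ID is the key column and compares all three data columns to decide whether a matched row needs updating. Rows missing from the target are inserted and changed rows are updated; nothing is deleted.

```sql
-- Illustration only: roughly the kind of MERGE that usp_DBSync builds for Tbl1
MERGE INTO Target_DB.dbo.Tbl1 AS t
USING Source_DB.dbo.Tbl1 AS s
	ON t.ID = s.ID                        -- join on the key column
WHEN MATCHED AND (t.Sample_Data_Col1 <> s.Sample_Data_Col1
	OR t.Sample_Data_Col2 <> s.Sample_Data_Col2
	OR t.Sample_Data_Col3 <> s.Sample_Data_Col3) THEN
	-- the row exists on both sides but its data differs
	UPDATE SET t.Sample_Data_Col1 = s.Sample_Data_Col1,
		t.Sample_Data_Col2 = s.Sample_Data_Col2,
		t.Sample_Data_Col3 = s.Sample_Data_Col3
WHEN NOT MATCHED BY TARGET THEN
	-- the row exists only in the source
	INSERT (ID, Sample_Data_Col1, Sample_Data_Col2, Sample_Data_Col3)
	VALUES (s.ID, s.Sample_Data_Col1, s.Sample_Data_Col2, s.Sample_Data_Col3);
```

Listing the columns explicitly is exactly what the dynamic approach avoids: usp_DBSync reads them from the metadata, so the same procedure works for any table with a key.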

Using SQL Stored Procedure With Cursor

The simplest way to loop through a collection of tables that qualify for synchronisation is to create a stored procedure with a cursor. Before we get to the nuts and bolts of this solution, however, let’s first create the sample databases, objects and dummy data for this demonstration. The SQL code below creates two databases, each containing three tables. Each table in the Source_DB database holds 1000 records. The destination database has the same structure, but each of its tables holds only 500 records (IDs 0 to 499). In addition, the three data columns of the rows with IDs 0 to 9 in Source_DB.dbo.Tbl1 are changed, so those rows differ from their counterparts in Target_DB. This gives the MERGE statement both missing rows to insert and changed rows to update. Let’s go ahead and create all the necessary databases, objects and dummy data.

USE [master]
GO
IF EXISTS (SELECT name FROM sys.databases WHERE name = N'Source_DB')
BEGIN
-- Close connections to the Source_DB database
ALTER DATABASE [Source_DB] SET SINGLE_USER WITH ROLLBACK IMMEDIATE
DROP DATABASE [Source_DB]
END
GO
CREATE DATABASE [Source_DB]
GO

IF EXISTS (SELECT name FROM sys.databases WHERE name = N'Target_DB')
BEGIN
-- Close connections to the Target_DB database
ALTER DATABASE [Target_DB] SET SINGLE_USER WITH ROLLBACK IMMEDIATE
DROP DATABASE [Target_DB]
END
GO
CREATE DATABASE [Target_DB]
GO

USE Source_DB
CREATE TABLE Tbl1 (
ID int NOT NULL,
Sample_Data_Col1 varchar (50) NOT NULL,
Sample_Data_Col2 varchar (50) NOT NULL,
Sample_Data_Col3 varchar (50) NOT NULL)
GO

USE Target_DB
CREATE TABLE Tbl1 (
ID int NOT NULL,
Sample_Data_Col1 varchar (50) NOT NULL,
Sample_Data_Col2 varchar (50) NOT NULL,
Sample_Data_Col3 varchar (50) NOT NULL)
GO

USE Source_DB
DECLARE @rowcount int = 0
WHILE @rowcount < 1000
	BEGIN
		SET NOCOUNT ON
		INSERT INTO Tbl1
		(ID, Sample_Data_Col1, Sample_Data_Col2, Sample_Data_Col3)
		SELECT 
		@rowcount, 
		'Sample_Data' + CAST(@rowcount as varchar(10)), 
		'Sample_Data' + CAST(@rowcount as varchar(10)), 
		'Sample_Data' + CAST(@rowcount as varchar(10))
		SET @rowcount = @rowcount + 1
	END
GO

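-- Tbl2 and Tbl3 in Source_DB are full 1000-row copies of Tbl1
SELECT * INTO Source_DB.dbo.Tbl2 FROM Source_DB.dbo.Tbl1
SELECT * INTO Source_DB.dbo.Tbl3 FROM Source_DB.dbo.Tbl1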

USE Target_DB
DECLARE @rowcount int = 0
WHILE @rowcount < 1000
	BEGIN
		SET NOCOUNT ON
		INSERT INTO Tbl1
		(ID, Sample_Data_Col1, Sample_Data_Col2, Sample_Data_Col3)
		SELECT 
		@rowcount, 
		'Sample_Data' + CAST(@rowcount as varchar(10)), 
		'Sample_Data' + CAST(@rowcount as varchar(10)), 
		'Sample_Data' + CAST(@rowcount as varchar(10))
		SET @rowcount = @rowcount + 1
	END
GO

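-- Leave only IDs 0 to 499 in the target copy of Tbl1
DELETE FROM Target_DB.dbo.Tbl1
WHERE ID >= 500

-- Change the data columns of rows 0 to 9 in the source copy of Tbl1 only
UPDATE Source_DB.dbo.Tbl1
SET Sample_Data_Col1 = 'Changed_Data',
Sample_Data_Col2 = 'Changed_Data',
Sample_Data_Col3 = 'Changed_Data'
WHERE ID < 10

-- Tbl2 and Tbl3 in Target_DB are copies of the 500-row Target_DB.dbo.Tbl1
SELECT * INTO Target_DB.dbo.Tbl2 FROM Target_DB.dbo.Tbl1
SELECT * INTO Target_DB.dbo.Tbl3 FROM Target_DB.dbo.Tbl1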

CREATE UNIQUE CLUSTERED INDEX [Clustered_Idx_Id] ON Source_DB.dbo.Tbl1
([ID] ASC)
GO
CREATE UNIQUE CLUSTERED INDEX [Clustered_Idx_Id] ON Source_DB.dbo.Tbl2
([ID] ASC)
GO
CREATE UNIQUE CLUSTERED INDEX [Clustered_Idx_Id] ON Source_DB.dbo.Tbl3
([ID] ASC)
GO
CREATE UNIQUE CLUSTERED INDEX [Clustered_Idx_Id] ON Target_DB.dbo.Tbl1
([ID] ASC)
GO
CREATE UNIQUE CLUSTERED INDEX [Clustered_Idx_Id] ON Target_DB.dbo.Tbl2
([ID] ASC)
GO
CREATE UNIQUE CLUSTERED INDEX [Clustered_Idx_Id] ON Target_DB.dbo.Tbl3
([ID] ASC)
GO
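Before running any synchronisation it can help to confirm the starting state. The optional queries below are not part of the original solution. Based on the setup script above, the first should report 1000 rows for each Source_DB table and 500 for each Target_DB table, and the second should list IDs 0 to 9, the Tbl1 rows that the MERGE will have to update.

```sql
-- Optional sanity check: row counts per table in both databases
SELECT 'Source_DB' AS DBName, 'Tbl1' AS TableName, COUNT(*) AS RowCnt FROM Source_DB.dbo.Tbl1
UNION ALL SELECT 'Source_DB', 'Tbl2', COUNT(*) FROM Source_DB.dbo.Tbl2
UNION ALL SELECT 'Source_DB', 'Tbl3', COUNT(*) FROM Source_DB.dbo.Tbl3
UNION ALL SELECT 'Target_DB', 'Tbl1', COUNT(*) FROM Target_DB.dbo.Tbl1
UNION ALL SELECT 'Target_DB', 'Tbl2', COUNT(*) FROM Target_DB.dbo.Tbl2
UNION ALL SELECT 'Target_DB', 'Tbl3', COUNT(*) FROM Target_DB.dbo.Tbl3

-- Rows present in both copies of Tbl1 whose data columns differ (the UPDATE candidates)
SELECT s.ID
FROM Source_DB.dbo.Tbl1 s
JOIN Target_DB.dbo.Tbl1 t ON t.ID = s.ID
WHERE s.Sample_Data_Col1 <> t.Sample_Data_Col1
   OR s.Sample_Data_Col2 <> t.Sample_Data_Col2
   OR s.Sample_Data_Col3 <> t.Sample_Data_Col3
ORDER BY s.ID
```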

Next, let’s recreate the usp_DBSync stored procedure from the PREVIOUS POST. Its SQL code can be found by going back to the start of this series (POST 1) or downloaded from HERE. The rest of the solution will not work without usp_DBSync, so make sure you re-create it in the Source_DB database first. With all the necessary objects in place, we are ready to create the construct that provides the looping functionality: driven by metadata, it processes multiple objects without the need to specify their names. To do this, let’s create a ‘wrapper’ stored procedure around usp_DBSync by executing the following code.

USE [Source_DB]
GO
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[usp_SyncMultipleTables]')
AND type IN (N'P',N'PC'))
DROP PROCEDURE [dbo].[usp_SyncMultipleTables]
GO

CREATE PROCEDURE [dbo].[usp_SyncMultipleTables]
@SourceDBName sysname,
@SourceSchemaName sysname,
@TargetDBName sysname,
@TargetSchemaName sysname,
@IsDebugMode bit = 1 -- 1 = return the intermediate lists and print the table list
AS
BEGIN
	SET NOCOUNT ON
	-- User tables in the given schema of each database; the database name is wrapped in
	-- QUOTENAME and the schema name is passed as a parameter instead of being concatenated
	DECLARE @SQLSource nvarchar (max) =
	N'INSERT INTO #TempTbl
	(ObjectName, SchemaName, DBName, Source_vs_Target)
	SELECT t.name, s.name, @DBName, ''S''
	FROM ' + QUOTENAME(@SourceDBName) + N'.sys.tables t
	JOIN ' + QUOTENAME(@SourceDBName) + N'.sys.schemas s ON t.schema_id = s.schema_id
	WHERE s.name = @SchemaName'

	DECLARE @SQLTarget nvarchar (max) =
	N'INSERT INTO #TempTbl
	(ObjectName, SchemaName, DBName, Source_vs_Target)
	SELECT t.name, s.name, @DBName, ''T''
	FROM ' + QUOTENAME(@TargetDBName) + N'.sys.tables t
	JOIN ' + QUOTENAME(@TargetDBName) + N'.sys.schemas s ON t.schema_id = s.schema_id
	WHERE s.name = @SchemaName'

	CREATE TABLE #TempTbl
	(ObjectName sysname,
	SchemaName sysname,
	DBName sysname,
	Source_vs_Target char(1))

	-- #TempTbl is visible inside sp_executesql because it was created in this scope
	EXEC sp_executesql @SQLSource, N'@DBName sysname, @SchemaName sysname',
		@DBName = @SourceDBName, @SchemaName = @SourceSchemaName
	EXEC sp_executesql @SQLTarget, N'@DBName sysname, @SchemaName sysname',
		@DBName = @TargetDBName, @SchemaName = @TargetSchemaName

	IF @IsDebugMode = 1
		SELECT * FROM #TempTbl

	-- Keep only the tables that exist under the same name in both databases
	CREATE TABLE #TempFinalTbl
	(ID int IDENTITY (1,1),
	ObjectName sysname)

	INSERT INTO #TempFinalTbl
	(ObjectName)
	SELECT ObjectName FROM #TempTbl WHERE Source_vs_Target = 'S'
	INTERSECT
	SELECT ObjectName FROM #TempTbl WHERE Source_vs_Target = 'T'

	DECLARE @ID int
	DECLARE @ObjectName sysname

	-- BEGIN...END makes the whole listing conditional, not just the first PRINT
	IF @IsDebugMode = 1
	BEGIN
		SELECT * FROM #TempFinalTbl
		PRINT 'The following tables will be merged between the source and target databases...'
		DECLARE list_cursor CURSOR LOCAL FAST_FORWARD FOR
			SELECT ID, ObjectName FROM #TempFinalTbl ORDER BY ID
		OPEN list_cursor
		FETCH NEXT FROM list_cursor INTO @ID, @ObjectName
		WHILE @@FETCH_STATUS = 0
		BEGIN
			PRINT CAST(@ID AS varchar (20)) + '. ' + @ObjectName
			FETCH NEXT FROM list_cursor INTO @ID, @ObjectName
		END
		CLOSE list_cursor
		DEALLOCATE list_cursor
	END

	-- Run usp_DBSync (Part 1) once per matching table, using the same name on both sides
	DECLARE db_cursor CURSOR LOCAL FAST_FORWARD FOR
		SELECT ObjectName FROM #TempFinalTbl ORDER BY ID
	OPEN db_cursor
	FETCH NEXT FROM db_cursor INTO @ObjectName
	WHILE @@FETCH_STATUS = 0
	BEGIN
		PRINT CHAR(10) + 'Starting merging process...'
		PRINT 'Merging ' + @SourceDBName + '.' + @SourceSchemaName + '.' + @ObjectName
			+ ' with ' + @TargetDBName + '.' + @TargetSchemaName + '.' + @ObjectName
		EXEC [dbo].[usp_DBSync] @SourceDBName, @TargetDBName, @SourceSchemaName, @TargetSchemaName, @ObjectName, @ObjectName
		FETCH NEXT FROM db_cursor INTO @ObjectName
	END
	CLOSE db_cursor
	DEALLOCATE db_cursor
END
GO

This code simply iterates through the tables that share the same name in both databases and passes the necessary metadata to our dynamic MERGE stored procedure (usp_DBSync). Provided the environment is set up correctly, i.e. we have executed the first code snippet to prepare the databases and objects and usp_DBSync exists in the Source_DB database, we can run usp_SyncMultipleTables and check whether it picked up the right objects and whether the data has been synchronised successfully. Let’s execute the stored procedure and observe the output using the following SQL.

USE [Source_DB]
GO
DECLARE	@return_value int
EXEC	@return_value = [dbo].[usp_SyncMultipleTables]
		@SourceDBName = N'Source_DB',
		@SourceSchemaName = N'dbo',
		@TargetDBName = N'Target_DB',
		@TargetSchemaName = N'dbo'
SELECT	'Return Value' = @return_value
GO
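To check the result without eyeballing every table, the query below (an addition, not part of the original post) uses EXCEPT to count source rows that are missing from, or different in, the corresponding target table. After a successful run each count should be 0. Because the approach from Part 1 only inserts and updates, the check deliberately goes in one direction: rows that exist only in the target are never removed by the synchronisation and are not counted here.

```sql
-- Count source rows with no identical row in the target (0 = in sync)
SELECT 'Tbl1' AS TableName, COUNT(*) AS DiffRows
FROM (SELECT * FROM Source_DB.dbo.Tbl1 EXCEPT SELECT * FROM Target_DB.dbo.Tbl1) AS d
UNION ALL
SELECT 'Tbl2', COUNT(*)
FROM (SELECT * FROM Source_DB.dbo.Tbl2 EXCEPT SELECT * FROM Target_DB.dbo.Tbl2) AS d
UNION ALL
SELECT 'Tbl3', COUNT(*)
FROM (SELECT * FROM Source_DB.dbo.Tbl3 EXCEPT SELECT * FROM Target_DB.dbo.Tbl3) AS d
```

SELECT * is acceptable here only because both copies of each table were created with identical column lists and order; for real tables, list the columns explicitly.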

Finally, when comparing the data between the two databases, we should see that all tables, i.e. Tbl1, Tbl2 and Tbl3, have been synchronised and contain the same data. Please note that only tables with the same names will be synchronised. If you need to merge data between tables with different names, you will have to add functionality that maps source objects to target objects.
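One possible way to add that mapping is a small lookup table driving the same kind of cursor loop. The sketch below is hypothetical: the dbo.SyncTableMap table and its column names are made up for this example, and it assumes that the fifth and sixth arguments of usp_DBSync are the source and target table names (usp_SyncMultipleTables passes @ObjectName for both) and that usp_DBSync can merge two differently named tables as long as their columns and keys match.

```sql
USE [Source_DB]
GO
-- Hypothetical mapping table: one row per source/target table pair
IF OBJECT_ID(N'dbo.SyncTableMap', N'U') IS NULL
	CREATE TABLE dbo.SyncTableMap
	(SourceTableName sysname NOT NULL,
	TargetTableName sysname NOT NULL)
GO
-- Example row (hypothetical target name):
-- INSERT INTO dbo.SyncTableMap (SourceTableName, TargetTableName) VALUES (N'Tbl1', N'Tbl1_Archive')

DECLARE @SourceTbl sysname, @TargetTbl sysname
DECLARE map_cursor CURSOR LOCAL FAST_FORWARD FOR
	SELECT SourceTableName, TargetTableName FROM dbo.SyncTableMap
OPEN map_cursor
FETCH NEXT FROM map_cursor INTO @SourceTbl, @TargetTbl
WHILE @@FETCH_STATUS = 0
BEGIN
	-- Assumption: the 5th and 6th arguments of usp_DBSync are the source and target table names
	EXEC [dbo].[usp_DBSync] N'Source_DB', N'Target_DB', N'dbo', N'dbo', @SourceTbl, @TargetTbl
	FETCH NEXT FROM map_cursor INTO @SourceTbl, @TargetTbl
END
CLOSE map_cursor
DEALLOCATE map_cursor
```

Keeping the pairs in a table rather than in code means new mappings can be added without redeploying the procedure.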

Using SQL Stored Procedure and SQL Server Integration Services

If you are familiar with SQL Server Integration Services, you can achieve the same result by building a simple solution in BIDS or SQL Server Data Tools. To start with a clean slate, let’s re-create the environment by running the first SQL code snippet again. We also have to re-create the usp_DBSync stored procedure using the SQL code from the PREVIOUS POST, because the rest of the solution will not work without it. Next, we will create a simple stored procedure that our SSIS package will use to retrieve the names of the objects to process before it starts looping through them.

USE [Source_DB]
GO
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[usp_ReturnObjectsMetadata]')
AND type IN (N'P',N'PC'))
DROP PROCEDURE [dbo].[usp_ReturnObjectsMetadata]
GO

CREATE PROCEDURE [dbo].[usp_ReturnObjectsMetadata]
(@SourceSchemaName sysname,
@SourceDBName sysname,
@TargetSchemaName sysname,
@TargetDBName sysname)
AS
BEGIN
	SET NOCOUNT ON
	-- User tables in the given schema of each database; the database name is wrapped in
	-- QUOTENAME and the schema name is passed as a parameter instead of being concatenated
	DECLARE @SQLSource nvarchar (max) =
	N'INSERT INTO #TempTbl
	(ObjectName, SchemaName, DBName, Source_vs_Target)
	SELECT t.name, s.name, @DBName, ''S''
	FROM ' + QUOTENAME(@SourceDBName) + N'.sys.tables t
	JOIN ' + QUOTENAME(@SourceDBName) + N'.sys.schemas s ON t.schema_id = s.schema_id
	WHERE s.name = @SchemaName'

	DECLARE @SQLTarget nvarchar (max) =
	N'INSERT INTO #TempTbl
	(ObjectName, SchemaName, DBName, Source_vs_Target)
	SELECT t.name, s.name, @DBName, ''T''
	FROM ' + QUOTENAME(@TargetDBName) + N'.sys.tables t
	JOIN ' + QUOTENAME(@TargetDBName) + N'.sys.schemas s ON t.schema_id = s.schema_id
	WHERE s.name = @SchemaName'

	CREATE TABLE #TempTbl
	(ObjectName sysname,
	SchemaName sysname,
	DBName sysname,
	Source_vs_Target char(1))

	EXEC sp_executesql @SQLSource, N'@DBName sysname, @SchemaName sysname',
		@DBName = @SourceDBName, @SchemaName = @SourceSchemaName
	EXEC sp_executesql @SQLTarget, N'@DBName sysname, @SchemaName sysname',
		@DBName = @TargetDBName, @SchemaName = @TargetSchemaName

	-- Return the tables present in both databases (INTERSECT already removes duplicates)
	SELECT ObjectName FROM #TempTbl WHERE Source_vs_Target = 'S'
	INTERSECT
	SELECT ObjectName FROM #TempTbl WHERE Source_vs_Target = 'T'
	ORDER BY ObjectName
END
GO
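Before wiring the procedure into SSIS, it is worth running it on its own to see the list the package will loop over. With freshly created sample databases from the first snippet, it should return Tbl1, Tbl2 and Tbl3, the only tables present in the dbo schema of both databases. Note that, unlike usp_SyncMultipleTables, this procedure takes the schema name before the database name, so named arguments are the safer way to call it.

```sql
USE [Source_DB]
GO
-- Preview the object list that the SSIS package will iterate through
EXEC [dbo].[usp_ReturnObjectsMetadata]
	@SourceSchemaName = N'dbo',
	@SourceDBName = N'Source_DB',
	@TargetSchemaName = N'dbo',
	@TargetDBName = N'Target_DB'
```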

Finally, we are ready to build a simple SSIS package that iterates through the object names and merges each table in turn (all files for this package can be downloaded from HERE). Let’s create a simple SSIS solution, starting by setting up a database connection (its name will differ depending on the environment you are developing on) and the following list of variables.

Continuing on, let’s place an Execute SQL Task on the Control Flow pane and set its SQL statement and Result Set option on the General page as follows.

Next, let’s map the parameter names to our variables and adjust the Result Set properties as per the images below.
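As a rough guide, and assuming an OLE DB connection manager (which uses ? as the parameter marker and the ordinals 0, 1, 2, ... as parameter names), the first task could be configured as follows. Only User::TableNames comes from this post; the other variables are described generically because their names depend on what you created.

```sql
-- SQLStatement of the first Execute SQL Task (OLE DB connection assumed)
-- ResultSet: Full result set
EXEC [dbo].[usp_ReturnObjectsMetadata] ?, ?, ?, ?
-- Parameter Mapping, in the procedure's own parameter order:
--   0 -> source schema name variable   (e.g. holding 'dbo')
--   1 -> source database name variable (e.g. holding 'Source_DB')
--   2 -> target schema name variable   (e.g. holding 'dbo')
--   3 -> target database name variable (e.g. holding 'Target_DB')
-- Result Set: Result Name 0 -> User::TableNames (data type Object)
```

With an ADO.NET connection manager the markers would instead be named parameters such as @SourceSchemaName, so adjust the statement if you use that connection type.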

This part of the package is responsible for populating our TableNames variable with the names of the objects we will be looping through. To iterate through the table names, we will place a Foreach Loop Container from the Toolbox on the design surface, connect it to the first Execute SQL Task with the default (Success) precedence constraint, and place another Execute SQL Task inside the Foreach Loop Container. Next, let’s adjust the second Execute SQL Task’s properties as per below.
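Along the same lines, and again assuming an OLE DB connection, the task inside the loop can call usp_DBSync with six parameter markers mapped in the same order in which usp_SyncMultipleTables passes its arguments. The variable holding the current table name is mapped twice, once as the source and once as the target object name; all variable descriptions are placeholders.

```sql
-- SQLStatement of the Execute SQL Task inside the Foreach Loop (OLE DB connection assumed)
-- ResultSet: None
EXEC [dbo].[usp_DBSync] ?, ?, ?, ?, ?, ?
-- Parameter Mapping, following the argument order used in usp_SyncMultipleTables:
--   0 -> source database name variable
--   1 -> target database name variable
--   2 -> source schema name variable
--   3 -> target schema name variable
--   4 -> current table name variable (set by the Foreach Loop on each iteration)
--   5 -> current table name variable again (same name on the target side)
```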

Lastly, let’s go through a similar exercise with the Foreach Loop Container, making sure that the Enumerator on the Collection page is set to Foreach ADO Enumerator, that the ADO object source variable is set to User::TableNames, and that Variable Mappings assigns index 0 of each row (the ObjectName column) to the variable that the second Execute SQL Task passes to usp_DBSync, as per the images below.

That should be just enough development to give the package the basic functionality it needs for its intended purpose. Let’s test it out: when you run the package, it should synchronise all the database objects (which can be confirmed with a simple SELECT * FROM <table_name> SQL statement), and the development pane output should look like the image below.

This concludes this mini-series. If you happen to stumble upon this blog and find it somewhat useful, please don’t hesitate to leave me a comment; any feedback is appreciated, good or bad! Again, the first post of this series can be viewed HERE and all the SQL code as well as the solution files can be downloaded from HERE.


Posted in: How To's, SQL

Tags: SQL

This entry was posted on Wednesday, September 18th, 2013 at 12:29 am and is filed under How To's, SQL.

2 Responses to “How To Synchronise Data Across Two SQL Server Databases – Part 2. SQL Code, SSIS Package And Application For Multiple Objects Processing”

Andy November 18th, 2014 at 3:01 am

Thanks for such a great solution. Can you please show how to sync data across two servers, so that the source db is on server1 and the target db is on server2? Your help is appreciated.

admin November 18th, 2014 at 10:09 am

Hi, the solution would be the same, except that you would add an additional qualifier to the code/variables etc., i.e. instead of ‘database.schema.object’ you would use ‘server.database.schema.object’, where ‘server’ would most likely be your linked server. You would also need to alter the stored procedure from PART 1, perhaps breaking it into 2 separate stored procedures/statements (one for INSERT and one for UPDATE), because the target of a MERGE statement cannot be a table on a linked server. Hope that helps, Martin

What you are looking at...

My name is Martin and this site is a random collection of recipes and reflections about various topics covering information management, data engineering, machine learning, business intelligence and visualisation plus everything else that I fancy to categorise under the 'analytics' umbrella. I'm a native of Poland but since my university days I have lived in Melbourne, Australia and worked as a DBA, developer, data architect, technical lead and team manager. My main interests lie both in helping clients with technical aspects of information management, e.g. data modelling, systems architecture and cloud deployments, and with business-oriented strategies, e.g. enterprise data solutions project management, data governance and stewardship, data security and privacy or data monetisation. On the whole, I am very fond of anything closely or remotely related to data and as long as it can be represented as a string of ones and zeros and then analysed and visualised, you've got my attention!

Outside sporadic updates to this site I typically find myself fiddling with data, spending time with my kids or a good book, the gym or watching a good movie while eating Polish sausage with Zubrowka (best served on the rocks with apple juice and a lime twist). Please read on and if you find these posts of any interest, don't hesitate to leave me a comment!
