Incremental Table Update Using SSIS

In this post I will demonstrate how to update a table (let’s call it a destination table) with data from another table (let’s call this one source table) based on the differences between the two via a SQL Server Integration Services package. This exercise can achieve the same result through a number of different implementation methods but since SSIS is a godsend for any kind of data transfer/manipulation automation, this process can be scheduled and left to its own devises. Also, as SSIS is very transparent about its processing (we can readily see what is happening and troubleshoot accordingly) and to build (who would want to write code when you can drop a few containers, stitch them together and consider it done!), this method is brain-dead and quick to develop/deploy. Doing it via SSIS also provides an option to handle errors in a more elegant way and in a single pass-through.

So let’s start with creating two new tables which will become our data source and destination as well as populating them with some dummy data via the following queries. This will form our core data source which we will use to execute incremental update against based on the difference between them.

--CREATE SOURCE TABLE
IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[tbl_Source]') AND type in (N'U'))
DROP TABLE [dbo].[tbl_Source]

CREATE TABLE [dbo].[tbl_Source](
	[Item_SKU] [int] NULL,
	[Item_Name] [varchar](100) NULL,
	[Item_Category] [varchar](50) NULL,
	[Item_Price] [money] NULL,
	[Item_Availability_Flag] [char](1) NULL,
	[Item_Discount_Flag] [char](1) NULL
) ON [PRIMARY]

--POPULATE SOURCE TABLE WITH DUMMY DATA
INSERT INTO tbl_Source
VALUES (2299, 'Pen', 'Stationary', 2.50, 'Y', 'Y');
INSERT INTO tbl_Source
VALUES (2212, 'Binder' , 'Stationary' , 2.10, 'Y', 'Y');
INSERT INTO tbl_Source
VALUES (1217, 'Scissors' , 'Stationary' , 3.20, 'N', 'N');

--CREATE DESTINATION TABLE
IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[tbl_Destination]') AND type in (N'U'))
DROP TABLE [dbo].[tbl_Destination]

CREATE TABLE [dbo].[tbl_Destination](
	[Item_SKU] [int] NULL,
	[Item_Name] [varchar](100) NULL,
	[Item_Category] [varchar](50) NULL,
	[Item_Price] [money] NULL,
	[Item_Availability_Flag] [char] (1) NULL,
	[Item_Discount_Flag] [char](1) NULL
) ON [PRIMARY]

--POPULATE DESTINATION TABLE WITH DUMMY DATA
INSERT INTO tbl_Destination
VALUES (2299, 'Pen', 'Stationary', 2.50,'Y', 'Y');

--SELECT DATA OUT OF BOTH TABLES
SELECT * FROM tbl_Source;
SELECT * FROM tbl_Destination;

The output from the above constitutes of two tables, source table has three items with their corresponding categories, price, SKU number etc., whereas, destination table has only one item. Based on this premise we need to update destination table with the missing two items from source table via SSIS package.

The package will only require one Data Flow Task on the Control Flow pane and most of the work will involve Data Flow components adjustments.

Let’s start by placing a Data Flow component from Toolbox onto the project surface area. Next, in the Data Flow Area we place two OLE DB Source components representing the two tables we previously created i.e. tbl_Source and tbl_Destination.

After configuring connections and selecting all columns from both sources we can place two Sort Data Flow components which should link to the OLE DB Source containers. Also, to configure Sort containers, we only specify Item_SKU column, while all remaining attributes are set as ‘Pass Through’ as per below. This is because we will compare tables by looking only at their Item_SKU fields, which should be distinct (can act as primary keys). This is an important feature of this process.

Next, we place and configure the Merge Join component. Here, the join type is set to left outer join (as we want to select all data from the source table), the join is configured between Item_SKU columns only, all columns are selected from tbl_Source and finally only one column – Item_SKU – is selected from tbl_Destination table, which we also rename to give it a more distinct alias. All these steps are highlighted in the image below.

Once Merge Join has been configured, we use Conditional Split component to segregate the data. All we really need to do here is to specify the condition which will allow this transform to pull out the records which do not exist in tbl_Destination. These records have been qualified as SKU_ISNULL in the Merge Join component in the previous transform. To do this, we use ISNULL(SKU_ISNULL) statement and give the output a name, in this case it’s ‘New_Data’.

Finally, we create an OLE DB Destination container and when linking it to the Conditional Split transform, we select ‘New_Data’ (or whatever name we used to describe the data difference) from the drop-down box as per below.

As the last step we map the outputs from conditional split with the column names from tbl_Destination.

The same result can be achieved using other methods. For example, sticking to SSIS and BIDS we can use Execute SQL Task control flow component with the following query:

INSERT INTO tbl_Destination
(Item_SKU,Item_Name,Item_Category,Item_Price,Item_Availability_Flag,Item_Discount_Flag)
SELECT * FROM tbl_Source
EXCEPT 
SELECT * FROM tbl_Destination
GO

Alternatively, we can issue a simple update statement which will do the trick just as nicely. Whichever method rocks your boat, this is just one of them and given some further customizations it can be quite effective in comparison to using simple ‘UPDATE’ or ‘INSERT with EXCEPT’. For example we can isolate the new data (tables difference) and do some further processing or transformations which would be cumbersome using SQL only.

Submit Article :- BlinkList + Blogmarks + Digg + Del.icio.us + Ekstreme Socializer + Feedmarker + Furl + Google Bookmarks + ma.gnolia + Netvouz + RawSugar + Reddit + Scuttle + Shadows + Simpy + Spurl + Technorati + Unalog + Wink

http://scuttle.org/bookmarks.php/pass?action=add

Posted in: SSIS

Tags: Code, SQL, SSIS

This entry was posted on Thursday, February 9th, 2012 at 4:22 am and is filed under SSIS. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

What you are looking at...

My name is Martin and this site is a random collection of recipes and reflections about various topics covering information management, data engineering, machine learning, business intelligence and visualisation plus everything else that I fancy to categorise under the 'analytics' umbrella. I'm a native of Poland but since my university days I have lived in Melbourne, Australia and worked as a DBA, developer, data architect, technical lead and team manager. My main interests lie in both, helping clients in technical aspects of information management e.g. data modelling, systems architecture, cloud deployments as well as business-oriented strategies e.g. enterprise data solutions project management, data governance and stewardship, data security and privacy or data monetisation. On the whole, I am very fond of anything closely or remotely related to data and as long as it can be represented as a string of ones and zeros and then analysed and visualised, you've got my attention!

Outside sporadic updates to this site I typically find myself fiddling with data, spending time with my kids or a good book, the gym or watching a good movie while eating Polish sausage with Zubrowka (best served on rocks with apple juice and a lime twist). Please read on and if you find these posts of any interests, don't hesitate to leave me a comment!

Subscribe via RSS | Comments (RSS)