{"id":1295,"date":"2012-08-21T11:45:46","date_gmt":"2012-08-21T11:45:46","guid":{"rendered":"http:\/\/bicortex.com\/?p=1295"},"modified":"2013-11-15T00:41:35","modified_gmt":"2013-11-15T00:41:35","slug":"how-to-implement-slowly-changing-dimensions-part-2-using-checksum-transformation-ssis-component-to-load-dimension-data","status":"publish","type":"post","link":"http:\/\/bicortex.com\/bicortex\/how-to-implement-slowly-changing-dimensions-part-2-using-checksum-transformation-ssis-component-to-load-dimension-data\/","title":{"rendered":"How to implement Slowly Changing Dimensions \u2013 Part 2. Using Checksum Transformation SSIS component to load dimension data"},"content":{"rendered":"<p style=\"text-align: justify;\">In the previous post I briefly outlined the methodology and steps behind updating a dimension table using a default SCD component in Microsoft&#8217;s SQL Server Data Tools environment. This post also deals with SCD updates but this time I want to introduce a different, open-sourced, third party component which anyone can use to process SCD \u2013 the checksum transformation (you can download it <a href=\"http:\/\/www.sqlis.com\/post\/Checksum-Transformation.aspx\">HERE<\/a>). SQL Server Checksum function returns the checksum value computed over a row of a table or over a list of expressions. Checksum is intended for use in building hash indexes but in this context it will become valuable to update dimension tables based on the values the \u2018Checksum Transformation&#8217; component produces.<\/p>\n<p>If you want to continue hands-on with this tutorial I suggest you create all the objects and data I will be using throughout this post by executing the following SQL code.<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\n--CREATE RELATIONAL TABLES AND CONSTRAINTS. POPULATE TABLES WITH SOME DUMMY DATA\r\nCREATE TABLE Customer(\r\nCustomerID int IDENTITY (1,1),\r\nFirst_Name varchar (50) NULL,\r\nMiddle_Name varchar (50),\r\nLast_Name varchar (50) NULL,\r\nEmail_Address varchar (100) NULL,\r\nPhone_Number varchar (50) NULL,\r\nCreated_Date datetime NULL,\r\nModified_Date datetime NULL,\r\nCONSTRAINT pk_CustID PRIMARY KEY (CustomerID))\r\n\r\nCREATE TABLE Address(\r\nAddressID int IDENTITY (1,1) NOT NULL,\r\nAddress1 varchar (50) NULL,\r\nAddress2 varchar (50) NULL,\r\nCity varchar (50)  NULL,\r\nPost_Code char (4) NULL,\r\nIs_Current_Flag char(1) NULL,\r\nCONSTRAINT pk_AddressID PRIMARY KEY (AddressID))\r\n\r\nCREATE TABLE Customer_Address_Bridge(\r\nCustomerID int NOT NULL,\r\nAddressID int NOT NULL)\r\nGO\r\n\r\nINSERT INTO Customer\r\n(First_Name, Middle_Name, Last_Name,Email_Address,Phone_Number,Created_Date,Modified_Date)\r\nSELECT 'Mary', 'Joeanne', 'Black', 'mary0120@yahoo.com.au', '03-8573-9455', '2012-01-01', NULL UNION ALL\r\nSELECT 'John', 'Lorance', 'Moore', 'johnnym@awol.com', '03-3423-1155', '2012-01-01', '2012-05-30' UNION ALL\r\nSELECT 'Martin', NULL, 'Laser', 'mlaser91@aol.com', '03-2355-1109', '2012-01-01', '2012-05-12' UNION ALL\r\nSELECT 'Spencer', 'Chris', 'McEvans', 'spencerdude@hotmail.com', '03-1122-0007', '2012-01-01', '2012-05-30' UNION ALL\r\nSELECT 'Mark', NULL, 'King', 'mk038722@gmail.com', '03-3423-1155', '2012-01-01', '2012-05-30' UNION ALL\r\nSELECT 'Mary', 'Susan', 'Grey', 'mmgrey@gmail.com', '03-1299-3859', '2012-01-01', NULL UNION ALL\r\nSELECT 'Luis', 'Blake', 'Shimaro', 'shimarolou@yahoo.com.au', '03-0385-3999', '2012-01-01', NULL UNION ALL\r\nSELECT 'Natalie', 'G', 'Chin', 'nataliechin@mediasmarts.com.au', '03-3759-1001', '2012-01-01', NULL UNION ALL\r\nSELECT 'Marian', NULL, 'McErin', 'marianmcerin@gmail.com', '03-3400-3331', '2012-01-01', '2012-05-01' UNION ALL\r\nSELECT 'Rick', 'Tony', 'Webster', 'rikky69@gmail.com', '03-9459-1112', '2012-01-01', NULL\r\n\r\nINSERT INTO Address\r\n(Address1, Address2, City, Post_Code, Is_Current_Flag)\r\nSELECT '6 Agave Street', 'Apartment 4A', 'Launceston', '7250', 'Y' UNION ALL\r\nSELECT '88 Dodge Street', NULL, 'Sunshine', '3020', 'Y' UNION ALL\r\nSELECT '3 McKenzie Court', 'Level 9', 'Perthville', '2795', 'Y' UNION ALL\r\nSELECT '5 Spencer Drive', 'Unit 9D', 'Melbourne', '3002', 'Y' UNION ALL\r\nSELECT '8 Sunny Avenue', NULL, 'Sydney', '2000', 'Y' UNION ALL\r\nSELECT '83 Mara Drive', NULL, 'Echuca', '3563', 'Y' UNION ALL\r\nSELECT '1038 Mustang Street', NULL, 'Brisbane', '4000', 'Y' UNION ALL\r\nSELECT '1 Bradman Street', NULL, 'Bendigo', '3550', 'Y' UNION ALL\r\nSELECT '12 Cruger Drive', 'Block C', 'Cairns', '4870', 'Y' UNION ALL\r\nSELECT '124 Lasting Court', NULL, 'Adelaide', '5000', 'Y'\r\n\r\nINSERT INTO Customer_Address_Bridge\r\n(CustomerID, AddressID)\r\nSELECT 1, 1 UNION ALL\r\nSELECT 2,2 UNION ALL\r\nSELECT 3,3 UNION ALL\r\nSELECT 4,4 UNION ALL\r\nSELECT 5,5 UNION ALL\r\nSELECT 6,6 UNION ALL\r\nSELECT 7,7 UNION ALL\r\nSELECT 8,8 UNION ALL\r\nSELECT 9,9 UNION ALL\r\nSELECT 10, 10\r\n\r\n--CREATE CONSTRAINTS\r\nALTER TABLE Customer_Address_Bridge\r\nADD FOREIGN KEY (AddressID) REFERENCES Address(AddressID)\r\nALTER TABLE Customer_Address_Bridge\r\nADD FOREIGN KEY (CustomerID) REFERENCES Customer(CustomerID)\r\nGO\r\n\r\n--CREATE DIMENSION 'DimCustomer' TABLE\r\nCREATE TABLE DimCustomer(\r\nCustomerSK int IDENTITY (1,1) NOT NULL,\r\nCustomerID int NULL,\r\nFirst_Name varchar (50) NULL,\r\nLast_Name varchar (50) NULL,\r\nEmail_Address varchar (100) NULL,\r\nCity varchar (50) NULL,\r\nDT_Valid_From datetime NULL,\r\nDT_Valid_To datetime NULL,\r\nIs_Current bit NULL,\r\nDimChecksum bigint NULL\r\nCONSTRAINT PK_DimCustomer PRIMARY KEY CLUSTERED (CustomerSK asc))\r\n\r\nALTER TABLE &#x5B;dbo].&#x5B;DimCustomer] ADD  CONSTRAINT &#x5B;DF_DimCustomer_RowChecksum]  DEFAULT ((0)) FOR &#x5B;DimChecksum]\r\nGO\r\n\r\n--CREATE TEMPORARY TABLE TO HOLD TYPE 1 SCD DATA\r\nCREATE TABLE Temp_Customer_Updates(\r\nChecksum bigint NULL,\r\nDimChecksum bigint NULL,\r\nCustomerID int NULL,\r\nFirst_Name varchar(50) NULL,\r\nLast_Name varchar(50) NULL,\r\nCity varchar (50) NULL,\r\nEmail_Address varchar (100) NULL,\r\nCustomerSK int NULL)\r\n<\/pre>\n<p style=\"text-align: justify;\">This should form the foundations for our SCD package. As with the previous post to this series (Part 1), attributes such as \u2018First_Name\u2019, \u2018Last_Name\u2019 and \u2018City\u2019 will be treated as Type 2 SCD whereas \u2018Email_Address\u2019 column will be marked as Type 1 SCD. Also, notice that this code is essentially the same as the code I created for the previous post in this series (Part 1) with two differences \u2013 firstly, our \u2018DimCustomer\u2019 dimension table contains an additional attribute (line such and such) called \u2018DimChecksum\u2019 (bigint datatype) to cater for our checksum transformation and we have added \u2018Temp_Customer_Updates\u2019 table as a temporary storage for Type 1 SCD (all explained later).<\/p>\n<p style=\"text-align: justify;\">Now, onto the main part &#8211; creating an SSIS package. Fire up SQL Server Data Tools development environment and create a new SSIS project. Next, on the \u2018Control Flow\u2019 pane drop three \u2018Execute SQL Task\u2019 and one \u2018Data Flow\u2019 transformations and link them together as per image below &#8211; &#8216;Truncate Temp Table&#8217; (Execute SQL Task 1) &#8211;&gt; &#8216;Process SCD&#8217; (Data Flow Task) &#8211;&gt; &#8216;Update DimCustomer from Temp Table&#8217; (Execute SQL Task 2) &#8211;&gt; &#8216;Update Is_Current Flag&#8217; (Execute SQL Task 3).<\/p>\n<p><a href=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/\/2012\/08\/SCD_Part2_SSIS_Initial_Layout.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1309\" title=\"SCD_Part2_SSIS_Initial_Layout\" src=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/\/2012\/08\/SCD_Part2_SSIS_Initial_Layout.png\" alt=\"\" width=\"580\" height=\"123\" srcset=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/2012\/08\/SCD_Part2_SSIS_Initial_Layout.png 580w, http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/2012\/08\/SCD_Part2_SSIS_Initial_Layout-300x63.png 300w\" sizes=\"auto, (max-width: 580px) 100vw, 580px\" \/><\/a><\/p>\n<p style=\"text-align: justify;\">Next, create a database connection in \u2018Truncate Temp Table\u2019 component\u00a0 and enter the following SQL in \u2018SQLStatement\u2019 property:<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\nTRUNCATE TABLE Temp_Customer_Updates\r\n<\/pre>\n<p style=\"text-align: justify;\">This will empty our temporary table at the start of package execution. Following the green precedence lines, double-click on \u2018Process SCD\u2019 transform to go to \u2018Data Flow\u2019 pane. Here, drop the \u2018OLE DB Source\u2019 transformations onto the development surface and using the previously created database connection, input the following query into its \u2018SQL command text:\u2019 textbox.<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\nSELECT cu.CustomerID,\r\ncu.First_Name,\r\ncu.Last_Name,\r\nad.City,\r\ncu.Email_Address\r\nFROM dbo.Address AS ad INNER JOIN\r\ndbo.Customer_Address_Bridge AS br ON ad.AddressID = br.AddressID INNER JOIN\r\ndbo.Customer AS cu ON br.CustomerID = cu.CustomerID\r\n<\/pre>\n<p style=\"text-align: justify;\">This query\u2019s output will form our dimension table data structure. Next, drop the \u2018Checksum Transformation\u2019, link it to our data source and within the transform, on \u2018Input Columns\u2019 tab select all attributes from \u2018Available Input Columns\u2018 except for the business key as per below.<\/p>\n<p><a href=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/\/2012\/08\/SCD_Part2_SSIS_Checksum_Properties.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1310\" title=\"SCD_Part2_SSIS_Checksum_Properties\" src=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/\/2012\/08\/SCD_Part2_SSIS_Checksum_Properties.png\" alt=\"\" width=\"580\" height=\"386\" srcset=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/2012\/08\/SCD_Part2_SSIS_Checksum_Properties.png 580w, http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/2012\/08\/SCD_Part2_SSIS_Checksum_Properties-300x199.png 300w\" sizes=\"auto, (max-width: 580px) 100vw, 580px\" \/><\/a><\/p>\n<p style=\"text-align: justify;\">This means that only the columns selected will have a checksum values created for each row for SCD comparison. Now it is time to take care of the \u2018Lookup\u2019 transform. Open it up and adjust the properties for each property group i.e. General, Connection, Columns, Advanced and Error Output as per below.<\/p>\n<ul>\n<li style=\"text-align: justify;\">In \u2018General\u2019, select Cache Mode as \u2018Full Cache, Connection Type as \u2018OLE DB connection manager\u2019 and \u2018Redirect rows to no match output\u2019 from the options tied to handling entries which do not match. This will handle our Type 2 SCD or \u2018INSERT new rows\u2019 execution.<\/li>\n<li style=\"text-align: justify;\">In &#8216;Connection\u2019, use the database connection created earlier and select \u2018DimCustomer\u2019 table as our source<\/li>\n<li style=\"text-align: justify;\">In \u2018Columns\u2019, select \u2018CustomerSK\u2019 and \u2018DimChecksum\u2019 fields as the columns we will be bringing it and link the columns selected in \u2018Checksum Transformation\u2019 (query source) to their counterparts in \u2018DimCustomer\u2019 table as per below. Notice that we will not be linking \u2018Email_Address\u2019 attribute as this will be treated as Type 1 SCD i.e. not record created if a change occurs, only update to the existing data.<\/li>\n<\/ul>\n<p><a href=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/\/2012\/08\/SCD_Part2_SSIS_OLEDB_Lookup_Columns.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1311\" title=\"SCD_Part2_SSIS_OLEDB_Lookup_Columns\" src=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/\/2012\/08\/SCD_Part2_SSIS_OLEDB_Lookup_Columns.png\" alt=\"\" width=\"580\" height=\"340\" srcset=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/2012\/08\/SCD_Part2_SSIS_OLEDB_Lookup_Columns.png 580w, http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/2012\/08\/SCD_Part2_SSIS_OLEDB_Lookup_Columns-300x175.png 300w\" sizes=\"auto, (max-width: 580px) 100vw, 580px\" \/><\/a><\/p>\n<ul>\n<li style=\"text-align: justify;\">Don\u2019t alter \u2018Advanced\u2019 section and finally, change \u2018Error Output\u2019 group properties as per below.<\/li>\n<\/ul>\n<p><a href=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/\/2012\/08\/SCD_Part2_SSIS_OLEDB_Lookup_ErrorOutput.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1312\" title=\"SCD_Part2_SSIS_OLEDB_Lookup_ErrorOutput\" src=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/\/2012\/08\/SCD_Part2_SSIS_OLEDB_Lookup_ErrorOutput.png\" alt=\"\" width=\"580\" height=\"107\" srcset=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/2012\/08\/SCD_Part2_SSIS_OLEDB_Lookup_ErrorOutput.png 580w, http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/2012\/08\/SCD_Part2_SSIS_OLEDB_Lookup_ErrorOutput-300x55.png 300w\" sizes=\"auto, (max-width: 580px) 100vw, 580px\" \/><\/a><\/p>\n<p style=\"text-align: justify;\">The function of this \u2018Lookup\u2019 transform is to differentiate between Type 1 and Type 2 or \u2018INSERT new rows\u2019 SCD. The logic is simple &#8211; if a new record is found or an update to an existing Type 2 record is detected then redirect the data to \u2018Lookup No Match Output\u2019. If there are no new records created in the source or a change is detected in Type 1 SCD record then redirect data flow to \u2018Lookup Match Output\u2019 way. I have also attached \u2018Trash Destination\u2019 component to the \u2018Lookup\u2019 transform (which is just a neat way of terminating data flow) but you can replace it something more meaningful e.g. script component to send an e-mail on error or similar. Now, following \u2018Lookup No Match Output\u2019 precedence constraint, drop \u2018Derived Column\u2019 transform onto the surface and create &#8216;DT_Valid_From&#8217;, &#8216;DT_Valid_To&#8217; and &#8216;Is_Current&#8217; derived columns which will be used for populating control attributes in the dimension table relating to the record history. Assign the following expressions and corresponding data types to newly created derived column as per below.<\/p>\n<p style=\"text-align: center;\"><a href=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/\/2012\/08\/SCD_Part2_SSIS_Derived_Column.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter  wp-image-1313\" title=\"SCD_Part2_SSIS_Derived_Column\" src=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/\/2012\/08\/SCD_Part2_SSIS_Derived_Column.png\" alt=\"\" width=\"580\" height=\"79\" \/><\/a><\/p>\n<p style=\"text-align: justify;\">Next, attach \u2018Data Conversion\u2019 transform and change Is_Current input column data type into Boolean as per below.<\/p>\n<p style=\"text-align: justify;\">\u00a0<a href=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/\/2012\/08\/SCD_Part2_SSIS_Data_Conversion.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1304\" title=\"SCD_Part2_SSIS_Data_Conversion\" src=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/\/2012\/08\/SCD_Part2_SSIS_Data_Conversion.png\" alt=\"\" width=\"580\" height=\"64\" \/><\/a><\/p>\n<p style=\"text-align: justify;\">Finally, for our \u2018OLE DB Destination\u2019 which takes records with no match, match up the records from our source to the DimCustomer table making sure that \u2018Is_Current_Boolean\u2019 is paired up with \u2018Is_Current\u2019 \u00a0and &#8216;Checksum&#8217; is paired up with &#8216;DimChecksum&#8217; in the destination table (see below).<\/p>\n<p><a href=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/\/2012\/08\/SCD_Part2_SSIS_OLEDB_NoMatch.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1315\" title=\"SCD_Part2_SSIS_OLEDB_NoMatch\" src=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/\/2012\/08\/SCD_Part2_SSIS_OLEDB_NoMatch.png\" alt=\"\" width=\"580\" height=\"262\" srcset=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/2012\/08\/SCD_Part2_SSIS_OLEDB_NoMatch.png 580w, http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/2012\/08\/SCD_Part2_SSIS_OLEDB_NoMatch-300x135.png 300w\" sizes=\"auto, (max-width: 580px) 100vw, 580px\" \/><\/a><\/p>\n<p style=\"text-align: justify;\">That is our Type 2 SCD or \u2018INSERT new rows\u2019 \u00a0records handled. Now let\u2019s go back to our \u2018Lookup\u2019 transform and connect \u2018Conditional Split\u2019 transform for\u00a0 \u2018Lookup Match Output\u2019. This will handle Type 1 SCD. What we will do here is create a conditional statement which will filter out the records based on the checksum values. This means that if a checksum value created by our \u2018Checksum Transform\u2019 matches that of our dimension table than the records will be left unchanged i.e. there is no change for the Type 1 \u2018Email_Address\u2019 attribute. If however the checksum value is different than the rows with the new values will be moved to our temporary table which we set up at the start. The content of our \u2018Conditional Split\u2019 transformation looks as per image below.<\/p>\n<p><a href=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/\/2012\/08\/SCD_Part2_SSIS_Conditional_Split.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1317\" title=\"SCD_Part2_SSIS_Conditional_Split\" src=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/\/2012\/08\/SCD_Part2_SSIS_Conditional_Split.png\" alt=\"\" width=\"580\" height=\"79\" srcset=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/2012\/08\/SCD_Part2_SSIS_Conditional_Split.png 580w, http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/2012\/08\/SCD_Part2_SSIS_Conditional_Split-300x40.png 300w\" sizes=\"auto, (max-width: 580px) 100vw, 580px\" \/><\/a><\/p>\n<p style=\"text-align: justify;\">From here, all the is left to do is to terminate execution by ending the data flow for those rows where checksum value was matched by linking the \u2018Conditional Split\u2019 to \u2018Trash Destination&#8217;. Those records where the checksum figure was different though will need to be inserted into \u00a0\u2018Temp_Customer_Updates\u2019 table. The final \u2018Data Flow\u2019 package layout should look as per below.<\/p>\n<p style=\"text-align: center;\"><a href=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/\/2012\/08\/SCD_Part2_SSIS_DataFlow_Final.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter  wp-image-1318\" title=\"SCD_Part2_SSIS_DataFlow_Final\" src=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/\/2012\/08\/SCD_Part2_SSIS_DataFlow_Final.png\" alt=\"\" width=\"580\" height=\"325\" \/><\/a><\/p>\n<p style=\"text-align: justify;\">Going back to our \u2018Control Flow\u2019 layout, let\u2019s input he following query into \u2018Update DimCustomer from Temp Table\u2019 Execute SQL task container.<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\nUPDATE DimCustomer\r\nSET First_Name = Temp.First_Name,\r\n    Last_Name = Temp.Last_Name,\r\n    City = Temp.City,\r\n    Email_Address = Temp.Email_Address,\r\n    DimChecksum = Temp.Checksum,\r\n    DT_Valid_From = GetDate(),\r\n    DT_Valid_To = '9999-12-31'\r\nFROM Temp_Customer_Updates as Temp INNER JOIN\r\nDimCustomer as Cust on Temp.CustomerSK = Cust.CustomerSK\r\nWHERE Is_Current = 1\r\n<\/pre>\n<p style=\"text-align: justify;\">What that does is update Type 1 values in the dimension table with changed values from our temporary table as well as resetting date fields for history tracking. Finally, in our last Execute SQL Task container we will update \u2018Is_Current\u2019 flag across the whole dimension to ensure that only one record (oldest one based on the date created) for each \u2018CustomerID\u2019 has the flag \u2018Is_Current\u2019 against it. So go ahead and execute the following query which should complete our SSIS package development.<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\nUPDATE DimCustomer\r\nSET Is_Current = 0, DT_Valid_To = GetDate() \r\nFROM DimCustomer \r\nINNER JOIN\r\n\t(SELECT CustomerID, MAX(DT_Valid_From) as Last_Update\r\n    FROM DimCustomer\r\n    GROUP BY CustomerID HAVING Count(CustomerID)&gt;1) as a\r\n    ON DimCustomer.CustomerID =a.CustomerID\r\n    AND DimCustomer.DT_Valid_From &lt; a.Last_Update\r\n<\/pre>\n<p style=\"text-align: justify;\">If you have followed through with the above steps, you can now execute the package which should populate our dimension table with ten records (based on the dummy data inserted into our source tables execution the script at the beginning of this post). If everything goes well, you should only see Type 2 or \u2018INSERT new rows\u2019 section of the \u2018Data Flow\u2019 processing our source data as at this stage all records should be treated as new (see image below).<\/p>\n<p style=\"text-align: justify;\">\u00a0<a href=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/\/2012\/08\/SCD_Part2_SSIS_DimPopulation.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1326\" title=\"SCD_Part2_SSIS_DimPopulation\" src=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/\/2012\/08\/SCD_Part2_SSIS_DimPopulation.png\" alt=\"\" width=\"580\" height=\"299\" srcset=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/2012\/08\/SCD_Part2_SSIS_DimPopulation.png 580w, http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/2012\/08\/SCD_Part2_SSIS_DimPopulation-300x154.png 300w\" sizes=\"auto, (max-width: 580px) 100vw, 580px\" \/><\/a><\/p>\n<p style=\"text-align: justify;\">If you run the package again, without making any changes to the source tables, no data should be altered in the \u2018DimCustomer\u2019 dimension and the \u2018Data Flow\u2019 should process the incoming steam via the \u2018Conditional Split\u2019 which points to \u2018Trash Destination\u2019, meaning that no data alteration or addition was detected. Now, to test if the package does what is expected of it, let\u2019s alter the source tables to see the changes reflected and handled according to our Type 1 and Type 2 SCD assumptions. What the below query does is it adds 3 records to the source \u2018Customer\u2019 table which should be processed as additions to our existing table, changes \u2018Email_Address\u2019 fields of three existing records in the source data which should be processed as Type 1 SCD and finally updates \u2018First_Name\u2019, \u2018Last_Name\u2019 and \u2018City\u2019 of three other records which should be processed as Type 2 SCD update. The code is pretty much self-explanatory.<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\n--INSERT NEW VALUES INTO CUSTOMER\r\nINSERT INTO Customer\r\n(First_Name, Middle_Name, Last_Name, Email_Address, Phone_Number, Created_Date, Modified_Date)\r\nSELECT '11_xxx', NULL, '11_yyy', '11_some_email_address', NULL, GetDate(), NULL UNION ALL\r\nSELECT '12_xxx', NULL, '12_yyy', '12_some_email_address', NULL, GetDate(), NULL UNION ALL\r\nSELECT '13_xxx', NULL, '13_yyy', '13_some_email_address', NULL, GetDate(), NULL\r\nGO\r\n\r\n--INSERT NEW VALUES INTO ADDRESS\r\nINSERT INTO Address\r\n(City, Is_Current_Flag)\r\nSELECT '11_some_city', 'Y' UNION ALL\r\nSELECT '12_some_city', 'Y' UNION ALL\r\nSELECT '13_some_city', 'Y'\r\nGO\r\n\r\n--UPDATE TYPE 1 SCD RECORDS. DON'T WORRY ABOUT UPDATING 'MODIFIED_DATE' COLUMN\r\nUPDATE Customer\r\nSET Email_Address = '1_some_email_address' WHERE CustomerID = 1\r\nUPDATE Customer\r\nSET Email_Address = '2_some_email_address' WHERE CustomerID = 2\r\nUPDATE Customer\r\nSET Email_Address = '3_some_email_address' WHERE CustomerID = 3\r\nGO\r\n\r\n--UPDATE TYPE 2 SCD RECORDS. DON'T WORRY ABOUT UPDATING 'MODIFIED_DATE' COLUMN\r\nUPDATE Customer\r\nSET First_Name = '4_some_first_Name', Last_Name = '4_some_last_Name' WHERE CustomerID = 4\r\nUPDATE Customer\r\nSET First_Name = '5_some_first_Name', Last_Name = '5_some_last_Name' WHERE CustomerID = 5\r\nUPDATE Customer\r\nSET First_Name = '6_some_first_Name', Last_Name = '6_some_last_Name' WHERE CustomerID = 6\r\n\r\n--UPDATE BRIDGING TABLE\r\nINSERT INTO Customer_Address_Bridge\r\n(CustomerID, AddressID)\r\nVALUES (11,11)\r\nINSERT INTO Customer_Address_Bridge\r\n(CustomerID, AddressID)\r\nVALUES (12,12)\r\nINSERT INTO Customer_Address_Bridge\r\n(CustomerID, AddressID)\r\nVALUES (13,13)\r\n<\/pre>\n<p style=\"text-align: justify;\">When we link the tables together and single out the attributes for our dimension population we should get 3 records which qualify for Type 2 SCD update (row 4, 5, 6), 3 records which qualify for Type 1 SCD update (row 1, 2, 3) and 3 records which qualify just for an INSERT (row 11, 12, 13) as per below.<\/p>\n<p><a href=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/\/2012\/08\/SCD_Part2_SSIS_Source_Tables_AfterUpdate.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1320\" title=\"SCD_Part2_SSIS_Source_Tables_AfterUpdate\" src=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/\/2012\/08\/SCD_Part2_SSIS_Source_Tables_AfterUpdate.png\" alt=\"\" width=\"580\" height=\"290\" srcset=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/2012\/08\/SCD_Part2_SSIS_Source_Tables_AfterUpdate.png 580w, http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/2012\/08\/SCD_Part2_SSIS_Source_Tables_AfterUpdate-300x150.png 300w\" sizes=\"auto, (max-width: 580px) 100vw, 580px\" \/><\/a><\/p>\n<p style=\"text-align: justify;\">Now, let\u2019s run the package again to see how those records are handled by the \u2018Data Flow\u2019 in SSIS once the package finished processing.<\/p>\n<p style=\"text-align: center;\">\u00a0<a href=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/\/2012\/08\/SCD_Part2_SSIS_DimUpdate.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter  wp-image-1328\" title=\"SCD_Part2_SSIS_DimUpdate\" src=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/\/2012\/08\/SCD_Part2_SSIS_DimUpdate.png\" alt=\"\" width=\"580\" height=\"275\" \/><\/a><\/p>\n<p style=\"text-align: justify;\">What we can notice here is a standard SCD Type 1 and 2 as well as new records INSERT handling. Let\u2019s start from the top (image below). Records 1, 2 and 3 (blue rectangle) have been updated according to Type 1 SCD methodology i.e. email addresses on those have been altered to reflected the latest status for those 3 customers without creating additional records. Records 11, 12, 13 (green rectangle) have been created as Type 2 attributes were changed in the source, therefore new records needed to be added for already existing rows (notice that \u2018CustomerIDs\u2019 are exactly the same for rows 4 to 6 as they are for rows 11 to 13. Finally, new records which did not exist in the dimension table (not updates) are represented by row numbers 14, 15, 16 (red rectangle). These are not updates, therefore SCD Type 1 or SCD Type 2 do not apply. It is just a straight INSERT.<\/p>\n<p style=\"text-align: center;\"><a href=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/\/2012\/08\/SCD_Part2_SSIS_DimCustomer_AfterFinal_Update.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter  wp-image-1322\" title=\"SCD_Part2_SSIS_DimCustomer_AfterFinal_Update\" src=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/\/2012\/08\/SCD_Part2_SSIS_DimCustomer_AfterFinal_Update.png\" alt=\"\" width=\"580\" height=\"180\" \/><\/a><\/p>\n<p style=\"text-align: justify;\">So there you go \u2013 another method which can be utilized in order to update dimension tables which handles SCD types and new data elegantly and efficiently. In the next part to this series (part 3) I will describing the Kimball method Slowly Changing Dimension component which is another method one can implement to manage SCD through a SSIS environment.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In the previous post I briefly outlined the methodology and steps behind updating a dimension table using a default SCD component in Microsoft&#8217;s SQL Server Data Tools environment. This post also deals with SCD updates but this time I want to introduce a different, open-sourced, third party component which anyone can use to process SCD [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[32,5,50],"tags":[12,49,13],"class_list":["post-1295","post","type-post","status-publish","format-standard","hentry","category-how-tos","category-sql","category-ssis","tag-code","tag-sql","tag-ssis"],"aioseo_notices":[],"jetpack_featured_media_url":"","_links":{"self":[{"href":"http:\/\/bicortex.com\/bicortex\/wp-json\/wp\/v2\/posts\/1295","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/bicortex.com\/bicortex\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/bicortex.com\/bicortex\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/bicortex.com\/bicortex\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/bicortex.com\/bicortex\/wp-json\/wp\/v2\/comments?post=1295"}],"version-history":[{"count":24,"href":"http:\/\/bicortex.com\/bicortex\/wp-json\/wp\/v2\/posts\/1295\/revisions"}],"predecessor-version":[{"id":2012,"href":"http:\/\/bicortex.com\/bicortex\/wp-json\/wp\/v2\/posts\/1295\/revisions\/2012"}],"wp:attachment":[{"href":"http:\/\/bicortex.com\/bicortex\/wp-json\/wp\/v2\/media?parent=1295"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/bicortex.com\/bicortex\/wp-json\/wp\/v2\/categories?post=1295"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/bicortex.com\/bicortex\/wp-json\/wp\/v2\/tags?post=1295"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}