How to implement Slowly Changing Dimensions – Part 4. Using MERGE T-SQL to load dimension data

August 26th, 2012 / 1 Comment » / by admin

In the first post I briefly outlined how to set up Slowly Changing Dimensions process using default ETL functionality (Slowly Changing Dimensions Component in SSIS, SQL Server 2012). This was followed by another 2 posts to this series outlining the usage of Checksum SSIS transformation as well as Dimension Merge SCD component which also take advantage of Microsoft’s Integration Services environment. As much as I appreciate wizard driven process setup and a quick and painless deployment cycle, on the downside, some of those components have minimal customization options and as it is in case of default SCD component – take a big performance hit when dealing with large volumes of data. Sometimes, custom coding seems to work best and given the fact that MERGE function has been present in SQL code repertoire since version 2008, it is a great alternative for using out-of-the-box, standard SSIS component.

In this example I will be using mostly the same objects and dummy dataset as I used for SSIS SCD in Part 1, Part 2 and Part 3, so if you’re keen to get your hands dirty just execute the SQL below to have something to work with.

--CREATE RELATIONAL TABLES AND CONSTRAINTS. POPULATE TABLES WITH SOME DUMMY DATA
CREATE TABLE Customer(
CustomerID int IDENTITY (1,1),
First_Name varchar (50) NULL,
Middle_Name varchar (50) NULL,
Last_Name varchar (50) NULL,
Email_Address varchar (100) NULL,
Phone_Number varchar (50) NULL,
Created_Date datetime NULL,
Modified_Date datetime NULL,
CONSTRAINT pk_CustID PRIMARY KEY (CustomerID))

CREATE TABLE Address(
AddressID int IDENTITY (1,1) NOT NULL,
Address1 varchar (50) NULL,
Address2 varchar (50) NULL,
City varchar (50) NULL,
Post_Code char (4) NULL,
Is_Current_Flag char(1) NULL,
CONSTRAINT pk_AddressID PRIMARY KEY (AddressID))

CREATE TABLE Customer_Address_Bridge(
CustomerID int NOT NULL,
AddressID int NOT NULL)

GO

INSERT INTO Customer
(First_Name, Middle_Name, Last_Name,Email_Address,Phone_Number,Created_Date,Modified_Date)
SELECT 'Mary', 'Joeanne', 'Black', 'mary0120@yahoo.com.au', '03-8573-9455', '2012-01-01', NULL UNION ALL
SELECT 'John', 'Lorance', 'Moore', 'johnnym@awol.com', '03-3423-1155', '2012-01-01', '2012-05-30' UNION ALL
SELECT 'Martin', NULL, 'Laser', 'mlaser91@aol.com', '03-2355-1109', '2012-01-01', '2012-05-12' UNION ALL
SELECT 'Spencer', 'Chris', 'McEvans', 'spencerdude@hotmail.com', '03-1122-0007', '2012-01-01', '2012-05-30' UNION ALL
SELECT 'Mark', NULL, 'King', 'mk038722@gmail.com', '03-3423-1155', '2012-01-01', '2012-05-30' UNION ALL
SELECT 'Mary', 'Susan', 'Grey', 'mmgrey@gmail.com', '03-1299-3859', '2012-01-01', NULL UNION ALL
SELECT 'Luis', 'Blake', 'Shimaro', 'shimarolou@yahoo.com.au', '03-0385-3999', '2012-01-01', NULL UNION ALL
SELECT 'Natalie', 'G', 'Chin', 'nataliechin@mediasmarts.com.au', '03-3759-1001', '2012-01-01', NULL UNION ALL
SELECT 'Marian', NULL, 'McErin', 'marianmcerin@gmail.com', '03-3400-3331', '2012-01-01', '2012-05-01' UNION ALL
SELECT 'Rick', 'Tony', 'Webster', 'rikky69@gmail.com', '03-9459-1112', '2012-01-01', NULL

INSERT INTO Address
(Address1, Address2, City, Post_Code, Is_Current_Flag)
SELECT '6 Agave Street', 'Apartment 4A', 'Launceston', '7250', 'Y' UNION ALL
SELECT '88 Dodge Street', NULL, 'Sunshine', '3020', 'Y' UNION ALL
SELECT '3 McKenzie Court', 'Level 9', 'Perthville', '2795', 'Y' UNION ALL
SELECT '5 Spencer Drive', 'Unit 9D', 'Melbourne', '3002', 'Y' UNION ALL
SELECT '8 Sunny Avenue', NULL, 'Sydney', '2000', 'Y' UNION ALL
SELECT '83 Mara Drive', NULL, 'Echuca', '3563', 'Y' UNION ALL
SELECT '1038 Mustang Street', NULL, 'Brisbane', '4000', 'Y' UNION ALL
SELECT '1 Bradman Street', NULL, 'Bendigo', '3550', 'Y' UNION ALL
SELECT '12 Cruger Drive', 'Block C', 'Cairns', '4870', 'Y' UNION ALL
SELECT '124 Lasting Court', NULL, 'Adelaide', '5000', 'Y'

INSERT INTO Customer_Address_Bridge
(CustomerID, AddressID)
SELECT 1, 1 UNION ALL
SELECT 2,2 UNION ALL
SELECT 3,3 UNION ALL
SELECT 4,4 UNION ALL
SELECT 5,5 UNION ALL
SELECT 6,6 UNION ALL
SELECT 7,7 UNION ALL
SELECT 8,8 UNION ALL
SELECT 9,9 UNION ALL
SELECT 10, 10

--CREATE CONSTRAINTS
ALTER TABLE Customer_Address_Bridge
ADD FOREIGN KEY (AddressID) REFERENCES Address(AddressID)
ALTER TABLE Customer_Address_Bridge
ADD FOREIGN KEY (CustomerID) REFERENCES Customer(CustomerID)

GO

--CREATE DIMENSION 'DimCustomer' TABLE
CREATE TABLE DimCustomer(
CustomerSK int IDENTITY (1,1) NOT NULL,
CustomerID int NULL,
First_Name varchar (50) NULL,
Last_Name varchar (50) NULL,
Email_Address varchar (100) NULL,
City varchar (50) NULL,
DT_Valid_From datetime NULL,
DT_Valid_To datetime NULL,
Is_Current bit NULL,
CONSTRAINT PK_DimCustomer PRIMARY KEY CLUSTERED (CustomerSK asc))

GO

--POPULATE DimCustomer TABLE WITH DUMMY DATA
INSERT INTO DimCustomer
(CustomerID, First_Name, Last_Name, City, Email_Address, DT_Valid_From, DT_Valid_To, Is_Current)
SELECT cu.CustomerID, cu.First_Name, cu.Last_Name, ad.City, cu.Email_Address, GetDate(), '9999-12-31', 1
FROM dbo.Address AS ad INNER JOIN
dbo.Customer_Address_Bridge AS br ON ad.AddressID = br.AddressID INNER JOIN
dbo.Customer AS cu ON br.CustomerID = cu.CustomerID

When executed, this code should give us few tables populated with some dummy data. These, combined together, create a base for the last insert statement which is responsible for the initial ‘DimCustomer’ table population. Provided you executed the code, the ‘DimCustomer’ table should have the content as per the image below.

As you can see, all the records have the status of  ‘Is_Current’ and a very distant expiry date (‘DT_Valid_To’) which is a default for a currently valid record. To see how MERGE SQL works on updating dimension tables, first we need to make some changes to our source data. Let’s assume that ‘Last_Name’, ‘First_Name’ and ‘City’ have been defined as Type 2 attributes whereas ‘Email_Address’ will be treated as Type 1 column. Let’s execute the following UPDATE statements to alter our source data to see how MERGE T-SQL works in practice.

--UPDATE SOURCE TABLES
UPDATE Customer
SET First_Name = 'zzzzzzz' WHERE customerid = 4

GO

UPDATE Customer
SET Email_Address = 'zzzzzzz@yyy.com' WHERE CustomerID = 8

Now that we have our source  table updated, first, let’s look at how to handle SCD Type 2 using MERGE function. As you recall, Type 2 SCD allows for tracking history by creating  new record in a dimension table (as opposed to Type 0 which ignores updates completely or Type 1 which only keeps the latest record). Type 2 also handles records  termination by expiring the rows and inserting identifiers which are usually expressed by either populating dates column indication insertion and expiry or setting the flag attribute to an appropriate boolean value e.g. ‘Y’/’N’ or number 0/1 or both. In order to process Type 2 SCD, the code needs to do the following:

  • Insert brand new customer rows with appropriate effective and end dates
  • Expire the old rows for those rows which have a type 2 attribute change by setting the appriopriate end date (DT_Valid_To field in our code) and Boolean flag (Is_Current in our code) to 0
  • Insert the changed Type 2 rows with appropriate effective and end dates and Boolean flag (Is_Current in our code) to 1

The T-SQL MERGE statement can only update a single row per incoming row, but there is a trick that we can take advantage of by making use of the OUTPUT clause. Merge can output the results of what it has done, which in turn can be consumed by a separate INSERT statement. We will therefore use the MERGE statement to update the existing record, terminating it, and then pass the relevant source rows out to the INSERT statement to create the new row. This may sound a bit convoluted but in reality is rather simple. We have already established that the following columns to be treated as Type 2 SCD: ‘First_Name’, ‘Last_Name’, and ‘City’. This means that for those 3 columns we will want an additional record created for each changed value detected in the source table. We will omit ‘Email_Address’ attribute from this statement for the moment as this will be addressed when dealing with Type 1 SCD. The code starts with the outer INSERT and SELECT clause to handle the changed row inserts at the end of the MERGE statement. This has to come first because the MERGE is nested inside the INSERT. The code includes several references to ‘GETDATE()’ as it presumes the change was effective as of today. Finally, following the code, there are comments that refer to the line numbers.

--UPDATE CUSTOMER TABLE - TYPE 2 SLOWLY CHANGING DIMENSION
INSERT INTO DimCustomer
(
CustomerID, First_Name, Last_Name, Email_Address, City, Is_Current, DT_Valid_From, DT_Valid_To
)
SELECT
CustomerID, First_Name, Last_Name, Email_Address, City, 1, GetDate(), '9999-12-31'
FROM
	(
	MERGE DimCustomer as Target
		USING
			(SELECT cu.CustomerID,
			cu.First_Name,
			cu.Last_Name,
			ad.City,
			cu.Email_Address
			FROM dbo.Address AS ad INNER JOIN
			dbo.Customer_Address_Bridge AS br ON ad.AddressID = br.AddressID INNER JOIN
			dbo.Customer AS cu ON br.CustomerID = cu.CustomerID)
			as [Source]
	                ON Target.CustomerID = Source.CustomerID AND Target.Is_Current = 1
	                WHEN MATCHED AND
			(
			Target.First_Name <> [Source].First_Name OR
			Target.Last_Name <> [Source].Last_Name OR
			Target.City <> [Source].City
			)
				THEN UPDATE SET
				Is_Current = 0,
				DT_Valid_To = GETDATE()
	WHEN NOT MATCHED BY TARGET
			THEN INSERT
			(
                        CustomerID,
                        First_Name,
                        Last_Name,
                        Email_Address,
                        City,
                        DT_Valid_From,
                        DT_Valid_To,
                        Is_Current
                        )
			VALUES
			(
                        [Source].CustomerID,
                        [Source].First_Name,
                        [Source].Last_Name,
                        [Source].Email_Address,
                        [Source].City,
                        GetDate(),
                        '9999-12-31',
                        1
                        )
	WHEN NOT MATCHED BY SOURCE AND Target.Is_Current = 1
			THEN UPDATE
			SET IS_Current = 0,
			DT_Valid_From = GetDate(),
			DT_Valid_To = GetDate()
	OUTPUT $action as Action,[Source].*) as MergeOutput
WHERE MergeOutput.Action = 'UPDATE'
AND CustomerID IS NOT NULL;

Now, let’s go through the code in details and shed some light on what it is doing.

  • First 8 lines are just a standard INSERT from the subsequent MERGE statement
  • Line 10 qualifies ‘DimCustomer’ table as our destination table and aliases it as ‘Target’. This is where the actual MERGE function starts
  • Lines 11 – 20 creates our dimensional data query which will be used to populate ‘DimCustomer’ and alases it as ‘Source’
  • Line 21 specifies the link between the ‘Target’ and the ‘Source’
  • Lines 22 – 30 specifies what needs to be done when selected Type 2 attributes i.e. ‘First_Name’, ‘Last_Name’ and ‘City’ from ‘Source’ and ‘Target’ match. This is based based on the link set up in line 21. If the match is found, ‘Is_Current’ flag is set to 0 and DT_Valid_To is set to the current date and time.
  • Lines 31 – 53 specify what the query is required to do when the match between ‘Source’ and ‘Target’ is not found. In such instance the code inserts new records to the ‘Target’ dimension table.
  • Lines 54 – 58 specify that all rows of ‘Target’ table that do not match the rows returned by ‘Source’ and that satisfy any additional search condition are updated.
  • Line 59 returns a row for every row in ‘Target’ table that is updated, inserted or deleted, in no particular order. This line also includes $Action column of type NVARCHAR (10) which stores values for each row depending on what action was executed against it i.e. INSERT, UPDATE or DELETE. In our case we return everything from the ‘Source’ and qualify/alias this dataset as ‘MergeOutput’. This dataset will be further narrowed down by WHERE clause in the next 2 lines.
  • Finally, lines 60 – 61 provide the condition for the INSERT statement (see lines 1-8) which specifies that we only require records with the ‘Action’ attribute of ‘UPDATE’ and where ‘CustomerID’ is present.

That is Type 2 SCD handled. Now, let’s look at how we handle Type 1 SCD. SCD Type 2 attributes excluded Email_Address fields which, as mentioned before, was qualified to be overwritten by the new/updated version. For Type 1 we also want to know when the record was updated. This block of code updates the Type 1 attributes (in this case, Email_Address). Line 8 (DimCustomer. Is_Current) check is optional depending on whether you only want to update current or all records.

--UPDATE CUSTOMER TABLE - TYPE 1 SLOWLY CHANGING DIMENSION
UPDATE DimCustomer
SET Email_Address = Customer.Email_Address,
DT_Valid_From = GetDate()
FROM DimCustomer INNER JOIN Customer
ON DimCustomer.CustomerID = Customer.CustomerID AND
DimCustomer.Email_Address <> Customer.Email_Address AND
DimCustomer.Is_Current = 1 -------> Optional

When both of those SQL statements are executed, the output in our dimension table should be as follows (click on the image to enlarge).

 

What we can observe here is that one row was added  (bottom red rectangle) to ‘DimCustomer’ table which precisely reflects our previous UPDATE statement to the source data. As you can recall we updated ‘CustomerID’ number 4 with a new first name changing the value from ‘Spencer’ to ‘zzzzzzz’. As ‘First_Name’ is a Type 2 attribute, a new record was added and the previous one (top red rectangle) expired. Also, as per our second update statement, we updated email address for ‘CustomerID’ number 8 from ‘nataliechin@mediasmarts.com’ to ‘zzzzzzz@yyy.com’. At this column was qualified as Type 1 Slowly Changing Dimension, no extra row was created and the existing record updated with a new ‘DT_Vlaid_From’ date (blue rectangle).

This is the last post to the series describing Slowly Changing Dimensions update using a variety of methods. If using MERGE T-SQL does not rock your boat you can always try implementing other alternatives to SCD handling. Please see my other posts about managing SCD using checksum transformation, SCD SSIS default component or SCD Dimension Merge SSIS transformation.

Tags: , ,

How to implement Slowly Changing Dimensions – Part 3. Using SSIS Dimension Merge SCD Component to load dimension data

August 23rd, 2012 / No Comments » / by admin

In the first post to the series I explained how SSIS default component for handling Slowly Changing Dimensions can be used when incorporated into a package. This method was followed by a second post depicting managing SCD via ‘Checksum Transformation’ third party add-in into Integration Services. In this post to the series I will not be leaving SSIS environment just yet as I still have not exhausted the options for additional transformations which can potentially be used for SCD processing. Before I explain how Merge T-SQL function can be utilised outside SSIS (see subsequent post), I would like to explore one more add-in and outline SSIS Dimension Merge SCD Component (from here on called DMSC). The component can be downloaded from HERE. Unfortunately, at the time of writing this post there is no version released for SQL Server 2012 hence all screen captures are based on BIDS environment and not SQL Server Data Tools.

Again, if you want to get your hands dirty, after component installation, execute THIS SQL script to create all database objects I will be using throughout this outline and create a new SSIS project in BIDS.

--CREATE RELATIONAL TABLES AND CONSTRAINTS. POPULATE TABLES WITH SOME DUMMY DATA
CREATE TABLE Customer(
CustomerID int IDENTITY (1,1),
First_Name varchar (50) NULL,
Middle_Name varchar (50),
Last_Name varchar (50) NULL,
Email_Address varchar (100) NULL,
Phone_Number varchar (50) NULL,
Created_Date datetime NULL,
Modified_Date datetime NULL,
CONSTRAINT pk_CustID PRIMARY KEY (CustomerID))

CREATE TABLE Address(
AddressID int IDENTITY (1,1) NOT NULL,
Address1 varchar (50) NULL,
Address2 varchar (50) NULL,
City varchar (50)  NULL,
Post_Code char (4) NULL,
Is_Current_Flag char(1) NULL,
CONSTRAINT pk_AddressID PRIMARY KEY (AddressID))

CREATE TABLE Customer_Address_Bridge(
CustomerID int NOT NULL,
AddressID int NOT NULL)
GO

INSERT INTO Customer
(First_Name, Middle_Name, Last_Name,Email_Address,Phone_Number,Created_Date,Modified_Date)
SELECT 'Mary', 'Joeanne', 'Black', 'mary0120@yahoo.com.au', '03-8573-9455', '2012-01-01', NULL UNION ALL
SELECT 'John', 'Lorance', 'Moore', 'johnnym@awol.com', '03-3423-1155', '2012-01-01', '2012-05-30' UNION ALL
SELECT 'Martin', NULL, 'Laser', 'mlaser91@aol.com', '03-2355-1109', '2012-01-01', '2012-05-12' UNION ALL
SELECT 'Spencer', 'Chris', 'McEvans', 'spencerdude@hotmail.com', '03-1122-0007', '2012-01-01', '2012-05-30' UNION ALL
SELECT 'Mark', NULL, 'King', 'mk038722@gmail.com', '03-3423-1155', '2012-01-01', '2012-05-30' UNION ALL
SELECT 'Mary', 'Susan', 'Grey', 'mmgrey@gmail.com', '03-1299-3859', '2012-01-01', NULL UNION ALL
SELECT 'Luis', 'Blake', 'Shimaro', 'shimarolou@yahoo.com.au', '03-0385-3999', '2012-01-01', NULL UNION ALL
SELECT 'Natalie', 'G', 'Chin', 'nataliechin@mediasmarts.com.au', '03-3759-1001', '2012-01-01', NULL UNION ALL
SELECT 'Marian', NULL, 'McErin', 'marianmcerin@gmail.com', '03-3400-3331', '2012-01-01', '2012-05-01' UNION ALL
SELECT 'Rick', 'Tony', 'Webster', 'rikky69@gmail.com', '03-9459-1112', '2012-01-01', NULL

INSERT INTO Address
(Address1, Address2, City, Post_Code, Is_Current_Flag)
SELECT '6 Agave Street', 'Apartment 4A', 'Launceston', '7250', 'Y' UNION ALL
SELECT '88 Dodge Street', NULL, 'Sunshine', '3020', 'Y' UNION ALL
SELECT '3 McKenzie Court', 'Level 9', 'Perthville', '2795', 'Y' UNION ALL
SELECT '5 Spencer Drive', 'Unit 9D', 'Melbourne', '3002', 'Y' UNION ALL
SELECT '8 Sunny Avenue', NULL, 'Sydney', '2000', 'Y' UNION ALL
SELECT '83 Mara Drive', NULL, 'Echuca', '3563', 'Y' UNION ALL
SELECT '1038 Mustang Street', NULL, 'Brisbane', '4000', 'Y' UNION ALL
SELECT '1 Bradman Street', NULL, 'Bendigo', '3550', 'Y' UNION ALL
SELECT '12 Cruger Drive', 'Block C', 'Cairns', '4870', 'Y' UNION ALL
SELECT '124 Lasting Court', NULL, 'Adelaide', '5000', 'Y'

INSERT INTO Customer_Address_Bridge
(CustomerID, AddressID)
SELECT 1, 1 UNION ALL
SELECT 2,2 UNION ALL
SELECT 3,3 UNION ALL
SELECT 4,4 UNION ALL
SELECT 5,5 UNION ALL
SELECT 6,6 UNION ALL
SELECT 7,7 UNION ALL
SELECT 8,8 UNION ALL
SELECT 9,9 UNION ALL
SELECT 10, 10

--CREATE CONSTRAINTS
ALTER TABLE Customer_Address_Bridge
ADD FOREIGN KEY (AddressID) REFERENCES Address(AddressID)
ALTER TABLE Customer_Address_Bridge
ADD FOREIGN KEY (CustomerID) REFERENCES Customer(CustomerID)
GO

--CREATE DIMENSION 'DimCustomer' TABLE
CREATE TABLE DimCustomer(
CustomerSK int IDENTITY (1,1) NOT NULL,
CustomerID int NULL,
First_Name varchar (50) NULL,
Last_Name varchar (50) NULL,
Email_Address varchar (100) NULL,
City varchar (50) NULL,
DT_Valid_From datetime NULL,
DT_Valid_To datetime NULL,
Is_Current bit NULL,
CONSTRAINT PK_DimCustomer PRIMARY KEY CLUSTERED (CustomerSK asc))

Now that we have the groundwork laid out let’s see what the query and the query output for constructing our dimension table looks like – we will only be selecting a limited amount of columns needed to populate our dimension. Executing the following SQL code will give you an idea of what attributes we will be using to populate out ‘DimCustomer’ table.

SELECT cu.CustomerID,
cu.First_Name,
cu.Last_Name,
ad.City,
cu.Email_Address
FROM dbo.Address AS ad INNER JOIN
dbo.Customer_Address_Bridge AS br ON ad.AddressID = br.AddressID INNER JOIN
dbo.Customer AS cu ON br.CustomerID = cu.CustomerID

Now the fun part. Let’s create a simple SSIS solution in BIDS. Drop Data Flow Task onto the Control Flow surface and double-click it to enter the Data Flow development pane. Here, drop 2 OLE DB Source containers (here called ‘Merge Input Source’ and ‘Merge Input Target’), 1 Dimension Merge Slowly Changing Dimension transformation, 2 OLE DB Command transformations (here called ‘Update SCD Type 1 Records’ and ‘Expire SCD Type 2 Records‘) and 2 OLE DB Destination containers (here called ‘Insert New Records’ and ‘Insert SCD Type 2 Records’) onto the surface as per below.

Let’s go to each individual ‘source’ components to adjust their properties. First, in ‘Merge Input Source’ OLE DB Source create a database connection, change the ‘Data Access Mode’ to ‘SQL Command’ and input the same query as we used before to construct dimensional data from our source tables.

In the second OLD DB Source which we called ‘Merge Input Target’, choose ‘Table or View’ data access mode and select ‘DimCustomer’ table as our target object. Finally, link the two OLE DB Sources to DMSC transform ensuring that the one with the query is linked as ‘Source System’ and the one with the dimension destination table is marked as ‘Existing Dimension’. Don’t connect the remaining components just yet.

These are our source tables/query configured. Now, let’s tweak the core transformation which will do all the heavy-duty lifting for us in this package – our DMSC transformation. Follow the screenshots below to adjust individual tab properties leaving Row Change Detection, Surrogate Key Handling, Inferred Member Behaviour, Auditing, Logging and Performance tabs on default settings. The remaining tab properties adjustments are depicted below with the key sections to pay attention to highlighted in red.

If you look just at the screenshot above, you will notice that in the ‘Output Column Selection’ we have removed ‘Deleted’ output from the options provided. This is because we will not be deleting any records out of the existing dimension but without excluding it from this section, the package would not allow us to go any further, asking for explicitly specifying the records for deletion. Don’t worry about the error message when you click ‘Finish’, this should disappear when we configure the outputs for this DMSC. To do this, first, link DMSC transformation to ‘Insert New Records’ OLE DB Source selecting ‘New’ from the ‘Input Output Selection’. Go into the newly linked OLE DB Source, select our dimension table and create the mapping ensuring that surrogate keys are omitted. We don’t want those populated by the package as ‘CustomerSK’ is an identity column and should increment automatically. For our second OLE DB Source, link DMSC transformation selecting ‘New SCD 2’ option from the ‘Input Output Selection’. As per previous OLE DB Source configuration, create columns mappings, leaving out surrogate keys. All we should have left unlinked in our Data Flow pane are the two OLE DB Commands. Link DMSC to the first one – ‘Update SCD Type 1 Records’ OLE DB Command, select ‘Updated SCD1’ from the option dropdown and proceed to adjusting its properties. Select our previously created database connection from the ‘Connection Managers’ tab, go to ‘Component Properties’ tab and in its ‘SQLCommand’ property input the following query.

UPDATE DimCustomer
SET
Email_Address = ?,
DT_Valid_From = ?
WHERE CustomerSK = ?
AND Is_Current = 1

Go to the next tab and map the parameters and columns accordingly as per image below.

Lastly, configure the ‘Expire SCD Type 2 Records’ OLE DB Command. Link it to DMSC selecting ‘Expired SCD 2 with SCD 1 Changes’ from the selection available, input the database connection and in ‘Component Properties’ input the following string value.

UPDATE DimCustomer
SET
First_Name = ?,
Last_Name = ?,
City = ?,
DT_Valid_From = ?,
DT_Valid_To = ?,
Is_Current = ?
WHERE CustomerSK = ?
AND Is_Current = 1

Complete component adjustments by mapping columns to their corresponding parameters as per below.

That is our SSIS package finished. Go ahead and run it. All it should do at this stage is simply populate the ‘DimCustomer’ table with new values, inserting exactly 10 rows as per below.

Now let’s try and see how the package handles new, SCD Type 1 and SCD Type 2 records when we commit an update to our source tables by executing the following SQL script.

--INSERT NEW VALUES INTO CUSTOMER
INSERT INTO Customer
(First_Name, Middle_Name, Last_Name, Email_Address, Phone_Number, Created_Date, Modified_Date)
SELECT '11_xxx', NULL, '11_yyy', '11_some_email_address', NULL, GetDate(), NULL UNION ALL
SELECT '12_xxx', NULL, '12_yyy', '12_some_email_address', NULL, GetDate(), NULL UNION ALL
SELECT '13_xxx', NULL, '13_yyy', '13_some_email_address', NULL, GetDate(), NULL
GO

--INSERT NEW VALUES INTO ADDRESS
INSERT INTO Address
(City, Is_Current_Flag)
SELECT '11_some_city', 'Y' UNION ALL
SELECT '12_some_city', 'Y' UNION ALL
SELECT '13_some_city', 'Y'
GO

--UPDATE TYPE 1 SCD RECORDS. DON'T WORRY ABOUT UPDATING 'MODIFIED_DATE' COLUMN
UPDATE Customer
SET Email_Address = '1_some_email_address' WHERE CustomerID = 1
UPDATE Customer
SET Email_Address = '2_some_email_address' WHERE CustomerID = 2
UPDATE Customer
SET Email_Address = '3_some_email_address' WHERE CustomerID = 3
GO

--UPDATE TYPE 2 SCD RECORDS. DON'T WORRY ABOUT UPDATING 'MODIFIED_DATE' COLUMN
UPDATE Customer
SET First_Name = '4_some_first_Name', Last_Name = '4_some_last_Name' WHERE CustomerID = 4
UPDATE Customer
SET First_Name = '5_some_first_Name', Last_Name = '5_some_last_Name' WHERE CustomerID = 5
UPDATE Customer
SET First_Name = '6_some_first_Name', Last_Name = '6_some_last_Name' WHERE CustomerID = 6

--UPDATE BRIDGING TABLE
INSERT INTO Customer_Address_Bridge
(CustomerID, AddressID)
VALUES (11,11)
INSERT INTO Customer_Address_Bridge
(CustomerID, AddressID)
VALUES (12,12)
INSERT INTO Customer_Address_Bridge
(CustomerID, AddressID)
VALUES (13,13)

Try running the package again and your output should be identical to the one below.

What we have done here is made the DMSC to recognize that 3 records needed to be inserted (new data), 3 records required a Type 1 update and finally 3 records had to be expired due to Type 2 SCD insert of 3 more rows. This means that a total of 6 records needed to be inserted into our dimension table – 3 new ones and 3 changed ones (Type 2), adjusting the total row number to 16. This should be synonymous with what I have done in Part 1, 2 and 4, the only exception using Dimension Merge Slowly Changing Dimension transformation rather than T-SQL Merge function, checksum transformation or a default SSIS component. The output to this second execution is as per image below (click on it to enlarge).

What you can observe here is that records 1, 2 and 3 (blue rectangle) were updated according to SCD Type 1 i.e. no new rows were created, records 4, 5 and 6 were added as new rows which previously did not exist in this table and finally rows 7, 8 and 9 were created as SCD 2 update to existing records i.e. new entries were created for rows which already exist in the table but were expired due to changes to Type 2 attributes i.e. ‘First_Name’ and ‘Last_Name’.

Next up, the final part to this series, using T-SQL Merge function to update dimensional data with Type 1 and Type 2 SCD.

Tags: , ,