How to implement SSIS Package Checkpoints – The Power of Restartability

Data transfers can take a substantial amount of time and when the package fails during execution it would be counter-productive at best and dangerous at worst to restart the whole package and start from the beginning. This can be prevented by means of using checkpoints which store the execution log inside a file to enable rerunning the routine without repeating the already executed steps. To see how that works let’s create a simple data source and data destination table by executing the following script.

--CREATE SOURCE TABLE
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'Tbl_Source'))
DROP TABLE Tbl_Source
CREATE TABLE Tbl_Source
(
ID int NULL,
First_Name varchar (50) NULL,
Last_Name varchar (50) NULL,
Age int NULL,
Age_Category varchar (50) NULL
)
ON [PRIMARY]

--CREATE DESTINATION TABLE
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'Tbl_Destination'))
DROP TABLE Tbl_Destination
CREATE TABLE Tbl_Destination
(
ID INT NULL,
First_Name varchar (50) NULL,
Last_Name varchar (50) NULL,
Age int NULL,
Age_Category varchar (50) NULL
)
ON [PRIMARY]

--POPULATE SOURCE TABLE
INSERT INTO Tbl_Source
(ID, First_Name, Last_Name, Age, Age_Category)
SELECT 1, 'John', 'Smith', 44, NULL
UNION ALL
SELECT 2, 'Chris', 'Martin',23, NULL
UNION ALL
SELECT 3, 'Jerry', 'March',59, NULL
UNION ALL
SELECT 4, 'Kate', 'Johnson',20, NULL
UNION ALL
SELECT 5, 'Mary', 'McKinnon',37, NULL

This should create a rudimentary source table which will contain 6 records and a destination table which we will be loading the source data into. Next, let’s create a simple SSIS package. When you finished lying out and joining all the components together you should be able to see a structure which resembles the image below.

To step through the individual components, first, place a ‘Date Flow Task’ on the Control Flow and going into ‘Data Flow’ tab place ‘OLE DB Source’ and ‘OLE DB Destination’, link them together and establish the mapping between the columns from Tbl_Source to Tbl_Destination. Go back to the ‘Control Flow’ and place another ‘Execute SQL Task’, call it ‘SQL Task – Raise Error’. Then create a connection to the database and write the following statement in the ‘SQLStatement’ property.

--DUMMY SELECT TO THROW AN ERROR
SELECT 1/0

The purpose of this is for the package to fail and return an error on runtime. Once more, add ‘Execute SQL Task’, call it ‘SQL Task – Update Age Category’ and input the following query which will be responsible for updates to the ‘Age Category’ column in the destination table.

--UPDATE tbl_Destination WITH Age_Category
UPDATE Tbl_Destination
SET Age_Category = CASE WHEN Age < 18 THEN 'Juvenile'                         
                        WHEN Age >=18 AND Age < 50 THEN 'Middle Age'                         
                        WHEN Age > 50 THEN 'Senior' END

At this point all we have got is just a simple package which, when run, will fail on step two (division by zero). As the purpose of this exercise is to implement restartability, we need to enable package checkpoint functionality so the next time the package is run, it will start from the step which has failed and hasn’t been executed, rather than from the very start. This will prevent the data from the destination table to be transferred twice for the exactly the same records and prevent duplication and issues with further processing and output. To enable checkpoints in our package, thus creating a point of failure where the package will restart from, click on an empty area in the ‘Control Flow’ and go to ‘Properties’ window. Next, in the ‘Checkpoints’ category set ‘SaveCheckpoints’ as ‘True’, ‘CheckpointUsage’ as ‘IfExists’ and set ‘CheckpointFileName’ to a file with any name, ending in .XML. For the purpose of this post, it is probably best to save it in your local c:\ drive for convenience but in a production environment you will need to have a designated folder for this purpose. Finally, change the ‘FailPackageOnFailure’ property to ‘True’ on all ‘Control Flow’ tasks as per below.

It is time to run the package for the first time now and during its first run you should get an error half way through the execution.

This is expected as our SQL statement ‘SELECT 1/0’ should throw an exception. Also, our destination table should be incomplete as the UPDATE Statement has not been yet executed and all the ‘Age_Category’ fields should be unpopulated (NULL values). Let’s change this SQL code in ‘SQL Task – Raise Error’ component into the following statement in order to rectify this error.

--CORRECT SELECT STATEMENT TO AVOID ERROR
SELECT 1/1

Now, let’s run the package again. This time around you should notice that instead of running from the beginning, the package should start from step 2 i.e. ‘SQL Task – Raise Error’ and finish successfully.

Also, the previously unpopulated ‘Age_Category’ column in ‘Tbl_Destination’ table should now be populated with the corresponding values from the UPDATE statement.

The trick to this clever behavior lies in the XML file the SSIS process created in our selected location. This file stores the entry for each task executed by the package and in case of unsuccessful execution; it sets the re-start entry to the task which encountered an error. If the package runs without any issues, the file will be deleted automatically. Below is the content of the file opened in WordPad.

Package checkpoints is a really handy way of cutting back on package execution times when an error is encountered, particularly in case of long running or complex builds.

http://scuttle.org/bookmarks.php/pass?action=add

Tags: ,

This entry was posted on Monday, August 13th, 2012 at 12:10 pm and is filed under How To's, SQL, SSIS. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

Leave a Reply