Designing data acquisition framework in SQL Server and SSIS – how to source and integrate external data for a decision support system or data warehouse (Part 4)

Note: Part 1 of this series can be found HERE, Part 2 HERE, Part 3 HERE, and all the code and additional files for this post can be downloaded from my OneDrive folder HERE.

Closing this series with the final post, this part focuses on the SSIS package structure and ties together all the code and architecture described in previous posts. The SQL Server Integration Services solution featured here is very minimalistic, as all the heavy lifting is done by the code described mainly in parts two and three. The package is mainly meant to provide a structure or framework to execute individual components and control the workflow and, beyond this functionality, does not utilise any features which facilitate data acquisition, e.g. SSIS lookups, merges, conditional splits, data conversion or aggregation. There is a little bit of C# code driving some transformations, but most of the components that constitute the package’s structure merely act as triggers for external code execution.

The whole package (Control Flow only, as there are no Data Flow transformations) consists mainly of Execute SQL Task transformations, with a few Script Tasks here and there, as per the image below.

Data_Acquisition_Framework_Part4_Complete_SSIS_Package

With the exception of two For Each Loop containers which, as the name suggests, loop through small and large tables, all components execute in sequential order and once only. Most transformations are connected using Precedence Constraints based on two evaluation operators: Expression and Constraint, as per the image below.

Data_Acquisition_Framework_Part4_SSIS_Precedence_Constraint_View

This ensures not only that each task finishes with a status of SUCCESS, but also that the underlying code (a stored procedure in most cases) has completed without errors, i.e. that the returned @Is_All_OK variable's value is one (1). If the value of @Is_All_OK is not one (the alternative value being zero), the package workflow steers into executing the task(s) responsible for notifying administrators of the potential issue and terminating subsequent task execution.
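For illustration, the Expression side of such a precedence constraint evaluates the variable directly, e.g. @[User::Is_All_OK] == 1, while each stored procedure reports its outcome through an OUTPUT parameter. The sketch below shows this call pattern in T-SQL; the procedure and parameter names are placeholders for illustration only, not the actual objects from this solution.

```sql
-- Hedged sketch of the Execute SQL Task call pattern. The procedure and
-- parameter names are hypothetical -- the real code is covered in Part 3.
DECLARE @Is_All_OK INT = 0;

EXEC dbo.usp_checkObjectsConsistency      -- hypothetical procedure name
    @Remote_Server_Name = N'SourceSrv',   -- hypothetical parameter
    @Is_All_OK = @Is_All_OK OUTPUT;

-- The OUTPUT value is mapped back to the package-scoped Is_All_OK
-- variable, which the precedence constraint expression then evaluates.
SELECT @Is_All_OK AS Is_All_OK;
```

In the task's Parameter Mapping pane the OUTPUT parameter is mapped with a Direction of Output, so a failure inside the procedure flips the variable and diverts the workflow to the notification branch.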

Let’s go over the package skeleton and how some of the transformations are structured, starting with the two top components, i.e. ‘Set_All_OK and Sync_Exec_StartTime Variables Values’ and ‘Set DBMail_Receipients Variable Value’. The first transformation simply populates two variables used in the subsequent package workflow – Sync_Exec_StartTime, which stores the package start date and time, and Is_All_OK, which is preliminarily set to zero (0). The following C# snippet provides this functionality.

#region Help:  Introduction to the script task
/* The Script Task allows you to perform virtually any operation that can be accomplished in
 * a .Net application within the context of an Integration Services control flow. 
 * 
 * Expand the other regions which have "Help" prefixes for examples of specific ways to use
 * Integration Services features within this script task. */
#endregion


#region Namespaces
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
#endregion

namespace ST_44c20947eb6c4c5cb2f698bdd17b3534
{
    /// <summary>
    /// ScriptMain is the entry point class of the script.  Do not change the name, attributes,
    /// or parent of this class.
    /// </summary>
    [Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {
        #region Help:  Using Integration Services variables and parameters in a script
        /* To use a variable in this script, first ensure that the variable has been added to 
         * either the list contained in the ReadOnlyVariables property or the list contained in 
         * the ReadWriteVariables property of this script task, according to whether or not your
         * code needs to write to the variable.  To add the variable, save this script, close this instance of
         * Visual Studio, and update the ReadOnlyVariables and 
         * ReadWriteVariables properties in the Script Transformation Editor window.
         * To use a parameter in this script, follow the same steps. Parameters are always read-only.
         * 
         * Example of reading from a variable:
         *  DateTime startTime = (DateTime) Dts.Variables["System::StartTime"].Value;
         * 
         * Example of writing to a variable:
         *  Dts.Variables["User::myStringVariable"].Value = "new value";
         * 
         * Example of reading from a package parameter:
         *  int batchId = (int) Dts.Variables["$Package::batchId"].Value;
         *  
         * Example of reading from a project parameter:
         *  int batchId = (int) Dts.Variables["$Project::batchId"].Value;
         * 
         * Example of reading from a sensitive project parameter:
         *  int batchId = (int) Dts.Variables["$Project::batchId"].GetSensitiveValue();
         * */

        #endregion

        #region Help:  Firing Integration Services events from a script
        /* This script task can fire events for logging purposes.
         * 
         * Example of firing an error event:
         *  Dts.Events.FireError(18, "Process Values", "Bad value", "", 0);
         * 
         * Example of firing an information event:
         *  Dts.Events.FireInformation(3, "Process Values", "Processing has started", "", 0, ref fireAgain)
         * 
         * Example of firing a warning event:
         *  Dts.Events.FireWarning(14, "Process Values", "No values received for input", "", 0);
         * */
        #endregion

        #region Help:  Using Integration Services connection managers in a script
        /* Some types of connection managers can be used in this script task.  See the topic 
         * "Working with Connection Managers Programatically" for details.
         * 
         * Example of using an ADO.Net connection manager:
         *  object rawConnection = Dts.Connections["Sales DB"].AcquireConnection(Dts.Transaction);
         *  SqlConnection myADONETConnection = (SqlConnection)rawConnection;
         *  //Use the connection in some code here, then release the connection
         *  Dts.Connections["Sales DB"].ReleaseConnection(rawConnection);
         *
         * Example of using a File connection manager
         *  object rawConnection = Dts.Connections["Prices.zip"].AcquireConnection(Dts.Transaction);
         *  string filePath = (string)rawConnection;
         *  //Use the connection in some code here, then release the connection
         *  Dts.Connections["Prices.zip"].ReleaseConnection(rawConnection);
         * */
        #endregion


        /// <summary>
        /// This method is called when this script task executes in the control flow.
        /// Before returning from this method, set the value of Dts.TaskResult to indicate success or failure.
        /// To open Help, press F1.
        /// </summary>
        /// 
        public void Main()
        {
            // Initialise the package-scoped variables used by the workflow.
            DateTime saveNow = DateTime.Now;
            Dts.Variables["Is_All_OK"].Value = 0;
            Dts.Variables["Sync_Exec_StartTime"].Value = saveNow;
            Dts.TaskResult = (int)ScriptResults.Success;
        }

        #region ScriptResults declaration
        /// <summary>
        /// This enum provides a convenient shorthand within the scope of this class for setting the
        /// result of the script.
        /// 
        /// This code was generated automatically.
        /// </summary>
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion

    }
}

The ‘Set DBMail_Receipients Variable Value’ task uses a SQL function described in part 1 to fetch the e-mail addresses stored in one of the control tables and map them to one of the package variables used for sending out notifications in the event of any errors (see image below).
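Conceptually, the task runs a single-row query and maps its result set to the package variable. A minimal sketch, assuming a hypothetical function name rather than the actual one defined in Part 1:

```sql
-- Hedged sketch only: the real function is described in Part 1.
-- The single-value result set is mapped to the User::DBMail_Receipients
-- package variable in the task's Result Set pane.
SELECT dbo.udf_getDBMailRecipients('DataAcquisition') AS DBMail_Receipients;
```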

Data_Acquisition_Framework_Part4_SSIS_Email_Notification_Function_ExecuteSQLTask

Continuing on, we have a number of Execute SQL Task transformations, all simply triggering stored procedure executions (code outlined in part 3) while passing a number of parameter values or reading values passed back into the package from the underlying code. Some of the SSIS variables are predefined with their respective values, whereas others are assigned values as the package executes. Below is a sample image depicting parameter mapping for the task responsible for the schema definition changes check.

Data_Acquisition_Framework_Part4_SSIS_Task_Parameter_Mapping

As we can see, a number of input parameters are mapped to pre-defined user variables. A full list of package-defined variables is as per the image below (click on image to enlarge).

Data_Acquisition_Framework_Part4_SSIS_Package_Variables_View

Moving on, after all pre-acquisition tasks have completed successfully, a Sequence Container encompasses the actual data acquisition tasks. This container is broken down into two groups – one for small tables (running a dynamic MERGE statement) and one for larger ones (running parallel INSERTs). Each of those tasks runs in a For Each Loop container, iterating through object names assigned based on the query from the preceding tasks, which populate ‘Object’ data type variables as per the image below.

Data_Acquisition_Framework_Part4_SSIS_Get_Big_Tables_SQL_Task
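A hedged sketch of what such a preceding query might look like (the control table and column names are assumptions for illustration only): the full result set is stored in an Object-typed variable, which the Foreach ADO enumerator then unpacks row by row.

```sql
-- Illustrative only: table and column names are placeholders.
-- The Execute SQL Task's ResultSet property is set to 'Full result set'
-- and mapped to an Object-typed package variable.
SELECT Local_Schema_Name, Local_Object_Name
FROM   dbo.Ctrl_Synced_Objects          -- hypothetical control table
WHERE  Is_Active = 1
  AND  Is_Big_Table = 1;                -- 0 for the small-tables loop
```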

Once the schema and table names are retrieved, this data is passed into the looping transformation, which executes the data acquisition routines for each combination of schema and table name, as per the image below.

Data_Acquisition_Framework_Part4_SSIS_ForEachLoop_Editor

Both data acquisition routines, even though looping through each table in sequential order, execute simultaneously until all objects marked for synchronisation have been accounted for. Since both data acquisition stored procedures are designed not to halt the data copying process when an exception occurs, but rather to log any issues and continue, this sequence container also links up with a package-wide error triggering process which assigns predefined values to error-related variables using a Script Task and sends out an e-mail notification to nominated e-mail addresses. The code snippet below provides the predefined set of values for the subsequent error notification routine.

// The auto-generated "Help" comment regions, identical to those in the
// previous snippet, have been omitted for brevity.

#region Namespaces
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
#endregion

namespace ST_7be0163238e6428e99c1178dd2a1c435
{
    /// <summary>
    /// ScriptMain is the entry point class of the script.  Do not change the name, attributes,
    /// or parent of this class.
    /// </summary>
    [Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {
        /// <summary>
        /// This method is called when this script task executes in the control flow.
        /// Before returning from this method, set the value of Dts.TaskResult to indicate success or failure.
        /// To open Help, press F1.
        /// </summary>
        public void Main()
        {
            // Populate the error-related variables with generic default values
            // used by the downstream e-mail notification routine.
            Dts.Variables["Error_Message"].Value = "Execution of the Mizzen data acquisition package failed on an unspecified step, most likely due to an SSIS transformation failure. Please troubleshoot.";
            Dts.Variables["Process_Name"].Value = "Non-specific SSIS Job Transformation Failure";
            Dts.Variables["Object_Name"].Value = "N/A";
            Dts.TaskResult = (int)ScriptResults.Success;
        }

        #region ScriptResults declaration
        /// <summary>
        /// This enum provides a convenient shorthand within the scope of this class for setting the
        /// result of the script.
        ///
        /// This code was generated automatically.
        /// </summary>
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion
    }
}

Following on, the rest of the tasks are responsible for index recreation, statistics refresh and a number of rudimentary checks to ensure the data copied across is consistent with the source environment. The index recreation task utilises the same stored procedure as the one responsible for dropping the indexes; the only difference is the argument passed to the underlying code, i.e. a CREATE statement instead of DROP.

Data_Acquisition_Framework_Part4_SSIS_Index_SQL_Statement
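In other words, the same procedure is simply invoked twice with a different operation argument. A hedged sketch of that call pattern (the procedure and parameter names are illustrative, not the actual code from Part 3):

```sql
DECLARE @Is_All_OK INT;

-- Before data acquisition: drop the indexes (hypothetical procedure name).
EXEC dbo.usp_manageIndexes @Operation = 'DROP',   @Is_All_OK = @Is_All_OK OUTPUT;

-- After data acquisition: recreate them from the stored definitions.
EXEC dbo.usp_manageIndexes @Operation = 'CREATE', @Is_All_OK = @Is_All_OK OUTPUT;
```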

Next up we have another Execute SQL Task for updating table statistics and, completing the acquisition process, a simple Script Task to store the completion date and time of the core package execution tasks. This Script Task carries almost identical code to the one used to log the package execution start date and time, the only difference being the variable name used, i.e. ‘Sync_Exec_EndTime’ rather than ‘Sync_Exec_StartTime’.

public void Main()
{
    // Store the completion date and time of the core package tasks.
    DateTime saveNow = DateTime.Now;
    Dts.Variables["Sync_Exec_EndTime"].Value = saveNow;
    Dts.TaskResult = (int)ScriptResults.Success;
}

The final section of this workflow deals with record count comparison between source and target, and with querying the AdminDBA database for any errors that may have been logged. Details of the actual code are explained in part 3 of this series, so for the sake of succinctness I will refrain from repeating myself here, but hopefully by now you have a fairly good understanding of how the package works and that, at its core, it is just a shell to control the individual steps' execution workflow. The whole package, as previously mentioned, can be downloaded from my OneDrive folder HERE.
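As a rough idea of what the record count comparison boils down to (the real implementation is covered in part 3; the table and column names below are placeholders), any object whose local and remote counts diverge gets flagged:

```sql
-- Hedged sketch: object names are illustrative, not the actual AdminDBA code.
SELECT l.Schema_Name, l.Object_Name,
       l.Row_Count AS Local_Rows, r.Row_Count AS Remote_Rows
FROM   dbo.Local_Row_Counts  AS l
JOIN   dbo.Remote_Row_Counts AS r
       ON  r.Schema_Name = l.Schema_Name
       AND r.Object_Name = l.Object_Name
WHERE  l.Row_Count <> r.Row_Count;      -- any rows returned => Is_All_OK = 0
```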

To conclude, this short series should provide anyone versed in Microsoft Business Intelligence tools with a basic framework for sourcing transactional data for staging purposes. As a blueprint, many steps outlined in previous parts can be skipped or extended depending on your business needs, as this implementation was ‘ripped out’ of one of my clients’ environments and may not be suitable in its entirety as a drop-in solution for other projects. However, unless streaming or near-real-time staging data is a requirement, this sort of framework can become a good springboard for the majority of projects, allowing for further modification and tweaking if required.



This entry was posted on Wednesday, May 25th, 2016 at 6:34 am and is filed under Data Modelling, SQL, SQL Server, SSIS. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.
