{"id":4298,"date":"2021-06-01T09:15:57","date_gmt":"2021-05-31T23:15:57","guid":{"rendered":"http:\/\/bicortex.com\/?p=4298"},"modified":"2021-09-20T14:56:09","modified_gmt":"2021-09-20T04:56:09","slug":"sql-server-hash-partitioned-parallel-data-acquisition-how-to-accelerate-your-e-in-elt-etl-using-a-simple-tsql-framework","status":"publish","type":"post","link":"http:\/\/bicortex.com\/bicortex\/sql-server-hash-partitioned-parallel-data-acquisition-how-to-accelerate-your-e-in-elt-etl-using-a-simple-tsql-framework\/","title":{"rendered":"SQL Server Hash-Partitioned Parallel Data Acquisition \u2013 How to Accelerate Your &#8216;E&#8217; in ELT\/ETL Using a Simple T-SQL Framework"},"content":{"rendered":"<h3 style=\"text-align: center;\">Introduction<\/h3>\n<p class=\"Standard\" style=\"text-align: justify;\">Note: All artifacts used in this demo can be downloaded from my shared OneDrive folder <a href=\"https:\/\/1drv.ms\/u\/s!AuEyKKgH71pxhN57_KkHtnRY1UOqdg?e=23m5w3\" target=\"_blank\" rel=\"noopener\">HERE<\/a>.<\/p>\n<p class=\"Standard\" style=\"text-align: justify;\">Big part of Data Warehouse development has always been tied to structuring data acquisition pipelines before more rigid modelling takes place and data is wrangled (most likely) using the most the venerable methodologies today: Data Vault, Kimbal or Inmon. 
I have covered the data acquisition topic in a number of previous posts (mainly <a href=\"http:\/\/bicortex.com\/data-acquisition-framework-using-custom-python-wrapper-for-concurrent-bcp-utility-execution\/\" target=\"_blank\" rel=\"noopener\">HERE<\/a> and <a href=\"http:\/\/bicortex.com\/designing-data-acquisition-framework-in-sql-server-and-ssis-how-to-source-and-integrate-external-data-for-a-decision-support-system-or-data-warehouse-part-1\/\" target=\"_blank\" rel=\"noopener\">HERE<\/a>), but I would like to expand further on this subject with a short post on how to enhance this process (using a simplified example) and enable hash-partitioning on individual tables. Having concurrent executions, i.e. a number of loads running in parallel, one for every source table, is fairly straightforward to achieve using the SQL Server platform and a little bit of T-SQL. There are certainly more elegant approaches which enable turning sequential loads into parallel ones, but with a little bit of elbow grease one can easily script out a batch of good, old-fashioned SQL Server Agent jobs to be spun up and run simultaneously (example <a href=\"http:\/\/bicortex.com\/asynchronous-sql-execution-via-sql-server-agent-jobs\/\" target=\"_blank\" rel=\"noopener\">HERE<\/a>). However, in some cases, source tables can be quite large and having a single transaction responsible for the insertion of the whole table\u2019s content into a target object can be quite time-consuming. The situation can be further exacerbated if the primary key on the source table is a composite one or not of a numeric type.<\/p>\n<p class=\"Standard\" style=\"text-align: justify;\">One of the clients I worked with a few years ago had this problem. They needed to reload the entire content of a few source tables on a nightly schedule, and the framework they used was up to the task until data stored in those objects grew by a large margin. Those tables contained a lot of text data, i.e. 
email, chat and telephone transcripts, so many of the fields were quite wide. As the acquisition framework they used dynamically generated BIML scripts and SSIS packages, there was very little one could do to tune or modify the logic due to the proprietary nature of the framework\u2019s design. Data volumes increased exponentially and, over time, packages responsible for objects with very wide attributes started timing out, requiring expensive, bespoke development. Another, much more recent, case involved my current employer\u2019s choice of survey software. If you\u2019ve ever used <a href=\"https:\/\/www.limesurvey.org\/\" target=\"_blank\" rel=\"noopener\">LimeSurvey<\/a>, you will know that each survey\u2019s data is stored in a single, very wide table (think hundreds of attributes), with a dedicated column assigned to each question. That\u2019s not a big issue in itself, especially since MySQL (the database used by LimeSurvey) has a row limit of 65,535 bytes; however, this schema architecture allowed objects with very wide rows and shallow row counts to proliferate across the database. Extracting this data, particularly when lots of textual values are involved, can be quite slow once thousands of tables are in play.<\/p>\n<p class=\"Standard\" style=\"text-align: justify;\">To demonstrate how to mitigate this type of problem, I will create a simple scenario where a source system table acquisition job can be run across multiple, hash-partitioned &#8216;chunks&#8217; concurrently. Based on a distinct primary (single-column or composite) key, we will be able to distribute the table content across multiple, consistent and non-blocking streams of data, thus providing a significant acceleration of data movement and cutting the time dedicated to &#8216;E&#8217; (out of ELT\/ETL) by a large margin. I will also compare the time it takes to load this data using a variable number of streams, i.e. 
between 2 and 8, to demonstrate how this framework can be tuned for accelerated performance.<\/p>\n<p class=\"Standard\" style=\"text-align: justify;\">The framework\u2019s simplified architecture can be depicted as per the diagram below.<\/p>\n<p><a href=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/\/2021\/06\/Hash_Partitioned_Data_Sync_Architecture.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-4483\" src=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/\/2021\/06\/Hash_Partitioned_Data_Sync_Architecture.png\" alt=\"\" width=\"580\" height=\"429\" srcset=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/2021\/06\/Hash_Partitioned_Data_Sync_Architecture.png 896w, http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/2021\/06\/Hash_Partitioned_Data_Sync_Architecture-300x222.png 300w, http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/2021\/06\/Hash_Partitioned_Data_Sync_Architecture-768x568.png 768w\" sizes=\"auto, (max-width: 580px) 100vw, 580px\" \/><\/a><\/p>\n<p class=\"Standard\" style=\"text-align: justify;\">On the left we have our source system (augmented by the system metadata stored in a separate database) and a &#8216;parent&#8217; process used for jobs coordination and dispatching. In the middle we have individual &#8216;workers&#8217;, each operating on a hash-defined partition of source data. Finally, on the right we have our target system where it all gets assembled into a mirror copy of the source table. I will go over how all this &#8216;hangs together&#8217; in more detail in the section below.<\/p>\n<h3 style=\"text-align: center;\">Azure Source Database and Dummy Data Setup<\/h3>\n<p class=\"Standard\" style=\"text-align: justify;\">First, let\u2019s create all required resources on Azure. The following Python script is used to generate an Azure resource group, an Azure SQL server and an empty database. 
You can easily replace it with a different self-hosted SQL Server instance (it doesn&#8217;t need to live in Azure).<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nfrom azure.common.client_factory import get_client_from_cli_profile\r\nfrom azure.mgmt.resource import ResourceManagementClient\r\nfrom azure.mgmt.sql import SqlManagementClient\r\nfrom msrestazure.azure_exceptions import CloudError\r\nfrom os import popen\r\nimport pyodbc\r\n\r\nRESOURCE_GROUP = 'Test_Resource_Group'\r\nLOCATION = 'australiasoutheast'\r\nSQL_SERVER = 'sourceserver2020'\r\nSQL_DB = 'sourcedb'\r\nUSERNAME = 'testusername'\r\nPASSWORD = 'MyV3ry$trongPa$$word'\r\nDRIVER = '{ODBC Driver 17 for SQL Server}'\r\nexternal_IP = popen(&quot;curl -s ifconfig.me&quot;).readline()\r\n\r\n\r\ndef create_resource_group(resource_client, RESOURCE_GROUP, LOCATION):\r\n    print(&quot;\\nCreating Azure RG {rg_name}...&quot;.format(\r\n        rg_name=RESOURCE_GROUP), end=&quot;&quot;, flush=True)\r\n    try:\r\n        resource_client.resource_groups.create_or_update(\r\n            RESOURCE_GROUP, {'location': LOCATION})\r\n    except CloudError as e:\r\n        print(e)\r\n    rg = &#x5B;g.name for g in resource_client.resource_groups.list()]\r\n    if RESOURCE_GROUP in rg:\r\n        print('OK')\r\n\r\n\r\ndef create_sql_server(sql_client, RESOURCE_GROUP, SQL_SERVER, LOCATION, USERNAME, PASSWORD):\r\n    print(&quot;Creating Azure SQL Server {ssvr_name}...&quot;.format(\r\n        ssvr_name=SQL_SERVER), end=&quot;&quot;, flush=True)\r\n    try:\r\n        sql_server = sql_client.servers.begin_create_or_update(\r\n            RESOURCE_GROUP,\r\n            SQL_SERVER,\r\n            {\r\n                'location': LOCATION,\r\n                'version': '12.0',\r\n                'administrator_login': USERNAME,\r\n                'administrator_login_password': PASSWORD\r\n            }\r\n        )\r\n        sql_server.wait()\r\n    except CloudError as e:\r\n        
print(e)\r\n    ssvr = &#x5B;i.name for i in sql_client.servers.list_by_resource_group(\r\n        RESOURCE_GROUP)]\r\n    if SQL_SERVER in ssvr:\r\n        print('OK')\r\n\r\n\r\ndef create_sql_db(sql_client, RESOURCE_GROUP, SQL_SERVER, SQL_DB, LOCATION):\r\n    print(&quot;Creating Azure SQL Database {db_name}...&quot;.format(\r\n        db_name=SQL_DB), end=&quot;&quot;, flush=True)\r\n    try:\r\n        sql_db = sql_client.databases.begin_create_or_update(\r\n            RESOURCE_GROUP,\r\n            SQL_SERVER,\r\n            SQL_DB,\r\n            {\r\n                'location': LOCATION,\r\n                'collation': 'SQL_Latin1_General_CP1_CI_AS',\r\n                'create_mode': 'default',\r\n                'requested_service_objective_name': 'Basic'\r\n            }\r\n        )\r\n        sql_db.wait()\r\n    except CloudError as e:\r\n        print(e)\r\n    dbs = &#x5B;i.name for i in sql_client.databases.list_by_server(\r\n        RESOURCE_GROUP, SQL_SERVER)]\r\n    if SQL_DB in dbs:\r\n        print('OK')\r\n\r\n\r\ndef configure_firewall(sql_client, DRIVER, RESOURCE_GROUP, SQL_SERVER, SQL_DB, USERNAME, PASSWORD, external_IP):\r\n    print(&quot;Configuring Azure SQL Server Firewall Settings...&quot;, end=&quot;&quot;, flush=True)\r\n    try:\r\n        sql_client.firewall_rules.create_or_update(\r\n            RESOURCE_GROUP,\r\n            SQL_SERVER,\r\n            &quot;firewall_rule_name_&quot; + external_IP,\r\n            {\r\n                &quot;startIpAddress&quot;: external_IP,\r\n                &quot;endIpAddress&quot;: external_IP\r\n            }\r\n        )\r\n    except CloudError as e:\r\n        print(e)\r\n    SQL_SERVER = SQL_SERVER + '.database.windows.net'\r\n    with pyodbc.connect('DRIVER='+DRIVER+';SERVER='+SQL_SERVER+';PORT=1433;DATABASE='+SQL_DB+';UID='+USERNAME+';PWD=' + PASSWORD) as conn:\r\n        with conn.cursor() as cursor:\r\n            cursor.execute(&quot;SELECT @@version&quot;)\r\n            row = 
cursor.fetchone()\r\n    if row:\r\n        print('OK')\r\n\r\n\r\ndef main():\r\n    # create resource client\r\n    resource_client = get_client_from_cli_profile(ResourceManagementClient)\r\n    create_resource_group(resource_client, RESOURCE_GROUP, LOCATION)\r\n    # create sql client\r\n    sql_client = get_client_from_cli_profile(SqlManagementClient)\r\n    # create sql server\r\n    create_sql_server(sql_client, RESOURCE_GROUP, SQL_SERVER,\r\n                      LOCATION, USERNAME, PASSWORD)\r\n    # create sql db in the basic tier\r\n    create_sql_db(sql_client, RESOURCE_GROUP, SQL_SERVER, SQL_DB, LOCATION)\r\n    # configure firewall\r\n    configure_firewall(sql_client, DRIVER, RESOURCE_GROUP,\r\n                       SQL_SERVER, SQL_DB, USERNAME, PASSWORD, external_IP)\r\n\r\n\r\nif __name__ == &quot;__main__&quot;:\r\n    main()\r\n<\/pre>\n<p class=\"Standard\" style=\"text-align: justify;\">Providing all necessary Azure SDK for Python modules were installed and referenced correctly and we&#8217;ve signed in using Azure CLI, we should see the following status output at the end of script execution.<\/p>\n<p><a href=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/\/2020\/12\/Hash_Partitioned_Data_Sync_Azure_DB_Deployment.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-4326\" src=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/\/2020\/12\/Hash_Partitioned_Data_Sync_Azure_DB_Deployment.png\" alt=\"\" width=\"580\" height=\"181\" srcset=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/2020\/12\/Hash_Partitioned_Data_Sync_Azure_DB_Deployment.png 690w, http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/2020\/12\/Hash_Partitioned_Data_Sync_Azure_DB_Deployment-300x93.png 300w\" sizes=\"auto, (max-width: 580px) 100vw, 580px\" \/><\/a><\/p>\n<p class=\"Standard\" style=\"text-align: justify;\">Next, let\u2019s generate some mock data. 
This is required to simulate the scenario I was referring to before i.e. a table with a large number of columns and variable-length text data which we will try to acquire as efficiently and quickly as possible. The following script creates a small stored procedure used to generate dummy data, Dummy_Table object and finally, it assigns ID column as a primary key on the newly populated table.<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nfrom pathlib import PureWindowsPath\r\nimport pyodbc\r\n\r\nSQL_SERVER = 'sourceserver2020.database.windows.net'\r\nSQL_DB = 'sourcedb'\r\nUSERNAME = 'testusername'\r\nPASSWORD = 'MyV3ry$trongPa$$word'\r\nDRIVER = '{ODBC Driver 17 for SQL Server}'\r\n\r\n\r\nsql_file = PureWindowsPath(\r\n    '\/Path\/Azure_SQLDB_Deployment\/SQL\/create_wide_tbl.sql')\r\n\r\nwith open(sql_file, &quot;r&quot;) as f:\r\n    sqlFile = f.read()\r\n\r\nsql = sqlFile.rstrip('\\n')\r\n\r\ntry:\r\n    with pyodbc.connect('DRIVER='+DRIVER+';SERVER='+SQL_SERVER+';PORT=1433;DATABASE='+SQL_DB+';UID='+USERNAME+';PWD=' + PASSWORD) as conn:\r\n        with conn.cursor() as cursor:\r\n            cursor.execute('DROP PROCEDURE IF EXISTS usp_generate_dummy_data')\r\n            cursor.execute(sql)\r\n            cursor.execute('EXEC dbo.usp_generate_dummy_data')\r\n            cursor.execute('SELECT TOP (1) 1 FROM dbo.Dummy_Table')\r\n            rows = cursor.fetchone()\r\n            if rows:\r\n                print('All Good!')\r\n            else:\r\n                raise ValueError(\r\n                    'No data generated in the source table. Please troubleshoot!'\r\n                )\r\nexcept pyodbc.Error as ex:\r\n    sqlstate = ex.args&#x5B;1]\r\n    print(sqlstate)\r\n<\/pre>\n<p class=\"Standard\" style=\"text-align: justify;\">The stored procedure (code below) is configured to create 300 columns across 100000 rows of dense, text data. When finished (executed for approx. 
20min with the default configuration of data volume and Azure resources specified in the script above), the very wide table schema as well as synthetically created data will look similar to the one in the image below (click on image to enlarge).<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\nCREATE PROCEDURE usp_generate_dummy_data\r\nAS\r\nBEGIN\r\n    SET NOCOUNT ON;\r\n    DECLARE @wide INT = 100;\r\n    DECLARE @deep INT = 100000;\r\n    DECLARE @allchars VARCHAR(100) = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 ';\r\n    DECLARE @target_table_name VARCHAR(56) = 'Dummy_Table';\r\n\r\n    DROP TABLE IF EXISTS ##metadata;\r\n    CREATE TABLE ##metadata\r\n    (\r\n        ID INT IDENTITY(1, 1),\r\n        Column_Name VARCHAR(256),\r\n        Data_Type VARCHAR(56),\r\n        Column_Size VARCHAR(56)\r\n    );\r\n\r\n\r\n    DECLARE @count INT = 1;\r\n    WHILE @count &lt;= @wide\r\n    BEGIN\r\n        INSERT INTO ##metadata\r\n        (\r\n            Column_Name,\r\n            Data_Type,\r\n            Column_Size\r\n        )\r\n        SELECT 'Column_' + CAST(@count AS VARCHAR(56)),\r\n               'varchar',\r\n               CEILING(RAND() * 100);\r\n\r\n        SET @count = @count + 1;\r\n    END;\r\n\r\n    DECLARE @SQL VARCHAR(MAX);\r\n    SELECT @SQL\r\n        = 'DROP TABLE IF EXISTS ' + @target_table_name + '; CREATE TABLE ' + @target_table_name\r\n          + ' (Id INT IDENTITY(1,1),' + STRING_AGG(Column_Name + ' ' + Data_Type + '(' + Column_Size + ')', ',') + ')'\r\n    FROM ##metadata;\r\n    EXEC (@SQL);\r\n\r\n    SET @count = 1;\r\n    WHILE @count &lt;= @deep\r\n    BEGIN\r\n        DECLARE @vals VARCHAR(MAX);\r\n        SELECT @vals\r\n            = STRING_AGG(\r\n                            CAST(QUOTENAME(\r\n                                              SUBSTRING(\r\n                                                           RIGHT(LEFT(@allchars, ABS(BINARY_CHECKSUM(NEWID()) % 35) + 5), 5)\r\n                               
                            + RIGHT(LEFT(@allchars, ABS(BINARY_CHECKSUM(NEWID()) % 35)\r\n                                                                                   + 5), 5)\r\n                                                           + RIGHT(LEFT(@allchars, ABS(BINARY_CHECKSUM(NEWID()) % 35)\r\n                                                                                   + 5), 5)\r\n                                                           + RIGHT(LEFT(@allchars, ABS(BINARY_CHECKSUM(NEWID()) % 35)\r\n                                                                                   + 5), 5)\r\n                                                           + RIGHT(LEFT(@allchars, ABS(BINARY_CHECKSUM(NEWID()) % 35)\r\n                                                                                   + 5), 5)\r\n                                                           + RIGHT(LEFT(@allchars, ABS(BINARY_CHECKSUM(NEWID()) % 35)\r\n                                                                                   + 5), 5)\r\n                                                           + RIGHT(LEFT(@allchars, ABS(BINARY_CHECKSUM(NEWID()) % 35)\r\n                                                                                   + 5), 5)\r\n                                                           + RIGHT(LEFT(@allchars, ABS(BINARY_CHECKSUM(NEWID()) % 35)\r\n                                                                                   + 5), 5)\r\n                                                           + RIGHT(LEFT(@allchars, ABS(BINARY_CHECKSUM(NEWID()) % 35)\r\n                                                                                   + 5), 5)\r\n                                                           + RIGHT(LEFT(@allchars, ABS(BINARY_CHECKSUM(NEWID()) % 35)\r\n                                                                                   + 5), 5),\r\n                                                           1,\r\n                     
                                      CAST(Column_Size AS INT)\r\n                                                       ),\r\n                                              ''''\r\n                                          ) AS NVARCHAR(MAX)),\r\n                            ','\r\n                        )\r\n        FROM ##metadata;\r\n        SELECT @SQL\r\n            = 'INSERT INTO ' + @target_table_name + '(' + STRING_AGG(Column_Name, ',') + ') SELECT ' + @vals + ''\r\n        FROM ##metadata;\r\n        EXEC (@SQL);\r\n        SET @count = @count + 1;\r\n    END;\r\n\r\n    ALTER TABLE dbo.Dummy_Table\r\n    ADD PRIMARY KEY (ID);\r\n\r\n    DROP TABLE IF EXISTS ##metadata;\r\nEND;\r\n<\/pre>\n<p><a href=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/\/2020\/12\/Hash_Partitioned_Data_Sync_Dummy_Data.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-4314\" src=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/\/2020\/12\/Hash_Partitioned_Data_Sync_Dummy_Data.png\" alt=\"\" width=\"580\" height=\"203\" srcset=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/2020\/12\/Hash_Partitioned_Data_Sync_Dummy_Data.png 2462w, http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/2020\/12\/Hash_Partitioned_Data_Sync_Dummy_Data-300x105.png 300w, http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/2020\/12\/Hash_Partitioned_Data_Sync_Dummy_Data-1024x359.png 1024w, http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/2020\/12\/Hash_Partitioned_Data_Sync_Dummy_Data-768x269.png 768w, http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/2020\/12\/Hash_Partitioned_Data_Sync_Dummy_Data-1536x538.png 1536w, http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/2020\/12\/Hash_Partitioned_Data_Sync_Dummy_Data-2048x717.png 2048w\" sizes=\"auto, (max-width: 580px) 100vw, 580px\" \/><\/a><\/p>\n<p class=\"Standard\" style=\"text-align: justify;\">Next, I will create a linked server connection from 
my on-premise SQL Server instance to the one I\u2019ve just provisioned in Azure and recreate the source Dummy_Table schema in the target database.<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\nUSE &#x5B;master]\r\nGO\r\nEXEC master.dbo.sp_addlinkedserver @server = N'AZURESOURCELINKEDSVR', @srvproduct=N'', @provider=N'SQLNCLI', @datasrc=N'sourceserver2020.database.windows.net', @catalog=N'sourcedb'\r\nGO\r\nEXEC master.dbo.sp_addlinkedsrvlogin @rmtsrvname=N'AZURESOURCELINKEDSVR',@useself=N'False',@locallogin=NULL,@rmtuser=N'testusername',@rmtpassword='MyV3ry$trongPa$$word'\r\nGO\r\nEXEC master.dbo.sp_serveroption @server=N'AZURESOURCELINKEDSVR', @optname=N'collation compatible', @optvalue=N'false'\r\nGO\r\nEXEC master.dbo.sp_serveroption @server=N'AZURESOURCELINKEDSVR', @optname=N'data access', @optvalue=N'true'\r\nGO\r\nEXEC master.dbo.sp_serveroption @server=N'AZURESOURCELINKEDSVR', @optname=N'dist', @optvalue=N'false'\r\nGO\r\nEXEC master.dbo.sp_serveroption @server=N'AZURESOURCELINKEDSVR', @optname=N'pub', @optvalue=N'false'\r\nGO\r\nEXEC master.dbo.sp_serveroption @server=N'AZURESOURCELINKEDSVR', @optname=N'rpc', @optvalue=N'true'\r\nGO\r\nEXEC master.dbo.sp_serveroption @server=N'AZURESOURCELINKEDSVR', @optname=N'rpc out', @optvalue=N'true'\r\nGO\r\nEXEC master.dbo.sp_serveroption @server=N'AZURESOURCELINKEDSVR', @optname=N'sub', @optvalue=N'false'\r\nGO\r\nEXEC master.dbo.sp_serveroption @server=N'AZURESOURCELINKEDSVR', @optname=N'connect timeout', @optvalue=N'0'\r\nGO\r\nEXEC master.dbo.sp_serveroption @server=N'AZURESOURCELINKEDSVR', @optname=N'collation name', @optvalue=NULL\r\nGO\r\nEXEC master.dbo.sp_serveroption @server=N'AZURESOURCELINKEDSVR', @optname=N'lazy schema validation', @optvalue=N'false'\r\nGO\r\nEXEC master.dbo.sp_serveroption @server=N'AZURESOURCELINKEDSVR', @optname=N'query timeout', @optvalue=N'0'\r\nGO\r\nEXEC master.dbo.sp_serveroption @server=N'AZURESOURCELINKEDSVR', @optname=N'use remote collation', 
@optvalue=N'true'\r\nGO\r\nEXEC master.dbo.sp_serveroption @server=N'AZURESOURCELINKEDSVR', @optname=N'remote proc transaction promotion', @optvalue=N'false'\r\nGO\r\n<\/pre>\n<p class=\"Standard\" style=\"text-align: justify;\">Finally, in this example I will also be using two small databases (EDW_Control and AdminDBA) which contain metadata on the object(s) I will be processing and error logs for loads which fail to run correctly. The structure of the EDW_Control database is pretty straightforward &#8211; it uses a collection of tables to track things like source and target schema names, record counts, table and index sizes etc. The important part is the fields called \u2018Is_Big\u2019 and \u2018ETL_Batch_No\u2019, which dictate whether the nominated object should be hash-partitioned and, if so, into how many batches\/streams. I wrote more about how these tables are structured in one of my previous blog posts <a href=\"http:\/\/bicortex.com\/speeding-up-sql-server-data-warehouse-architecure-with-automation-procedures-10-problem-solution-scenarios-to-jump-start-your-development\/\" target=\"_blank\" rel=\"noopener\">HERE<\/a>. Details on how to build and deploy the AdminDBA database can be found in my previous posts <a href=\"http:\/\/bicortex.com\/how-to-create-a-simple-etlstored-procedure-error-capturing-database-schema-model-sql-code-and-implementation-part-1\/\" target=\"_blank\" rel=\"noopener\">HERE<\/a> and <a href=\"http:\/\/bicortex.com\/how-to-create-a-simple-etlstored-procedure-error-capturing-database-schema-model-sql-code-and-implementation-part-2\/\" target=\"_blank\" rel=\"noopener\">HERE<\/a>. 
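To make the role of these two metadata fields concrete, here is a minimal Python sketch of how a dispatcher could turn them into an effective number of concurrent streams. The helper name, parameter names and override semantics are my own illustration, not part of the framework's code; the one behaviour taken from the framework is that the stream count is capped at the number of available CPU cores.

```python
import os

def effective_stream_count(is_big, etl_batch_no, override=None):
    # Hypothetical helper: objects not flagged 'Is_Big' get a single,
    # stand-alone load; big ones are split into the nominated number of
    # streams ('ETL_Batch_No'), capped at the number of CPU cores.
    if not is_big:
        return 1
    requested = override if override is not None else etl_batch_no
    cores = os.cpu_count() or 1
    return max(1, min(requested, cores))
```

For example, a table flagged as big with ETL_Batch_No set to 8 would run with 8 streams on an 8-core box, but would be scaled back automatically on smaller hardware.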
For reference, the EDW_Control database schema and a snapshot of metadata I will be processing looks as per below.<\/p>\n<p><a href=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/\/2020\/12\/Hash_Partitioned_Data_Sync_Metadata_Schema.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-4309\" src=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/\/2020\/12\/Hash_Partitioned_Data_Sync_Metadata_Schema.png\" alt=\"\" width=\"580\" height=\"524\" srcset=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/2020\/12\/Hash_Partitioned_Data_Sync_Metadata_Schema.png 731w, http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/2020\/12\/Hash_Partitioned_Data_Sync_Metadata_Schema-300x271.png 300w\" sizes=\"auto, (max-width: 580px) 100vw, 580px\" \/><\/a><\/p>\n<h3 style=\"text-align: center;\">Hash-Partitioned Data Acquisition<\/h3>\n<p class=\"Standard\" style=\"text-align: justify;\">Now that we\u2019re all set up let\u2019s explore the main code base behind the process responsible for all the heavy lifting. This acquisition method is based on two different stored procedures: one which handles metadata, processes coordination and jobs assignment and the second one which builds the actual SQL statements responsible for data insertion. You can download all the code from my OneDrive folder <a href=\"https:\/\/1drv.ms\/u\/s!AuEyKKgH71pxhN57_KkHtnRY1UOqdg?e=23m5w3\" target=\"_blank\" rel=\"noopener\">HERE<\/a> so I will only go over some of the more interesting aspects of these stored procedures and finally provide a quick demo depicting runtime behavior and a short summary of its performance.<\/p>\n<p class=\"Standard\" style=\"text-align: justify;\">As mentioned, the magic sauce which allows for building reliable concurrent acquisition job is the hash-partitioning which converts single or composite primary key into a hash string, dynamically building self-contained stream of data. 
We can specify as many partitions as we want (no greater than the number of rows) which can be further scaled back based on the number of CPU cores available e.g. in the demo below I will be partitioning the source table into 8 partitions. We can also have the process determine the number of partitions (when run for multiple tables) based on the metadata collected e.g. row counts, data size etc. by assigning the value of 1 to \u2018@Concurrent_Batch_No_Override\u2019 parameter. The main functionality, however, is tied to the following bit of code which creates even \u2018chunks\u2019 of target data based on the sorted hash values.<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\nSET @SQL = 'DECLARE @R1 INT = (SELECT  MIN(id) from #Hash_Temp)'\t\t\t\t\t\t\t\t\t\t+CHAR(13);\r\nSET @SQL = @SQL + 'DECLARE @R1_Hash VARCHAR(128) = (SELECT hash_key FROM #Hash_Temp WHERE id = @R1)'\t+CHAR(13);\r\nSET @SQL = @SQL + 'DECLARE @R2 BIGINT = (SELECT (MAX(id)-MIN(id)+1)'\t\t\t\t\t\t\t\t\t+CHAR(13);\r\nSET @SQL = @SQL + '\/'+CAST(@etl_batch_no AS VARCHAR(10))+' as id FROM #Hash_Temp)'\t\t\t\t\t\t+CHAR(13);\r\nSET @SQL = @SQL + 'DECLARE @R2_Hash VARCHAR (128) = (SELECT hash_key FROM #Hash_Temp WHERE id = @R2)'\t+CHAR(13);\r\nSET @SQL = @SQL + 'DECLARE @R3 BIGINT = (SELECT MAX(id) from #Hash_Temp)'\t\t\t\t\t\t\t\t+CHAR(13);\r\nSET @SQL = @SQL + 'DECLARE @R3_Hash VARCHAR(128) = (SELECT hash_key FROM #Hash_Temp WHERE id = @R3)'\t+CHAR(13);\r\nSET @SQL = @SQL + 'INSERT INTO #Ids_Range'\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t+CHAR(13);\r\nSET @SQL = @SQL + '(range_FROM, range_TO, hash_FROM, hash_TO) '\t\t\t\t\t\t\t\t\t\t\t+CHAR(13);\r\nSET @SQL = @SQL + 'SELECT @R1, @R1+@R2, @R1_Hash,(SELECT hash_key FROM #Hash_Temp WHERE id =@R1+@R2)'\t+CHAR(13);\r\nSET @SQL = @SQL + 'DECLARE @z INT = 1'\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t+CHAR(13);\r\nSET @SQL = @SQL + 'WHILE @z &lt;= '+CAST(@etl_batch_no AS VARCHAR(10))+'-1'\t\t\t\t\t\t\t\t   +CHAR(13);\r\nSET @SQL = @SQL + 
'BEGIN'\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t+CHAR(13);\r\nSET @SQL = @SQL + 'INSERT INTO #Ids_Range (Range_FROM, Range_TO, Hash_FROM, Hash_TO)'\t\t\t\t\t+CHAR(13);\r\nSET @SQL = @SQL + 'SELECT LAG(Range_TO,0) OVER (ORDER BY id DESC)+1, '\t\t\t\t\t\t\t\t\t+CHAR(13);\r\nSET @SQL = @SQL + 'CASE WHEN LAG(Range_TO,0) OVER (ORDER BY id DESC)+@R2+1 &gt;= @R3'\t\t\t\t\t   +CHAR(13);\r\nSET @SQL = @SQL + 'THEN @R3 ELSE LAG(Range_TO,0) OVER (ORDER BY id DESC)+@R2+1 END,'\t\t\t\t\t+CHAR(13);\r\nSET @SQL = @SQL + '(SELECT hash_key FROM #Hash_Temp WHERE id =(SELECT LAG(Range_TO,0) '\t\t\t\t\t+CHAR(13);\r\nSET @SQL = @SQL + 'OVER (ORDER BY id DESC)+1 FROM #Ids_Range WHERE @z = id)),'\t\t\t\t\t\t\t+CHAR(13);\r\nSET @SQL = @SQL + '(SELECT Hash_key FROM #Hash_Temp WHERE id =(SELECT'\t\t\t\t\t\t\t\t\t+CHAR(13);\r\nSET @SQL = @SQL + 'CASE WHEN LAG(Range_TO,0) OVER (ORDER BY id DESC)+@R2+1 &gt;= @R3'\t\t\t\t\t   +CHAR(13);\r\nSET @SQL = @SQL + 'THEN @R3 ELSE LAG(Range_TO,0) OVER (ORDER BY id DESC)+@R2+1 END'\t\t\t\t\t    +CHAR(13);\r\nSET @SQL = @SQL + 'FROM #Ids_Range WHERE @z = id))'\t\t\t\t\t\t\t\t\t\t\t\t\t\t+CHAR(13);\r\nSET @SQL = @SQL + 'FROM #Ids_Range WHERE @z = id'\t\t\t\t\t\t\t\t\t\t\t\t\t\t+CHAR(13);\r\nSET @SQL = @SQL + 'SET @z = @z+1'\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t+CHAR(13);\r\nSET @SQL = @SQL + 'END'\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t+CHAR(13);\r\nEXEC(@SQL)\r\n<\/pre>\n<p class=\"Standard\" style=\"text-align: justify;\">This code creates uniform &#8216;slices&#8217; of source table\u2019s data using metadata information stored in Hash_Temp temp table. This metadata is created using HASHBYTES() function with SHA1 algorithm allowing for efficient values indexation and sorting, which in turn allows for partition ranges creation. For wide objects with small number of rows i.e. less than 1 million this should work relatively speedy, however, for bigger tables you may want to move some of this logic into a memory-optimized table. 
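The hash-range idea itself is easy to prototype outside of T-SQL. The following Python sketch is illustrative only (the framework does this with HASHBYTES() and temp tables, not Python): it hashes each, possibly composite, primary key with SHA1, sorts the hex digests and cuts the sorted list into contiguous, non-overlapping (hash_FROM, hash_TO) boundaries, much like the rows of #Ids_Range.

```python
import hashlib

def hash_key(*pk_values):
    # Rough analogue of HASHBYTES('SHA1', ...) over a (possibly composite)
    # primary key: hash the concatenated key values and keep the hex digest,
    # which sorts and compares the same way a VARCHAR hash column does.
    joined = '|'.join(str(v) for v in pk_values)
    return hashlib.sha1(joined.encode('utf-8')).hexdigest().upper()

def hash_ranges(pk_tuples, n_streams):
    # Sort all digests, then cut the sorted list into n_streams contiguous
    # slices; each (hash_FROM, hash_TO) pair is a self-contained,
    # non-overlapping partition boundary a worker can filter on.
    digests = sorted(hash_key(*pk) for pk in pk_tuples)
    size = -(-len(digests) // n_streams)  # ceiling division
    slices = [digests[i:i + size] for i in range(0, len(digests), size)]
    return [(s[0], s[-1]) for s in slices]

# Example: 100,000 integer keys carved into 8 streams of roughly equal size
boundaries = hash_ranges([(i,) for i in range(100000)], 8)
```

Each worker then selects only the rows whose hash falls between its own pair of boundaries, which is what makes the streams consistent and non-blocking.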
Given that my source data set is quite small (100K rows), I did not encounter any performance bottlenecks with TempDB-stored tables in this set up. Additionally, this stored procedure acts as a coordinator\/dispatcher, ensuring that all conditions are validated, the number of concurrent streams does not exceed number of CPU cores, the runtime (checked every 5 seconds and capped at 200 lookups) does not go over this limit, in which case any outstanding jobs are terminated.<\/p>\n<p class=\"Standard\" style=\"text-align: justify;\">Finally, a self-contained SQL Server agent job is instantiated, calling the second stored procedure which is responsible for data insertion based on the partition number and hash values boundaries.<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\nDECLARE @sql_job\t\tNVARCHAR(MAX)\t=\t\t\r\n'USE &#x5B;targetdb]\r\nEXEC\t&#x5B;dbo].&#x5B;'+@Worker_Proc_Name+']\r\n@Source_Server_Name = '+@Source_Server_Name+',\r\n@Source_Server_DB_Name = '''''+@Source_Server_DB_Name+''''',\r\n@Source_Server_Schema_Name = '''''+@Source_Server_Schema_Name+''''',\r\n@Source_Server_Object_Name = '''''+@table+''''',\r\n@Target_Server_DB_Name = N'''''+@Target_Server_DB_Name+''''',\r\n@Target_Server_Schema_Name = '''''+@Target_Server_Schema_Name+''''',\r\n@Hash_SQL = N'''''+@hash_SQL+''''',\r\n@Hash_FROM = N'''''+@hash_FROM+''''',\r\n@Hash_TO = N'''''+@hash_TO+''''',\r\n@Is_Big = N'''''+CAST(@is_big AS CHAR(1))+''''',\r\n@Col_List_MSSQL = N'''''+@Col_List_MSSQL+''''',\r\n@Target_Server_Object_Name = '''''+@table+''''',\r\n@Exec_Instance_GUID\t='''''+CAST(@Exec_Instance_GUID AS VARCHAR(128))+''''',\r\n@Package_Name='''''+@Package_Name+''''' '\r\n\t\t\t\t\r\n\r\nSET @SQL =\t\t\t'IF EXISTS'\r\nSET @SQL = @SQL +\t'(SELECT TOP 1 1 FROM msdb..sysjobs_view job JOIN msdb.dbo.sysjobactivity activity '\t\t\t\t\t+CHAR(13)\r\nSET @SQL = @SQL +\t'ON job.job_id = activity.job_id WHERE job.name = N'''+@job+''''\t\t\t\t\t\t\t\t\t\t+CHAR(13)\r\nSET @SQL = @SQL 
+\t'AND job.date_created IS NOT NULL AND activity.stop_execution_date IS NULL)'\t\t\t\t\t\t\t+CHAR(13)\r\nSET @SQL = @SQL +\t'BEGIN'\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t+CHAR(13)\r\nSET @SQL = @SQL +\t'EXEC msdb..sp_stop_job @job_name=N'''+@job+''';'\t\t\t\t\t\t\t\t\t\t\t\t\t\t+CHAR(13)\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\r\nSET @SQL = @SQL +\t'EXEC msdb..sp_delete_job @job_name=N'''+@job+''', @delete_unused_schedule=1'\t\t\t\t\t\t\t+CHAR(13)\t\t\t\t\t\t\t\t\t\r\nSET @SQL = @SQL +\t'END'\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t+CHAR(13)\r\nSET @SQL = @SQL +\t'IF EXISTS'\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t+CHAR(13)\r\nSET @SQL = @SQL +\t'(SELECT TOP 1 1 FROM msdb..sysjobs_view job JOIN msdb.dbo.sysjobactivity activity'\t\t\t\t\t\t+CHAR(13)\r\nSET @SQL = @SQL +\t'ON job.job_id = activity.job_id WHERE job.name = N'''+@job+''''\t\t\t\t\t\t\t\t\t\t+CHAR(13)\r\nSET @SQL = @SQL +\t'AND job.date_created IS NULL AND activity.stop_execution_date IS NOT NULL)'\t\t\t\t\t\t\t+CHAR(13)\r\nSET @SQL = @SQL +\t'BEGIN'\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t+CHAR(13)\r\nSET @SQL = @SQL +\t'EXEC msdb..sp_delete_job @job_name=N'''+@job+''', @delete_unused_schedule=1'\t\t\t\t\t\t\t+CHAR(13)\t\t\t\t\t\t\t\t\t\r\nSET @SQL = @SQL +\t'END'\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t+CHAR(13)\r\nSET @SQL = @SQL +\t'IF EXISTS'\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t+CHAR(13)\r\nSET @SQL = @SQL +\t'(SELECT TOP 1 1 FROM msdb..sysjobs_view job WHERE job.name = N'''+@job+''')'\t\t\t\t\t\t\t+CHAR(13)\r\nSET @SQL = @SQL +\t'BEGIN'\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t+CHAR(13)\r\nSET @SQL = @SQL +\t'EXEC msdb..sp_delete_job @job_name=N'''+@job+''', @delete_unused_schedule=1'\t\t\t\t\t\t\t+CHAR(13)\t\t\t\t\t\t\t\t\t\r\nSET @SQL = @SQL +\t'END'\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t+CHAR(13)\r\nSET @SQL = @SQL +\t'EXEC msdb..sp_add_job '''+@job+''', @owner_login_name= '''+@job_owner+''';'\t\t\t\t\t\t\t+CHAR(13)\r\nSET 
@SQL = @SQL +\t'EXEC msdb..sp_add_jobserver @job_name= '''+@job+''';'\t\t\t\t\t\t\t\t\t\t\t\t\t+CHAR(13)\t\t\t\r\nSET @SQL = @SQL +\t'EXEC msdb..sp_add_jobstep @job_name='''+@job+''', @step_name= ''Step1'', '\t\t\t\t\t\t\t\t+CHAR(13)\r\nSET @SQL = @SQL +\t'@command = '''+@sql_job+''', @database_name = '''+@Target_Server_DB_Name+''', @on_success_action = 3;'\t+CHAR(13)\t\t\t\t\t\t\r\nSET @SQL = @SQL +\t'EXEC msdb..sp_add_jobstep @job_name = '''+@job+''', @step_name= ''Step2'','\t\t\t\t\t\t\t+CHAR(13)\r\nSET @SQL = @SQL +   '@command = '''+@delete_job_sql+''''\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t+CHAR(13)\r\nSET @SQL = @SQL +\t'EXEC msdb..sp_start_job @job_name= '''+@job+''', @output_flag = 0'\t\r\n<\/pre>\n<p class=\"Standard\" style=\"text-align: justify;\">The short footage below demonstrates how our target Dummy_Table data is loaded using this architecture.<\/p>\n<p><iframe loading=\"lazy\" width=\"580\" height=\"330\" src=\"https:\/\/www.youtube.com\/embed\/eSUxQamrGrg\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\" allowfullscreen=\"allowfullscreen\"><\/iframe><\/p>\n<p class=\"Standard\" style=\"text-align: justify;\">It is also worth highlighting that this code can run acquisitions for tables which are not hash-partitioned, in which case it will still create multiple parallel streams, providing a significant boost over sequential loads. As such, we can mix and match objects which need to be broken into multiple streams, i.e. large tables, with stand-alone loads which can move data across in a single transaction. Another interesting feature is the ability to &#8216;throttle&#8217; the number of concurrent streams using the @Queue_Size parameter, which defaults to the number of cores allocated to the SQL Server instance this process is executing on, or can be set to a number we specify. 
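<p class=\"Standard\" style=\"text-align: justify;\">The throttling behaviour can be sketched in a few lines of Python (a conceptual stand-in for the T-SQL dispatcher loop, not the framework\u2019s actual code; the function name run_partitioned_load and its parameters are made up): at most queue_size streams run at once, and the queue is re-checked on a fixed interval so a new job starts as soon as a slot frees up.<\/p>

```python
# Minimal sketch of @Queue_Size-style throttling (illustrative Python stand-in
# for the T-SQL dispatcher; run_partitioned_load and its arguments are made up).
import os
import threading
import time

def run_partitioned_load(partitions, load_partition, queue_size=None, poll_seconds=1):
    # Default the queue size to the core count, as the framework does.
    queue_size = queue_size or os.cpu_count()
    pending = list(partitions)
    running = []
    while pending or running:
        running = [t for t in running if t.is_alive()]   # reap finished streams
        while pending and len(running) < queue_size:     # top the queue back up
            worker = threading.Thread(target=load_partition, args=(pending.pop(0),))
            worker.start()
            running.append(worker)
        time.sleep(poll_seconds)                         # fixed polling interval

# e.g. six partitions drained at most two at a time:
run_partitioned_load(range(6), lambda p: print("loaded partition", p),
                     queue_size=2, poll_seconds=0.1)
```

<p class=\"Standard\" style=\"text-align: justify;\">The SQL Server Agent implementation behaves analogously: new jobs are only spun up while the number of active jobs sits below the queue size, so the backlog is drained steadily rather than all at once.<\/p>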
As such, even when we partition a large table into a number of streams exceeding that of the available cores, the queue size will stay consistent and never outgrow the maximum number of concurrent streams allowed. This enables the process to &#8216;nibble&#8217; at the queue (the queue size is checked at a 1-second interval), maximizing resource utilization and ensuring that an optimal number of jobs runs concurrently.<\/p>\n<h3 style=\"text-align: center;\">Sample Data Testing Results<\/h3>\n<p class=\"Standard\" style=\"text-align: justify;\">Now that I\u2019ve demonstrated how this technique can speed up data movement, let\u2019s look at some concrete numbers. I re-ran the code for two small data samples, i.e. tables with 200 and 300 columns, each holding 100K rows of synthetic data, and the results clearly demonstrated that even with the additional overhead of calculating hash values on primary keys, it is possible to achieve at least a 4x performance boost with this architecture. Even on my old-ish E5-2670 v3 rig clocked at 2.3GHz with SAN storage attached, we can observe a significant speed increase, with Azure SQL DB CPU utilization never exceeding 15%. The rule of diminishing returns will most likely kick in when a hash value needs to be calculated over a very long composite primary key. Likewise, I would expect this solution to regress in performance when long tables, e.g. 10M+ rows, are involved. 
To prevent this, with enough network throughput and a well-optimized source table it may be possible to re-factor this code so that the disk-based temporary hash table (as it&#8217;s implemented in this demo) becomes a <a href=\"https:\/\/docs.microsoft.com\/en-us\/sql\/relational-databases\/in-memory-oltp\/introduction-to-memory-optimized-tables?view=sql-server-ver15\" target=\"_blank\" rel=\"noopener\">memory-optimized table<\/a>, but I have not tried this myself.<\/p>\n<p><a href=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/\/2020\/12\/Hash_Partitioned_Data_Sync_Performance_Comparison.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-4336\" src=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/\/2020\/12\/Hash_Partitioned_Data_Sync_Performance_Comparison.png\" alt=\"\" width=\"580\" height=\"526\" srcset=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/2020\/12\/Hash_Partitioned_Data_Sync_Performance_Comparison.png 580w, http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/2020\/12\/Hash_Partitioned_Data_Sync_Performance_Comparison-300x272.png 300w\" sizes=\"auto, (max-width: 580px) 100vw, 580px\" \/><\/a><\/p>\n<p class=\"Standard\" style=\"text-align: justify;\">My testing machine had only 4 cores at its disposal, so resources were limited; however, with 32-core machines becoming the norm these days, I would expect those numbers to be significantly better with the right application. As a matter of fact, transitioning to this method yielded an 8-10x performance increase for a few tables sourced from a Dynamics365 schema using more powerful hardware on my current project.<\/p>\n<p class=\"Standard\" style=\"text-align: justify;\">Naturally, the overall performance will depend heavily on a number of factors, e.g. network and storage speed, CPU specs, schema structure, data types etc. 
but when possible (and applicable) one can improve data acquisition performance without resorting to bespoke frameworks, on a single machine, all using T-SQL\/SQL Server Agent functionality as the underlying technology.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction Note: All artifacts used in this demo can be downloaded from my shared OneDrive folder HERE. Big part of Data Warehouse development has always been tied to structuring data acquisition pipelines before more rigid modelling takes place and data is wrangled (most likely) using the most the venerable methodologies today: Data Vault, Kimbal or [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[64,56,5,46],"tags":[65,62,79,81,18,24,41,49,19],"class_list":["post-4298","post","type-post","status-publish","format-standard","hentry","category-azure","category-programming","category-sql","category-sql-server","tag-azure","tag-cloud-computing","tag-data-warehouse","tag-etl","tag-microsoft","tag-programming","tag-python","tag-sql","tag-sql-server"],"aioseo_notices":[],"jetpack_featured_media_url":"","_links":{"self":[{"href":"http:\/\/bicortex.com\/bicortex\/wp-json\/wp\/v2\/posts\/4298","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/bicortex.com\/bicortex\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/bicortex.com\/bicortex\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/bicortex.com\/bicortex\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/bicortex.com\/bicortex\/wp-json\/wp\/v2\/comments?post=4298"}],"version-history":[{"count":47,"href":"http:\/\/bicortex.com\/bicortex\/wp-json\/wp\/v2\/posts\/4298\/revisions"}],"predecessor-version":[{"id":4484,"href":"http:\/\/bicortex.com\/bicortex\/wp-json\/wp\/v2\/posts\/4298\/revisions\/4484"}],"wp:attachment":[{"href
":"http:\/\/bicortex.com\/bicortex\/wp-json\/wp\/v2\/media?parent=4298"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/bicortex.com\/bicortex\/wp-json\/wp\/v2\/categories?post=4298"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/bicortex.com\/bicortex\/wp-json\/wp\/v2\/tags?post=4298"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}