{"id":3634,"date":"2018-09-21T02:36:00","date_gmt":"2018-09-21T02:36:00","guid":{"rendered":"http:\/\/bicortex.com\/?p=3634"},"modified":"2020-05-03T21:47:31","modified_gmt":"2020-05-03T11:47:31","slug":"kicking-the-tires-on-bigquery-googles-serverless-enterprise-data-warehouse-part-1","status":"publish","type":"post","link":"http:\/\/bicortex.com\/bicortex\/kicking-the-tires-on-bigquery-googles-serverless-enterprise-data-warehouse-part-1\/","title":{"rendered":"Kicking the tires on BigQuery &#8211; Google\u2019s Serverless Enterprise Data Warehouse (Part 1)"},"content":{"rendered":"<p class=\"Standard\" style=\"text-align: justify;\">Note: Part 2 can be found <a href=\"http:\/\/bicortex.com\/kicking-the-tires-on-bigquery-googles-serverless-enterprise-data-warehouse-part-2\" target=\"_blank\" rel=\"noopener noreferrer\">HERE<\/a>.<\/p>\n<h3 style=\"text-align: center;\">Introduction<\/h3>\n<p class=\"Standard\" style=\"text-align: justify;\">As traditional, on-premise hosted data warehouse solutions are increasingly becoming harder to scale and manage, a new breed of vendors and products is starting to emerge, one which can easily accommodate exponentially growing data volumes with little up-front cost and very little operational overhead. More and more do we get to see cash-strapped IT departments or startups being pushed by their VC overlords to provide immediate ROI, not being able to take time and build up their analytical capability following the old-fashion, waterfall trajectory. Instead, the famous Facebook mantra of &#8220;move fast and break things&#8221; is gradually becoming a standard to live up to, with cloud-first, pay-for-what-you-use, ops-free services making inroads or displacing\/replacing &#8216;old guard&#8217; technologies. Data warehousing domain has successfully avoided being dragged into the relentless push for optimisation and efficiency for a very long time, with many venerable vendors (IBM, Oracle, Teradata) resting on their laurels, sometimes for decades. However, with the advent of the cloud computing, new juggernauts emerged and Google\u2019s BigQuery is slowly becoming synonymous with petabyte-scale, ops-free, ultra-scalable and extremely fast data warehouse service. And while BigQuery has not (yet) become the go-to platform for large volumes of data storage and processing, big organisations such as New York Times and Spotify decided to adopt it, bucking the industry trend of selecting existing cloud data warehouse incumbents e.g. AWS Redshift or Microsoft SQL DW.<\/p>\n<p class=\"Standard\" style=\"text-align: justify;\">BigQuery traces its roots back to 2010, when Google released beta access to a new SQL processing system based on its distributed query engine technology, called Dremel, which was described in an influential paper released the same year. BigQuery and Dremel share the same underlying architecture and by incorporating columnar storage and tree architecture of Dremel, BigQuery manages to offer unprecedented performance. But, BigQuery is much more than Dremel which serves as the execution engine for the BigQuery. In fact, BigQuery service leverages Google\u2019s innovative technologies like Borg &#8211; large-scale cluster management system, Colossus &#8211; Google\u2019s latest generation distributed file system, Capacitor \u2013 columnar data storage format, and Jupiter \u2013 Google\u2019s networking infrastructure. As illustrated below, a BigQuery client interact with Dremel engine via a client interface. Borg &#8211; Google\u2019s large-scale cluster management system &#8211; allocates the compute capacity for the Dremel jobs. Dremel jobs read data from Google\u2019s Colossus file systems using Jupiter network, perform various SQL operations and return results to the client.<\/p>\n<p><a href=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/\/2018\/12\/bigquery_architecture_high_level_diagram.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3641\" src=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/\/2018\/12\/bigquery_architecture_high_level_diagram.png\" alt=\"\" width=\"580\" height=\"644\" srcset=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/2018\/12\/bigquery_architecture_high_level_diagram.png 580w, http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/2018\/12\/bigquery_architecture_high_level_diagram-270x300.png 270w\" sizes=\"auto, (max-width: 580px) 100vw, 580px\" \/><\/a><\/p>\n<p class=\"Standard\" style=\"text-align: justify;\">Having previously worked with a number of other vendors which have successfully occupied this realm for a very long time, it was refreshing to see how different BigQuery is and what separates it from the rest of the competition. But before I get into the general observations and the actual conclusion, let\u2019s look at how one can easily process, load and query large volumes of data (TPC-DS data in this example) using BigQuery and a few Python scripts for automation.<\/p>\n<h3 style=\"text-align: center;\">TPC-DS Schema Creation and Data Loading<\/h3>\n<p class=\"Standard\" style=\"text-align: justify;\">BigQuery offers a number of ways one can interact with it and take advantage of its functionality. The most basic one is through the GCP Web UI but for repetitive tasks users will find themselves mainly utilising Google Cloud SDK or via Google BigQuery API. For the purpose of this example, I have generated two TPC-DS datasets &#8211; 100GB and 300GB &#8211; and staged them on the local machine\u2019s folder. I have also created a TPC-DS schema JSON file which is used by the script to generate all dataset objects. All these resources, as well as the scripts used in this post can be found in my OneDrive folder <a href=\"https:\/\/1drv.ms\/f\/s!AuEyKKgH71pxg9laQE8_EeHVDGOrUw\" target=\"_blank\" rel=\"noopener noreferrer\">HERE<\/a>.<\/p>\n<p class=\"Standard\" style=\"text-align: justify;\">The following config file parameters and Python code are used to mainly clean up the flat files (TPC-DS delimits line ending with a &#8220;|&#8221; character which needs to be removed), break them up into smaller chunks and upload them to Google Cloud storage service (it assumes that gs:\/\/tpc_ds_data_100GB and gs:\/\/tpc_ds_data_300GB buckets has already been created). The schema structure used for this project is a direct translation of the PostgreSQL-specific TPC-DS schema used in one of my earlier projects where data types and their NULL-ability have been converted and standardized to conform to BigQuery standards.<\/p>\n<p><a href=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/\/2018\/12\/bigquery_config_file_parameters.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-3643\" src=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/\/2018\/12\/bigquery_config_file_parameters.png\" alt=\"\" width=\"580\" height=\"145\" srcset=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/2018\/12\/bigquery_config_file_parameters.png 994w, http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/2018\/12\/bigquery_config_file_parameters-300x75.png 300w, http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/2018\/12\/bigquery_config_file_parameters-768x192.png 768w\" sizes=\"auto, (max-width: 580px) 100vw, 580px\" \/><\/a><\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nimport configparser\r\nfrom tqdm import tqdm\r\nfrom os import path, rename, listdir, remove, walk\r\nfrom shutil import copy2, move\r\nfrom google.cloud import storage\r\nfrom csv import reader, writer\r\n\r\n\r\nconfig = configparser.ConfigParser()\r\nconfig.read(&quot;params.cfg&quot;)\r\n\r\nbq_tpc_ds_schema_as_json = config.get(\r\n    &quot;Files_Path&quot;, path.normpath(&quot;bq_tpc_ds_schema_as_json&quot;))\r\nbq_local_source_files_path = config.get(\r\n    &quot;Files_Path&quot;, path.normpath(&quot;bq_local_source_files_path&quot;))\r\nbq_local_target_files_path = config.get(\r\n    &quot;Files_Path&quot;, path.normpath(&quot;bq_local_target_files_path&quot;))\r\nstorage_client = config.get(&quot;Big_Query&quot;, path.normpath(&quot;bq_client&quot;))\r\nbq_bucket_path = config.get(&quot;Big_Query&quot;, &quot;bq_bucket_path&quot;)\r\nbq_storage_client = storage.Client.from_service_account_json(storage_client)\r\n\r\n\r\nkeep_src_files = True\r\nkeep_headers = False\r\nkeep_src_csv_files = True\r\nrow_limit = 1000000\r\noutput_name_template = '_%s.csv'\r\ndelimiter = '|'\r\n\r\n\r\ndef RemoveFiles(DirPath):\r\n    filelist = &#x5B;f for f in listdir(DirPath)]\r\n    for f in filelist:\r\n        remove(path.join(DirPath, f))\r\n\r\n\r\ndef SourceFilesRename(bq_local_source_files_path):\r\n    for file in listdir(bq_local_source_files_path):\r\n        if file.endswith(&quot;.dat&quot;):\r\n            rename(path.join(bq_local_source_files_path, file),\r\n                   path.join(bq_local_source_files_path, file&#x5B;:-4]+'.csv'))\r\n\r\n\r\ndef SplitLargeFiles(file, row_count, bq_local_target_files_path, output_name_template, keep_headers=False):\r\n    &quot;&quot;&quot;\r\n    Split flat files into smaller chunks based on the comparison between row count \r\n    and 'row_limit' variable value. This creates file_row_count\/row_limit number of files which can \r\n    facilitate upload speed if parallelized. \r\n    &quot;&quot;&quot;\r\n    file_handler = open(path.join(bq_local_target_files_path, file), 'r')\r\n    csv_reader = reader(file_handler, delimiter=delimiter)\r\n    current_piece = 1\r\n    current_out_path = path.join(bq_local_target_files_path, path.splitext(file)&#x5B;\r\n                                 0]+output_name_template % current_piece)\r\n    current_out_writer = writer(\r\n        open(current_out_path, 'w', newline=''), delimiter=delimiter)\r\n    current_limit = row_limit\r\n    if keep_headers:\r\n        headers = next(csv_reader)\r\n        current_out_writer.writerow(headers)\r\n    pbar = tqdm(total=row_count)\r\n    for i, row in enumerate(csv_reader):\r\n        pbar.update()\r\n        if i + 1 &gt; current_limit:\r\n            current_piece += 1\r\n            current_limit = row_limit * current_piece\r\n            current_out_path = path.join(bq_local_target_files_path, path.splitext(file)&#x5B;\r\n                                         0]+output_name_template % current_piece)\r\n            current_out_writer = writer(\r\n                open(current_out_path, 'w', newline=''), delimiter=delimiter)\r\n            if keep_headers:\r\n                current_out_writer.writerow(headers)\r\n        current_out_writer.writerow(row)\r\n    pbar.close()\r\n    print(&quot;\\n&quot;)\r\n\r\n\r\ndef ProcessLargeFiles(bq_local_source_files_path, bq_local_target_files_path, output_name_template, row_limit):\r\n    &quot;&quot;&quot;\r\n    Remove trailing '|' characters from the files generated by the TPC-DS utility\r\n    as they intefere with BQ load function. As BQ does not allow for denoting end-of-line \r\n    characters in the flat file, this breaks data load functionality. This function also calls\r\n    the 'SplitLargeFiles' function resulting in splitting files with the row count &gt; 'row_limit'\r\n    variable value into smaller chunks\/files.\r\n    &quot;&quot;&quot;\r\n    RemoveFiles(bq_local_target_files_path)\r\n    fileRowCounts = &#x5B;]\r\n    for file in listdir(bq_local_source_files_path):\r\n        if file.endswith(&quot;.csv&quot;):\r\n            fileRowCounts.append(&#x5B;file, sum(1 for line in open(\r\n                path.join(bq_local_source_files_path, file), newline=''))])\r\n    for file in fileRowCounts:\r\n        if file&#x5B;1] &gt; row_limit:\r\n            print(\r\n                &quot;Removing trailing characters in {f} file...&quot;.format(f=file&#x5B;0]))\r\n            RemoveTrailingChar(row_limit, path.join(\r\n                bq_local_source_files_path, file&#x5B;0]), path.join(bq_local_target_files_path, file&#x5B;0]))\r\n            print(&quot;\\nSplitting file:&quot;, file&#x5B;0],)\r\n            print(&quot;Measured row count:&quot;, file&#x5B;1])\r\n            print(&quot;Progress...&quot;)\r\n            SplitLargeFiles(\r\n                file&#x5B;0], file&#x5B;1], bq_local_target_files_path, output_name_template)\r\n            remove(path.join(bq_local_target_files_path, file&#x5B;0]))\r\n        else:\r\n            print(\r\n                &quot;Removing trailing characters in {f} file...&quot;.format(f=file&#x5B;0]))\r\n            RemoveTrailingChar(row_limit, path.join(\r\n                bq_local_source_files_path, file&#x5B;0]), path.join(bq_local_target_files_path, file&#x5B;0]))\r\n    RemoveFiles(bq_local_source_files_path)\r\n\r\n\r\ndef RemoveTrailingChar(row_limit, bq_local_source_files_path, bq_local_target_files_path):\r\n    &quot;&quot;&quot;\r\n    Remove trailing '|' characters from files' line ending. \r\n    &quot;&quot;&quot;\r\n    line_numer = row_limit\r\n    lines = &#x5B;]\r\n    with open(bq_local_source_files_path, &quot;r&quot;) as r, open(bq_local_target_files_path, &quot;w&quot;) as w:\r\n        for line in r:\r\n            if line.endswith('|\\n'):\r\n                lines.append(line&#x5B;:-2]+&quot;\\n&quot;)\r\n                if len(lines) == line_numer:\r\n                    w.writelines(lines)\r\n                    lines = &#x5B;]\r\n        w.writelines(lines)\r\n\r\n\r\ndef GcsUploadBlob(bq_storage_client, bq_local_target_files_path, bq_bucket_path):\r\n    &quot;&quot;&quot;\r\n    Upload files into the nominated GCP bucket if it does not exists. \r\n    &quot;&quot;&quot;\r\n    bucket_name = (bq_bucket_path.split(&quot;\/\/&quot;))&#x5B;1].split(&quot;\/&quot;)&#x5B;0]\r\n    CHUNK_SIZE = 10485760\r\n    bucket = bq_storage_client.get_bucket(bucket_name)\r\n    print(&quot;\\nCommencing files upload...&quot;)\r\n    for file in listdir(bq_local_target_files_path):\r\n        try:\r\n            blob = bucket.blob(file, chunk_size=CHUNK_SIZE)\r\n            file_exist = storage.Blob(bucket=bucket, name=file).exists(\r\n                bq_storage_client)\r\n            if not file_exist:\r\n                print(&quot;Uploading file {fname} into {bname} GCP bucket...&quot;.format(\r\n                    fname=file, bname=bucket_name))\r\n                blob.upload_from_filename(\r\n                    path.join(bq_local_target_files_path, file))\r\n            else:\r\n                print(&quot;Nominated file {fname} already exists in {bname} bucket. Moving on...&quot;.format(\r\n                    fname=file, bname=bucket_name))\r\n        except Exception as e:\r\n            print(e)\r\n\r\n\r\nif __name__ == &quot;__main__&quot;:\r\n    SourceFilesRename(bq_local_source_files_path)\r\n    ProcessLargeFiles(bq_local_source_files_path,\r\n                      bq_local_target_files_path, output_name_template, row_limit)\r\n    GcsUploadBlob(bq_storage_client,\r\n                  bq_local_target_files_path, bq_bucket_path)\r\n<\/pre>\n<p class=\"Standard\" style=\"text-align: justify;\">Once all the TPC-DS data has been moved across into GCP storage bucket, we can proceed with creating BigQuery dataset (synonymous with the schema name in a typical Microsoft SQL Server or PostgreSQL RDBMS hierarchy) and its corresponding tables. Dataset is associated with a project which forms the overarching container, somewhat equivalent to the database if you come from a traditional RDBMS experience. It is also worth pointing out that dataset names must be unique per project, all tables referenced in a query must be stored in datasets in the same location and finally, geographic location can be set at creation time only. The following is an excerpt from the JSON file used to create the dataset schema which can be downloaded from my OneDrive folder <a href=\"https:\/\/1drv.ms\/f\/s!AuEyKKgH71pxg9laQE8_EeHVDGOrUw\" target=\"_blank\" rel=\"noopener noreferrer\">HERE<\/a>.<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\n{\r\n    &quot;table_schema&quot;: &#x5B;\r\n        {\r\n            &quot;table_name&quot;: &quot;dbgen_version&quot;,\r\n            &quot;fields&quot;: &#x5B;\r\n                {\r\n                    &quot;name&quot;: &quot;dv_version&quot;,\r\n                    &quot;type&quot;: &quot;STRING&quot;,\r\n                    &quot;mode&quot;: &quot;NULLABLE&quot;\r\n                },\r\n                {\r\n                    &quot;name&quot;: &quot;dv_create_date&quot;,\r\n                    &quot;type&quot;: &quot;DATE&quot;,\r\n                    &quot;mode&quot;: &quot;NULLABLE&quot;\r\n                },\r\n                {\r\n                    &quot;name&quot;: &quot;dv_create_time&quot;,\r\n                    &quot;type&quot;: &quot;TIME&quot;,\r\n                    &quot;mode&quot;: &quot;NULLABLE&quot;\r\n                },\r\n                {\r\n                    &quot;name&quot;: &quot;dv_cmdline_args&quot;,\r\n                    &quot;type&quot;: &quot;STRING&quot;,\r\n                    &quot;mode&quot;: &quot;NULLABLE&quot;\r\n                }\r\n            ]\r\n        }\r\n    ]\r\n}\r\n<\/pre>\n<p class=\"Standard\" style=\"text-align: justify;\">Now that the dataset is in place, we can create the required tables and populate them with the flat files data which was moved across into Google cloud storage using the previous code snippet. The following Python script creates tpc_ds_test_data dataset, creates all tables based on the TPC-DS schema stored in the JSON file and finally populates them with text files data.<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nfrom google.cloud import bigquery\r\nfrom google.cloud.exceptions import NotFound\r\nfrom google.cloud import storage\r\nfrom os import path\r\nimport configparser\r\nimport json\r\nimport re\r\n\r\n\r\nconfig = configparser.ConfigParser()\r\nconfig.read(&quot;params.cfg&quot;)\r\n\r\nbq_tpc_ds_schema_as_json = config.get(\r\n    &quot;Files_Path&quot;, path.normpath(&quot;bq_tpc_ds_schema_as_json&quot;))\r\nbq_client = config.get(&quot;Big_Query&quot;, path.normpath(&quot;bq_client&quot;))\r\nbq_client = bigquery.Client.from_service_account_json(bq_client)\r\nstorage_client = config.get(&quot;Big_Query&quot;, path.normpath(&quot;bq_client&quot;))\r\nbq_bucket_path = config.get(&quot;Big_Query&quot;, &quot;bq_bucket_path&quot;)\r\nbq_storage_client = storage.Client.from_service_account_json(storage_client)\r\nbq_ref_dataset = config.get(&quot;Big_Query&quot;, &quot;bq_ref_dataset&quot;)\r\n\r\n\r\ndef dataset_bq_exists(bq_client, bq_ref_dataset):\r\n    try:\r\n        bq_client.get_dataset(bq_ref_dataset)\r\n        return True\r\n    except NotFound as e:\r\n        return False\r\n\r\n\r\ndef bq_create_dataset(bq_client, bq_ref_dataset):\r\n    &quot;&quot;&quot;\r\n    Create BigQuery dataset if it does not exists in the\r\n    Australian location\r\n    &quot;&quot;&quot;\r\n    if not dataset_bq_exists(bq_client, bq_ref_dataset):\r\n        print(&quot;Creating {d} dataset...&quot;.format(d=bq_ref_dataset))\r\n        dataset = bq_client.dataset(bq_ref_dataset)\r\n        dataset = bigquery.Dataset(dataset)\r\n        dataset.location = 'australia-southeast1'\r\n        dataset = bq_client.create_dataset(dataset)\r\n    else:\r\n        print(&quot;Nominated dataset already exists. Moving on...&quot;)\r\n\r\n\r\ndef table_bq_exists(bq_client, ref_table):\r\n    try:\r\n        bq_client.get_table(ref_table)\r\n        return True\r\n    except NotFound as e:\r\n        return False\r\n\r\n\r\ndef bq_drop_create_table(bq_client, bq_ref_dataset, bq_tpc_ds_schema_as_json):\r\n    &quot;&quot;&quot;\r\n    Drop (if exists) and create schema tables on the nominated BigQuery dataset.\r\n    Schema details are stored inside 'tpc_ds_schema.json' file.\r\n    &quot;&quot;&quot;\r\n    dataset = bq_client.dataset(bq_ref_dataset)\r\n    with open(bq_tpc_ds_schema_as_json) as schema_file:\r\n        data = json.load(schema_file)\r\n        for t in data&#x5B;'table_schema']:\r\n            table_name = t&#x5B;'table_name']\r\n            ref_table = dataset.table(t&#x5B;'table_name'])\r\n            table = bigquery.Table(ref_table)\r\n            if table_bq_exists(bq_client, ref_table):\r\n                print(&quot;Table '{tname}' already exists in the '{dname}' dataset. Dropping table '{tname}'...&quot;.format(\r\n                    tname=table_name, dname=bq_ref_dataset))\r\n                bq_client.delete_table(table)\r\n            for f in t&#x5B;'fields']:\r\n                table.schema += (\r\n                    bigquery.SchemaField(f&#x5B;'name'], f&#x5B;'type'], mode=f&#x5B;'mode']),)\r\n            print(&quot;Creating table {tname} in the '{dname}' dataset...&quot;.format(\r\n                tname=table_name, dname=bq_ref_dataset))\r\n            table = bq_client.create_table(table)\r\n\r\n\r\ndef bq_load_data_from_file(bq_client, bq_ref_dataset,  bq_bucket_path, bq_storage_client):\r\n    &quot;&quot;&quot;\r\n    Load data stored on the nominated GCP bucket into dataset tables\r\n    &quot;&quot;&quot;\r\n    bucket_name = (bq_bucket_path.split(&quot;\/\/&quot;))&#x5B;1].split(&quot;\/&quot;)&#x5B;0]\r\n    bucket = bq_storage_client.get_bucket(bucket_name)\r\n    blobs = bucket.list_blobs()\r\n    dataset = bq_client.dataset(bq_ref_dataset)\r\n    for blob in blobs:\r\n        file_name = blob.name\r\n        rem_char = re.findall(&quot;_\\d+&quot;, file_name)\r\n        if len(rem_char) == 0:\r\n            ref_table = dataset.table(file_name&#x5B;:file_name.index(&quot;.&quot;)])\r\n            table_name = file_name&#x5B;:file_name.index(&quot;.&quot;)]\r\n        else:\r\n            rem_char = str(&quot;&quot;.join(rem_char))\r\n            ref_table = dataset.table(file_name&#x5B;:file_name.index(rem_char)])\r\n            table_name = file_name&#x5B;:file_name.index(rem_char)]\r\n        print(&quot;Loading file {f} into {t} table...&quot;.format(\r\n            f=file_name, t=table_name))\r\n        job_config = bigquery.LoadJobConfig()\r\n        job_config.source_format = &quot;CSV&quot;\r\n        job_config.skip_leading_rows = 0\r\n        job_config.field_delimiter = &quot;|&quot;\r\n        job_config.write_disposition = bigquery.WriteDisposition.WRITE_APPEND\r\n        job = bq_client.load_table_from_uri(\r\n            path.join(bq_bucket_path, file_name), ref_table, job_config=job_config)\r\n        result = job.result()\r\n        print(result.state)\r\n\r\n\r\nif __name__ == &quot;__main__&quot;:\r\n    bq_create_dataset(bq_client, bq_ref_dataset)\r\n    bq_drop_create_table(bq_client, bq_ref_dataset, bq_tpc_ds_schema_as_json)\r\n    bq_load_data_from_file(bq_client, bq_ref_dataset,\r\n                           bq_bucket_path, bq_storage_client)\r\n<\/pre>\n<p class=\"Standard\" style=\"text-align: justify;\">BigQuery offers some special tables whose contents represent metadata, such as the list of tables and views in a dataset. To confirm all objects have been created in the correct dataset, we can interrogate the __TABLES__ or __TABLES_SUMMARY__ meta-tables as per the query below.<\/p>\n<p><a href=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/\/2018\/12\/bigquery_project_meta_view.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-3666\" src=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/\/2018\/12\/bigquery_project_meta_view.png\" alt=\"\" width=\"580\" height=\"527\" srcset=\"http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/2018\/12\/bigquery_project_meta_view.png 1270w, http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/2018\/12\/bigquery_project_meta_view-300x273.png 300w, http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/2018\/12\/bigquery_project_meta_view-768x698.png 768w, http:\/\/bicortex.com\/bicortex\/wp-content\/post_content\/2018\/12\/bigquery_project_meta_view-1024x930.png 1024w\" sizes=\"auto, (max-width: 580px) 100vw, 580px\" \/><\/a><\/p>\n<p class=\"Standard\" style=\"text-align: justify;\">In the <a href=\"http:\/\/bicortex.com\/kicking-the-tires-on-bigquery-googles-serverless-enterprise-data-warehouse-part-2\" target=\"_blank\" rel=\"noopener noreferrer\">next post<\/a> I will go over some of TPC-DS queries execution performance on BigQuery, BigQuery ML &#8211; Google&#8217;s take on in-database machine learning as well as some basic interactive dashboard building in Tableau.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Note: Part 2 can be found HERE. Introduction As traditional, on-premise hosted data warehouse solutions are increasingly becoming harder to scale and manage, a new breed of vendors and products is starting to emerge, one which can easily accommodate exponentially growing data volumes with little up-front cost and very little operational overhead. More and more [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[61,56],"tags":[76,62,79,75,73,24,41],"class_list":["post-3634","post","type-post","status-publish","format-standard","hentry","category-cloud-computing","category-programming","tag-bigquery","tag-cloud-computing","tag-data-warehouse","tag-gcp","tag-mpp-rdbms","tag-programming","tag-python"],"aioseo_notices":[],"jetpack_featured_media_url":"","_links":{"self":[{"href":"http:\/\/bicortex.com\/bicortex\/wp-json\/wp\/v2\/posts\/3634","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/bicortex.com\/bicortex\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/bicortex.com\/bicortex\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/bicortex.com\/bicortex\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/bicortex.com\/bicortex\/wp-json\/wp\/v2\/comments?post=3634"}],"version-history":[{"count":34,"href":"http:\/\/bicortex.com\/bicortex\/wp-json\/wp\/v2\/posts\/3634\/revisions"}],"predecessor-version":[{"id":4121,"href":"http:\/\/bicortex.com\/bicortex\/wp-json\/wp\/v2\/posts\/3634\/revisions\/4121"}],"wp:attachment":[{"href":"http:\/\/bicortex.com\/bicortex\/wp-json\/wp\/v2\/media?parent=3634"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/bicortex.com\/bicortex\/wp-json\/wp\/v2\/categories?post=3634"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/bicortex.com\/bicortex\/wp-json\/wp\/v2\/tags?post=3634"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}