How To Build A Data Mart Using Microsoft BI Stack Part 7 – Cube Creation, Cube Deployment and Data Validation

September 16th, 2013 / No Comments » / by admin

In the PREVIOUS POST I showed you how to refine our SSAS project and went over the concepts of dimension properties and hierarchies before cube creation and deployment process can be initiated.  In this post I would like to focus on the concepts of SSAS solution deployment and cube creation as well as go through some basic validation techniques to ensure that our data mart has correct data in it before it can be reported out of.

Creating The SSAS Cube

Let’s continue on with the sample SSAS project which we polished up last time. All files relevant to further develop, deploy and test this solution as well as all the code used in previous posts to this series can be found HERE. In order to browse the dimensions we must first deploy them to our SSAS database. Visual Studio uses XML files for data sources, data source views, dimensions, and cubes, and there is a hidden master file that contains all of this code. The act of building of an SSAS project checks the validity of these files and combines them into this master file. The quickest way to deploy is to it from Visual Studio environment itself, however, we must first specify the SSAS instance we would like our database to be created on. To do this, right-click on root level of the project in Solution Explorer and select Properties as per image below (Alt + Enter should also work).

Project_properties_SSAS_HTDM

Next, select Deployment option from the Configuration Properties pane and populate the Target Server and Target Database as per below.

Deployment_server_properties_SSAS_HTDM

To process individual dimension we have meticulously created and deploy the data onto our SSAS server we can right-click on each dimension individually and select Process from the options provided. This should present us with another dialog window where providing we have done correct modelling decisions we can select Run to process our dimension. In case there are errors and you cannot deploy the data onto the specified SSAS server, go through the log to establish the cause for the issues which in most cases would pertain to the connection credentials or target deployment properties.

Finally, let’s look into how to create an SSAS cube. In order to start the cube wizard, right-click on the Cubes folder under solution explorer and select New Cube – this should start the wizard which will guide us through this process and is similar to the steps taken in order to create individual dimensions. After the welcome pane is displayed, click Next and on Select Creation Method click on Use Existing Tables from the options provided. Next step prompts us to select tables used for measure groups. This window allows you to choose which fact tables will be included in the cube. In our example, we have two fact tables: one that contains the measures and one that is used as a bridge between the Authors many-to-many dimension – let’s select them both and click next to continue with the wizard.  In the next step, the wizard suggests columns and measures to use by comparing the columns’ data types. For example, Author Order is recommended as a measure because it is a fact table and it has a numeric value. This simple criteria is how the wizard decides what should and should not be included. You will also notice that Fact Sales Count and Fact Titles Authors Count have been added. The wizard suggests these additional measures to provide row counts for Fact Sales and Fact Titles Authors for your convenience, in case you need totals for each. The wizard labels these as Count measures for you, but you can change the names if you prefer. Following on, we select the four previously created dimensions and in the next step give our cube a meaningful name which should conclude the wizard. On completion Visual Studio displays cube designer window as per image below.

Cube_schema_SSAS_HTDM

All there is left to do is to deploy the cube onto our server, configure it and test the data for any inconsistencies. The quickest way to deploy this is to right-click on the cube and select Process from the context menu provided. Once the process is completed, the following window should notify you of the process success/failure and the steps taken during cube processing and deployment.

Processing_progress_cube_SSAS_HTDM

There are other steps involved in further cube configuration which depends on the project requirements e.g. setting up translations, creating partitions, defining perspectives, adding custom calculations etc. however, given the intended purpose of this series I will not expand on those here. However, just like dimension, the cube also requires certain level of configuration input and validation so le’s run through those here.

Most of the cube configuration is done through adjusting the settings on the cube designer tabs. Each of those tabs is responsible for different cube properties. The image below shows the tabs selection for cube designer.

Cube_design_toolbar_SSAS_HTDM

As far as configuring the properties of each measure and dimension which constitutes the cube content, these are usually accessed through the properties window on the Cube Structure tab. If you can’t see in it should be as easy as right-clicking on individual fact or dimension and selecting Properties from the context menu as per below.

Cube_attributes_properties_pane_SSAS_HTDM

To save space and leave the most procedural tasks out of this post I have put together a document which can be downloaded from HERE. It outlines the changes necessary to be made in order to configure the cube and its properties. Alternatively, you can view and download all the necessary files for this and other posts for this series from HERE.

Database vs. Cube Data Validation

Finally, let’s superficially validate the figures in the cube versus data warehouse objects to find out if what we have put together checks out and is a valid representation of our measures and dimensions. There are many very elaborate methods of validating the data between SSAS and database objects and in this post I will cover two ways of testing to highlight the potential discrepancies. The easiest way to compare the data between what’s in the cube and on the table level is to navigate to the Browser tab from the cube designer window (last tab) and drag desired measures and dimensions onto the query pane. This process is very much like building a pivot table in Excel and once the objects data is displayed and aggregated we can conveniently export the results into Excel straight from the SSAS project (version 2012 and up only) with a click of a button as per below.

Compare_cube_and_db_code_2_HTDM

In the above image I dragged SalesQuantity and NumberOfSales measures as well as Date dimension onto the query pane and SSAS automatically aggregated the data for me. In order to validate that these figures are, in fact, what we store in the database let’s write a quick SQL and compare the output from both queries (even though the query can be composed by dragging and dropping measures/dimensions onto the query pane SSAS writes MDX code in the background).

USE DW_Sample
SELECT
d.datename as Date,
SUM([SalesQuantity]) as SalesQuantity,
COUNT(*) as NumberOfSales
FROM [DW_Sample].[dbo].[FactSales] s join DimDates d ON s.orderdatekey = d.DateKey
GROUP BY d.datename

When executed, the figures from the database which feeds out SSAS project are identical to those stored in the cube as per image below.

Compare_cube_and_db_code_1_HTDM

This method works well for small data sets but isn’t very effective with large volumes of data. Another, slightly more elaborate but also much more accurate and effective way is to create a linked server to our OLAP database and execute OPENQUERY SQL statement to extract the dimensional data in a tabular fashion for further comparison. First, let’s create a linked server executing the following SQL.

EXEC master.dbo.sp_addlinkedserver
@server = N'SQL2012_MultiDim',
@srvproduct = '',
@provider = N'MSOLAP',
@datasrc = N'Sherlock\sql2012_MultiDim',
@catalog = N'DW_Sample_SSAS'

As you remember, when visually building a result set in a Browser tab in Visual Studio, SSAS also creates MDX code in the background. To access we simply click on the Design Mode button to change the interface from visual to code. If you had the desired result set already created visually, the MDX code should look similar to the one below.

Query_MDX_code_SSAS_HTDM

Let’s copy the code and wrapping it around with an OPENQUERY SQL statement let’s bring back the results in Microsoft SQL Server Management Studio as per below.

SELECT *
FROM OPENQUERY
(SQL2012_MultiDim,
'SELECT NON EMPTY { [Measures].[SalesQuantity], [Measures].[NumberOfSales] } ON COLUMNS,
NON EMPTY { ([DimDates].[Date].[Date].ALLMEMBERS ) } DIMENSION PROPERTIES MEMBER_CAPTION,
MEMBER_UNIQUE_NAME ON ROWS FROM [DW_Sample_Cube]
CELL PROPERTIES VALUE, BACK_COLOR, FORE_COLOR,
FORMATTED_VALUE, FORMAT_STRING, FONT_NAME, FONT_SIZE, FONT_FLAGS')

Finally, after a little clean-up we can go ahead and compare the results between the cube and database using, for example, EXCEPT SQL statement as per the code snippet below.

SELECT
CAST("[DimDates].[Date].[Date].[Member_Caption]" AS nvarchar (50)) AS DateName,
CAST("[Measures].[SalesQuantity]" AS int) AS SalesQuantity,
CAST("[Measures].[NumberOfSales]" AS int) AS NumberOfSales
FROM OPENQUERY
(SQL2012_MultiDim,
'SELECT NON EMPTY { [Measures].[SalesQuantity], [Measures].[NumberOfSales] } ON COLUMNS,
NON EMPTY { ([DimDates].[Date].[Date].ALLMEMBERS ) } DIMENSION PROPERTIES MEMBER_CAPTION,
MEMBER_UNIQUE_NAME ON ROWS FROM [DW_Sample_Cube]
')

EXCEPT

SELECT
d.datename As DateName,
SUM([SalesQuantity]) as SalesQuantity,
COUNT(*) as SalesCount
FROM [DW_Sample].[dbo].[FactSales] s
join [DW_Sample].[dbo].[DimDates] d ON s.orderdatekey = d.DateKey
GROUP BY d.datename

If, after executing, the result brings back no values, this indicates data overlap and is symptomatic of that fact that there are no differences between the cube and the database. Similarly, you can try using INTERSECT SQL command in place if EXCEPT which should provide the opposite results i.e. only return the rows with identical data.

This concludes this post. As usual, the code used in this series as well as solution files for this SSAS project can be found and downloaded from HERE. In the NEXT ITERATION to the series I will be building a simple SSRS (SQL Server Reporting Services) report based on our data mart.

Please also check other posts from this series:

All SQL code and solution files can be found and downloaded from HERE.

Tags: ,

How To Build A Data Mart Using Microsoft BI Stack Part 6 – Refining SSAS Data Warehouse Dimensions

September 16th, 2013 / No Comments » / by admin

In the PREVIOUS POST I started exploring the concept of using SQL Server Analysis Services as a storage engine for our dimensional data and outlined how to start building a simple SSAS project with Data Source View and SASS dimensions defined. In this post I will continue to refine our SSAS project and go over the concepts of the dimension properties and hierarchies that still require attention before cube creation and deployment process.

Now that we have our dimensions defined, we will continue with further configuration steps. First, a little bit of theory. Each dimension attribute has a set of properties defined against it which can be adjusted in the Properties pane (bottom-right corner).

Attribute_properties_SSAS_HTDM

The most commonly adjusted attributes are:

  • KeyColumns – A collection of one or more columns uniquely identifying one row from another in the dimension tableName – The logical name of an attribute
  • Name – The logical name of an attribute
  • NameColumn – The values that are displayed in the client application when a particular attribute is selected
  • Type – The type of data the attribute represents (for example, day, month, or year)

Another important concept when configuring dimensions is setting up hierarchies. When we first create a dimension, each of the attributes is independent of each other, but user hierarchies allow you to group attributes into one or more parent-child relationships. Each relationship forms a new level of the hierarchy and provides reporting software with a way to browse data as a collection of levels-based attributes. Let’s take a specific dimension’s data as an example (DimAuthors) and look at table content. The data coming from DimAuthors can be rolled up or down based on AuthorState column. The partial output from two different queries below shows that many authors can come from the same state, therefore there is a parent-child relationship between AuthorState attribute and AuthorID and AuthorName. This is good candidate for a hierarchy if end users would like to, for example, group SalesQuantity measure based state.

Hierarchy_query_DimAuthors_HTDM

The act of creating a hierarchy is quite simple. All we do is drag and drop the attribute into the Hierarchies section, and the hierarchy will be created automatically. Below is an example of DimAuthors dimension with the hierarchy created based on AutorState and AuthorKey attributes.

Dim_design_pane_SSAS_HTDM

Once we finish defining hierarchies, we still need to create associations between attributes. If we go to Attribute Relationships pane (next to Dimension Structure) this is where we can set up the relationships between individual attributes. When you create a dimension with the wizard, it tries to implicitly map the relationships for you. But more often than not, we will have to adjust them ourselves. To adjust the relationships, begin by removing the incorrect ones on the Attribute Relationships tab. Next, locate the Attribute Relationships pane at the bottom portion of Visual Studio, highlight the existing relationships, right-click, and select Delete from the context menu. Once the relationships are removed, the Attribute Relationships tab will look as per image below, where the attributes are shown independent of each other.

No_attribute_relationships_SSAS_HTDM

Now our job is to map the relationships correctly. A simple way to accomplish this is to right-click a child attribute and then click the New Attribute Relationship option from the context menu as per image below.

Attribute_relationships_incomplete_SSAS_HTDM

A new dialogue window will appear allowing you to map the relationship child to a parent. To map the relationship, we select the related attribute in the Name drop-down box and then determine whether the relationship is expected to change over time or whether it should be considered an inflexible (aka rigid) relationship. Clicking OK will configure the relationship. Next we can define associations so that the finished relationships pane will look as per below.

Attribute_relationships_complete_SSAS_HTDM

Coming back to the Dimension Structure pane we can see a blue squiggly line highlighting DimAuthors dimension name. This is because the implied individual hierarchy exists even after we create our own user-defined attribute hierarchies. In our example, we have three attributes and have added a user-defined hierarchy, so we now have four hierarchies total. Below image is a conceptual depiction of this concept.

Dimensions_hierarchy_diagram_SSAS_HTDM

The attributes AuthorID, AuthorKey, and AuthorState still form their own hierarchical parent-child relationship with the All attribute. Microsoft calls these Attribute Hierarchies in contrast to the term User Hierarchies. Microsoft recommends hiding attribute hierarchies if you will not use them independently in your reporting applications. They can be hidden by setting the AttributeHierarchyVisible property to False, as per image below.

Attribute_hierarchy_flag_SSAS_HTDM

In that way in Microsoft Excel, the AuthorState and AuthorKey attributes will no longer be displayed independently of the AuthorState. This is Microsoft’s recommended approach, however, bear in mind that sometimes end users will prefer more options so hiding Attribute Hierarchies is not always recommended and you typically should proceed by what project requirements dictate versus application recommendations.

Finally, let’s step through all dimensions defined in previous posts to this series, configuring names, keys and hierarchies as per the document which you can access from HERE. It has all the necessary steps to configure this projects dimensions in order to create a cube.

This concludes this post. Most of the hard work is done and even though getting through all project aspects can be a bit of a toll, we nearly covered the key concepts and principles of how to build a data mart from scratch. As usual, the code used in this series as well as solution files for this SSAS project can be found and downloaded from HERE. In the NEXT ITERATION to the series I will be describing the concepts of SSAS solution deployment and cube creation as well as go through some basic validation techniques to ensure that our data mart has correct data in it before it can be reported out of.

Please also check other posts from this series:

All SQL code and solution files can be found and downloaded from HERE.

Tags: ,