Written by: Reza Rad, trainer, consultant, and mentor at RADACAD.
When I explain the Star Schema and best practices of data modeling in Power BI, often I get questions such as what if we want to have budget and actuals in the same data model? What if there are two tables with different granularity? We cannot solve it with simple data modeling and relationships, this needs complex DAX calculation. My answer is: That can also be solved with a zero complexity model. In this article, I’m going to explain an example of how two tables with different granularity (like budget and actual) can be modeled easily using Power BI.
The Sample dataset used here is the AdventureWorksDW Excel file which can be downloaded here.
Sales Star Schema
A fact table is a table that contains happening of action, it keeps numeric and aggregatable data in it. An example of a fact table is the FactResellerSales table in the model below, which is connected to DimProduct, DimDate, DimSalesterritory, and DimEmployee.
To create this model:
- Get data from AdventureWorksDW data source from above
- Select all the tables mentioned
- Remove the relationship between DimSalesTerritory and Dim Employee
- Set the relationship between FactResellerSales and DimEmployee as Active
- Create a relationship between OrderDateKey in the FactResellerSales table and the DataKey in the DimDate table
The model above is a perfect representative of a star schema model. This model can easily answer all questions regarding sales. However, it seems difficult to think about star schema when you want to bring another model in with different granularity.
Sales Quota: Different Granularity
For this example, instead of the budget data, I have Sales Quota which is under a different granularity. The concept and principles remain the same. We have a table with different granularity which we want to connect to this model. The Sales Quota has the quota of sales for each employee in each quarter. The granularity of this table is per employee and per quarter, while the granularity of the sales table is per employee, per day, and per combination of SalesTerritory and Product also.
If you get the FactSalesQuota from AdventureWorksDW, you will notice that there is a Date column and the DataKey which I have removed in my example below because they will be confusing for you to understand the table is showing quarterly data.
Here is how the data in the Sales Quota table looks: (FactSalesQuota in the AdventureWorksDW):
Challenges of Two Different Granularities
If you want to build the star schema for Sales Quota and Sales (similar to actual vs. budget), then the challenges would be:
- How to connect the Sales Quote to the date table?
- If I create a Quarter dimension, then how to connect Quarter dimension to the date dimension? Does this create a snowflake scenario?
I’m going to explain an easy method to solve it and then it will be a proper star schema again.
Connect SalesQuota to the Date Table
Although the sales quota information is quarterly based, you can still connect it to the date table. This would avoid creating an extra dimension for the quarter and snowflake between quarter dimension and the date dimension. The only thing to consider is that you have to consider a specific date in each quarter as your default value. For example, we can consider the first day of each quarter as our data value.
To achieve this purpose we need to create a column in the SalesQuota table which has the DataKey in it (the DataKey used in my sample is in this format YYYYMMDD), so it can be something like 20190101. This is what we’re going to build in this part:
To create that field, you can easily do these steps:
- Create month number based on the quarter number
- Create the first date of that month as a custom column
To create the first month of the quarter from the quarter number, you can follow a simple calculation of ((<quarter-1>*3)+1), as below:
The calculation above is done in the Power Query Editor.
The second part is to create a Datakey based on it;
To do this part you first need to make sure the month number is a two-character length field, you can do that with these 3 steps: 1. converting the data type to text. 2. adding a prefix of “0” to this column. 3. extracting the right two characters of this column.
Convert the data type to text:
Then add a prefix to the column:
and add zero as the prefix
Then extract the last two characters:
The final result will be:
Now that we’ve got everything needed, the last step is to create a custom column which is the concatenation of Year, Month, and Day (make sure before this step you changed the data type of CalendarYear to text).
And then change the data type of the DataKey to the whole number, you can also do remove these columns: CalendarYear, CalendarQuarter, and First Month of the Quarter.
I haven’t explained the details of each step, because your data might not be at a quarter level and the date key that you use might have a different format. You might even use Merge option in Power Query to connect this to an aggregated version of the date table based on year and quarter and then fetch the first of each quarter from the expand column. No matter how you do it, the main point of all these steps is to make sure you can connect your table to the date dimension without needing an extra dimension.
Star Schema Everywhere
After adding the DataKey to the Sales Quota table, you can connect it to the date table, and also the employee table, as you can see we still have Star Schema everywhere.
There is no need for a bi-directional relationship or connecting a dimension to another dimension. This model can answer your budget vs actual questions easily.
To show you some sample analysis I fetched from the Sales Quota vs the ResellerSales in the above scenario, I have now created some visual reports. All these visuals work without needing any complicated DAX calculation or bi-directional relationship. All of that is possible because the data model is set up right.
Simple DAX Measures
I only created two measures for this calculation, very simple measures: Bud vs Act:
and Bud vs Act %:
And here are the reports:
And the details which are a drill through from the other two pages;
Benefits of this Model
I showed you a fully-star-schema model design for budget vs actual in this article. Benefits of this model are basically all benefits of the star schema, including but not limited to:
- No extra relationship. Direct single relationship between each fact table and each dimension connected to it.
- No need for bi-directional relationship.
- No need for many-to-many relationship.
- No need for complicated DAX calculations.
- Fast performing model that answers the budget vs actual requirements.
Resource Credit | RADACAD