- SCD Wizard Demo – SSIS Junkie blog example of a package using the Slowly Changing Dimension transformation in SSIS
- MSDN Article on the Slowly Changing Dimension transformation in SSIS
Since SQL Server Integration Services 2005 and 2008 include a SCD transformation it is not too hard to implement such dimensions.
Here I am discussing a typical requirement – to be able to have a SCD and a Current version of the dimension.
First, it is important to notice that a SCD should have two dimension keys: a unique surrogate key identifying every version of the dimension members and a non-unique code, which is common for all versions for a dimension member. This is also very important if we want to be able to determine the current version of a dimension member. An example of a very simple dimension table utilising this design is:
Here we have two distinct dimension members with Code of 1 and 2. Member1 has two versions and Member2 has three. The SKeys (surrogate keys) for these versions are unique but the codes stay the same for each member. Also, notice the From and To dates which allow us to distinguish the periods for the member versions. We can have an IsActive or IsCurrent bit column, which shows us the latest version of a node, but we can also just filter on dates which are 9999-12-31, which will give us the same result.
Assuming the described design I will move on to discuss the ways to build a dimension in SSAS.
First, the standard way to link the dimension table to our fact table is through the surrogate key. We can have a regular relationship between the two tables. As the fact data is usually also linked to a Time dimension, fact records linked against the periods between the From and To dates of our SCD will be linked to that versions SKey. An example of a fact table with a few rows, which can be linked to the dimension table above is:
The row with a FactKey of 1 will be linked against Member1Ver1, while FactKey 2 will go against Member1Ver2. Therefore, when we slice our cube by Time and our dimension we will see:
WHERE ToDate = ’9999-12-31′
The result will be:
Then we need to replace our fact table with a Named Query, which shows the DimSKeys for current version dimension members:
FROM FactTable ft
INNER JOIN DimTable dt
ON ft.DimSKey = dt.SKey
INNER JOIN DimTable dt_current
ON dt.Code = dt_current.Code
WHERE dt_current.ToDate = ’9999-12-31′
This will give us the following result:
When we slice our cube, all records for Member1 will be against the latest version:
Implementing this, we can have two dimensions in our cube, so our users can use the one that makes more sense for their needs:
- Dimension and
- Dimension (Historical), and the Historical designation stands for, in technical terms, a SCD
However, we can also implement this in a different way, which allows us to avoid building such logic in a view or our DSV. The trade-off is some space on our disks and one more column in our fact table. Instead of adding a new column through writing SQL, we can simply add the dimension Code in the fact table. Then, we can build our dimension again by getting the latest versions, but instead of having the SKey as a dimension key, we can use the Code. It is of course unique across all dimension members, as long as we filter our the non-current versions. The query for doing this is exactly the same as the one we used before. However, we need to change our fact table design and add a DimCode column:
Then, we create two dimensions again, but we link the Historical dimension with the DimSKey column and the Current one with the DimCode column. The result of slicing the cube by the current version is exactly the same as before. The trade-off is space vs. processing time and CPU usage. It is up to the developer to choose the more appropriate way to build the solution.
So far I discussed two ways of having our SCD and Current Version dimension in different dimensions in our cubes. There is, however a way to combine both in the same dimension. To do this, we need to have two levels in the dimension: a parent level, which contains the current version of the dimension members, and a child level, which contains the historical versions. In example:
This way the historical versions aggregate up to the current version and we can use either level, depending on what we want to achieve. To build this, we can use our current dimension table and add a parent level through SQL. This way, we do not need to update all records when a new version comes:
, dt_p.SKey AS ParentSKey
FROM DimTable dt
INNER JOIN DimTable dt_p
ON dt.Code = dt_p.Code
WHERE dt_p.ToDate = ’9999-12-31′
The result is:
Then, we can build our Parent-Child dimension and we can use the Parent level is we want to have current versions and the Child level for the historical ones.
This approach allows us to combine the two dimensions into one. It is also possible to implement it in a non-parent child fashion because the hierarchy is not ragged.
It is always advisable to make sure we actually need a SCD and avoid it whenever possible because it is not always intuitive for users to use one. Splitting our fact data on multiple rows can be surprising for users and understanding how the historical dimension works and the multiple nodes it consists of can be a problem. However, it lets us satisfy a common requirement and therefore it is quite important to know how to build.