<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Boyan Penev on Microsoft BI &#187; Slowly Changing Dimension</title>
	<atom:link href="http://www.bp-msbi.com/tag/slowly-changing-dimension/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.bp-msbi.com</link>
	<description>A practical blog about Microsoft BI tools, techniques and practices written by a developer for other fellow developers.</description>
	<lastBuildDate>Sun, 29 Jan 2012 03:23:06 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Combining Slowly Changing Dimensions and Current Dimension Versions</title>
		<link>http://www.bp-msbi.com/2009/02/combining-slowly-changing-dimensions/</link>
		<comments>http://www.bp-msbi.com/2009/02/combining-slowly-changing-dimensions/#comments</comments>
		<pubDate>Tue, 24 Feb 2009 01:49:00 +0000</pubDate>
		<dc:creator>Boyan Penev</dc:creator>
				<category><![CDATA[SSAS]]></category>
		<category><![CDATA[Analysis Services]]></category>
		<category><![CDATA[dimensions]]></category>
		<category><![CDATA[hierarchies]]></category>
		<category><![CDATA[Slowly Changing Dimension]]></category>

		<guid isPermaLink="false">http://www.bp-msbi.com/2009/02/combining-slowly-changing-dimensions-and-current-dimension-versions/</guid>
		<description><![CDATA[When we need to see historical changes of a dimension in our OLAP cube, the common practice is to implement it as a Slowly Changing Dimension (SCD). There are a few ways to do this, and a really good definition of the different types of SCDs can be found on Wikipedia: Slowly [...]]]></description>
			<content:encoded><![CDATA[<div style="text-align: left;"><span style="color: #000000;">When we need to see historical changes of a dimension in our OLAP cube the common practice is to implement it as a SCD &#8211; or a Slowly Changing Dimension. There are a few ways to do this and a really good definition of the different types of SCDs can be found in Wikipedia: </span><a href="http://en.wikipedia.org/wiki/Slowly_changing_dimension">Slowly Changing Dimension</a><span style="color: #000000;">. Also, there are quite a few articles on Implementing SCD ETLs in SSIS, two of which are:</span></div>
<ul>
<li><a href="http://blogs.conchango.com/jamiethomson/archive/2005/06/06/1543.aspx">SCD Wizard Demo</a><span style="color: #000000;"> &#8211; SSIS Junkie blog example of a package using the Slowly Changing Dimension transformation in SSIS</span></li>
<li><a href="http://msdn.microsoft.com/en-us/library/ms141715.aspx">MSDN Article</a> <span style="color: #000000;">on the Slowly Changing Dimension transformation in SSIS</span></li>
</ul>
<p><span style="color: #000000;">Since SQL Server Integration Services (SSIS) 2005 and 2008 include a Slowly Changing Dimension transformation, it is not too hard to implement such dimensions.</span></p>
<p><span style="color: #000000;">Here I discuss a typical requirement: having both an SCD and a current version of the same dimension.</span></p>
<p><span style="color: #000000;">First, it is important to note that an SCD should have two dimension keys: a unique surrogate key identifying every version of a dimension member, and a non-unique code that is shared by all versions of the same member. The code is also essential if we want to determine the current version of a dimension member. An example of a very simple dimension table using this design is:</span></p>
<p style="text-align: center;"><span style="color: #000000;"><img id="BLOGGER_PHOTO_ID_5306203289514277810" style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 10px; margin-left: auto; text-align: center; cursor: pointer; width: 359px; height: 121px;" src="http://2.bp.blogspot.com/_BMa6MDrkUyA/SaNrx0GB-7I/AAAAAAAABss/oVLHwR1cPR4/s400/table8.PNG" border="0" alt="" /></span></p>
<p><span style="color: #000000;">Here we have two distinct dimension members, with Codes 1 and 2. Member1 has two versions and Member2 has three. The SKeys (surrogate keys) of these versions are unique, but the Code stays the same across all versions of a member. Also notice the From and To dates, which define the validity period of each member version. We could add an IsActive or IsCurrent bit column to flag the latest version of a member, but we get the same result simply by filtering on the rows whose To date is 9999-12-31.</span></p>
<p><span style="color: #000000;">Assuming this design, I will now discuss the ways to build the dimension in SSAS.</span></p>
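<p><span style="color: #000000;">To make the design concrete, here is a minimal sketch in Python that uses the standard-library sqlite3 module as a stand-in for SQL Server. The table name DimTable and the columns SKey, Code, Description and ToDate follow the queries later in this post. FromDate, IsCurrent and all of the sample rows are hypothetical, because the real table is only shown as an image. The sketch checks that filtering on the open-ended 9999-12-31 To date returns the same rows as an IsCurrent flag.</span></p>

```python
import sqlite3

# Hypothetical SCD Type 2 dimension: Member1 has two versions, Member2 has three.
# FromDate/IsCurrent and all values are assumptions; dates are ISO-8601 text,
# so string comparison follows date order. sqlite3 stands in for SQL Server.
DIM_ROWS = [
    # (SKey, Code, Description, FromDate, ToDate, IsCurrent)
    (1, 1, "Member1Ver1", "2008-01-01", "2008-06-30", 0),
    (2, 1, "Member1Ver2", "2008-07-01", "9999-12-31", 1),
    (3, 2, "Member2Ver1", "2008-01-01", "2008-03-31", 0),
    (4, 2, "Member2Ver2", "2008-04-01", "2008-09-30", 0),
    (5, 2, "Member2Ver3", "2008-10-01", "9999-12-31", 1),
]


def build_dim(conn):
    """Create and populate the hypothetical DimTable."""
    conn.execute(
        "CREATE TABLE DimTable (SKey INTEGER PRIMARY KEY, Code INTEGER, "
        "Description TEXT, FromDate TEXT, ToDate TEXT, IsCurrent INTEGER)"
    )
    conn.executemany("INSERT INTO DimTable VALUES (?, ?, ?, ?, ?, ?)", DIM_ROWS)


conn = sqlite3.connect(":memory:")
build_dim(conn)

# Two equivalent ways to pick the current version of each member.
by_date = conn.execute(
    "SELECT SKey, Code, Description FROM DimTable "
    "WHERE ToDate = '9999-12-31' ORDER BY Code"
).fetchall()
by_flag = conn.execute(
    "SELECT SKey, Code, Description FROM DimTable "
    "WHERE IsCurrent = 1 ORDER BY Code"
).fetchall()

print(by_date)  # [(2, 1, 'Member1Ver2'), (5, 2, 'Member2Ver3')]
```

<p><span style="color: #000000;">Either filter yields exactly one row per Code. The date filter simply avoids having to maintain an extra flag column.</span></p>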
<p><span style="color: #000000;">First, the standard way to link the dimension table to our fact table is through the surrogate key, using a regular relationship between the two tables. Since the fact data is usually also linked to a Time dimension, each fact record whose date falls between the From and To dates of a version is linked to that version's SKey. An example of a fact table with a few rows that can be linked to the dimension table above is:</span></p>
<p><span style="color: #000000;"><img id="BLOGGER_PHOTO_ID_5306203291655356610" style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 10px; margin-left: auto; text-align: center; cursor: pointer; width: 400px; height: 53px;" src="http://1.bp.blogspot.com/_BMa6MDrkUyA/SaNrx8EgQMI/AAAAAAAABsk/w3NFWx8nC3E/s400/table7.PNG" border="0" alt="" /></span></p>
<p><span style="color: #000000;">The row with a FactKey of 1 is linked to Member1Ver1, while FactKey 2 is linked to Member1Ver2. Therefore, when we slice our cube by Time and by our dimension, we see:</span></p>
<p><a href="http://2.bp.blogspot.com/_BMa6MDrkUyA/SaNrp59Vx7I/AAAAAAAABsE/hsj49Zcbjts/s1600-h/table3.PNG" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"><span style="color: #000000;"><img id="BLOGGER_PHOTO_ID_5306203153649485746" style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 10px; margin-left: auto; text-align: center; cursor: pointer; width: 358px; height: 61px;" src="http://2.bp.blogspot.com/_BMa6MDrkUyA/SaNrp59Vx7I/AAAAAAAABsE/hsj49Zcbjts/s400/table3.PNG" border="0" alt="" /></span></a></p>
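<p><span style="color: #000000;">Fact rows are usually linked to the right version during the ETL. The sketch below uses Python with sqlite3 as a stand-in for SQL Server to show one common way of doing this: a range join that picks the version whose From/To interval contains the fact's date. It assumes, hypothetically, that incoming fact rows carry the member Code and a transaction date. FactStaging, TxnDate, Amount and all values are illustrative and are not taken from the tables above.</span></p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical dimension with the same shape as the table described above.
    CREATE TABLE DimTable (SKey INTEGER PRIMARY KEY, Code INTEGER,
                           Description TEXT, FromDate TEXT, ToDate TEXT);
    INSERT INTO DimTable VALUES
        (1, 1, 'Member1Ver1', '2008-01-01', '2008-06-30'),
        (2, 1, 'Member1Ver2', '2008-07-01', '9999-12-31'),
        (3, 2, 'Member2Ver1', '2008-01-01', '2008-03-31'),
        (4, 2, 'Member2Ver2', '2008-04-01', '2008-09-30'),
        (5, 2, 'Member2Ver3', '2008-10-01', '9999-12-31');

    -- Hypothetical incoming fact rows keyed by the business Code, not the SKey.
    CREATE TABLE FactStaging (FactKey INTEGER, Code INTEGER,
                              TxnDate TEXT, Amount REAL);
    INSERT INTO FactStaging VALUES
        (1, 1, '2008-02-15', 10.0),
        (2, 1, '2008-08-20', 20.0);
    """
)

# Surrogate key lookup: pick the version whose validity interval contains TxnDate.
# BETWEEN is inclusive at both ends, so a member's intervals must not overlap.
fact_rows = conn.execute(
    """
    SELECT fs.FactKey, dt.SKey AS DimSKey, fs.Amount
    FROM FactStaging fs
    INNER JOIN DimTable dt
        ON fs.Code = dt.Code
       AND fs.TxnDate BETWEEN dt.FromDate AND dt.ToDate
    ORDER BY fs.FactKey
    """
).fetchall()

print(fact_rows)  # [(1, 1, 10.0), (2, 2, 20.0)]
```

<p><span style="color: #000000;">Because one member's validity intervals are contiguous and do not overlap, each fact row matches exactly one version. Overlapping intervals would duplicate fact rows.</span></p>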
<div style="text-align: left;"><span style="color: #000000;">This is the standard way to implement our SCD, and these are the results we would expect. Now suppose we get a new requirement: we want to see both this and an aggregation by the current version of our dimension. There are a few ways to implement it. One obvious way is to create another dimension containing only the current dimension members. This is easy to achieve by adding a Named Query to our DSV that returns only the current dimension members:</span></div>
<blockquote><p><span style="font-family: 'courier new';"><span style="color: #663300;">SELECT</span><span style="white-space: pre;"><span style="color: #663300;"> </span></span><span style="color: #663300;">SKey<br />
</span><span style="white-space: pre;"><span style="color: #663300;"> </span></span><span style="color: #663300;">, Code<br />
</span><span style="white-space: pre;"><span style="color: #663300;"> </span></span><span style="color: #663300;">, Description<br />
FROM DimTable<br />
WHERE ToDate = '9999-12-31'</span></span></p></blockquote>
<p><span style="color: #000000;">The result will be:</span></p>
<p style="text-align: center;"><span style="color: #0000ee;"><span style="color: #000000;"><a href="http://1.bp.blogspot.com/_BMa6MDrkUyA/SaNsrr89ZgI/AAAAAAAABs0/l_LQO5AvLV0/s1600-h/table4.PNG" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"><span style="color: #000000;"><img id="BLOGGER_PHOTO_ID_5306204283761157634" style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 10px; margin-left: auto; text-align: center; cursor: pointer; width: 257px; height: 60px;" src="http://1.bp.blogspot.com/_BMa6MDrkUyA/SaNsrr89ZgI/AAAAAAAABs0/l_LQO5AvLV0/s400/table4.PNG" border="0" alt="" /></span></a></span></span></p>
<p><span style="color: #000000;">Then we need to replace our fact table with a Named Query that maps every fact row to the SKey of the current version of its dimension member:</span></p>
<blockquote><p><span style="color: #663300;"><span style="font-family: 'courier new';">SELECT<span style="white-space: pre;"> </span>ft.FactSkey<br />
<span style="white-space: pre;"> </span>, dt_current.SKey AS DimSKey<br />
<span style="white-space: pre;"> </span>, ft.TimeKey<br />
<span style="white-space: pre;"> </span>, ft.Amount<br />
FROM FactTable ft<br />
<span style="white-space: pre;"> </span>INNER JOIN DimTable dt<br />
<span style="white-space: pre;"> </span>ON ft.DimSKey = dt.SKey<br />
<span style="white-space: pre;"> </span>INNER JOIN DimTable dt_current<br />
<span style="white-space: pre;"> </span>ON dt.Code = dt_current.Code<br />
WHERE dt_current.ToDate = '9999-12-31'</span></span></p></blockquote>
<p><span style="color: #000000;">This will give us the following result:</span></p>
<p style="text-align: center;"><span style="color: #0000ee;"><span style="color: #000000;"><a href="http://3.bp.blogspot.com/_BMa6MDrkUyA/SaNrqNT9PCI/AAAAAAAABsU/CVYIfcfG35Q/s1600-h/table5.PNG" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"><span style="color: #000000;"><img id="BLOGGER_PHOTO_ID_5306203158844619810" style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 10px; margin-left: auto; text-align: center; cursor: pointer; width: 358px; height: 60px;" src="http://3.bp.blogspot.com/_BMa6MDrkUyA/SaNrqNT9PCI/AAAAAAAABsU/CVYIfcfG35Q/s400/table5.PNG" border="0" alt="" /></span></a></span></span></p>
<p><span style="color: #000000;">When we slice our cube by this dimension, all records for Member1 are aggregated against its latest version:</span></p>
<p style="text-align: center;"><span style="color: #0000ee;"><span style="color: #000000;"><a href="http://3.bp.blogspot.com/_BMa6MDrkUyA/SaNtD9y-o2I/AAAAAAAABtE/A3nbPo6dLh0/s1600-h/table6.PNG" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"><span style="color: #000000;"><img id="BLOGGER_PHOTO_ID_5306204700867994466" style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 10px; margin-left: auto; text-align: center; cursor: pointer; width: 359px; height: 40px;" src="http://3.bp.blogspot.com/_BMa6MDrkUyA/SaNtD9y-o2I/AAAAAAAABtE/A3nbPo6dLh0/s400/table6.PNG" border="0" alt="" /></span></a></span></span></p>
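<p><span style="color: #000000;">The remapping can be tried outside SSAS with a small sketch, again using Python and sqlite3 with hypothetical sample data. It runs the fact Named Query above with straight quotes and with dt_current.SKey aliased as DimSKey, since DimTable itself has no DimSKey column. It then totals Amount per current member, which approximates what slicing the cube by the current dimension returns.</span></p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical sample data; sqlite3 stands in for SQL Server.
    CREATE TABLE DimTable (SKey INTEGER PRIMARY KEY, Code INTEGER,
                           Description TEXT, FromDate TEXT, ToDate TEXT);
    INSERT INTO DimTable VALUES
        (1, 1, 'Member1Ver1', '2008-01-01', '2008-06-30'),
        (2, 1, 'Member1Ver2', '2008-07-01', '9999-12-31'),
        (3, 2, 'Member2Ver1', '2008-01-01', '2008-03-31'),
        (4, 2, 'Member2Ver2', '2008-04-01', '2008-09-30'),
        (5, 2, 'Member2Ver3', '2008-10-01', '9999-12-31');

    -- Fact rows keyed by the historical version SKey.
    CREATE TABLE FactTable (FactSkey INTEGER, DimSKey INTEGER,
                            TimeKey INTEGER, Amount REAL);
    INSERT INTO FactTable VALUES
        (1, 1, 200802, 10.0),
        (2, 2, 200808, 20.0),
        (3, 3, 200803, 5.0);
    """
)

# Fact Named Query: re-point every fact row at the current version of its member.
REMAP_SQL = """
    SELECT ft.FactSkey, dt_current.SKey AS DimSKey, ft.TimeKey, ft.Amount
    FROM FactTable ft
    INNER JOIN DimTable dt ON ft.DimSKey = dt.SKey
    INNER JOIN DimTable dt_current ON dt.Code = dt_current.Code
    WHERE dt_current.ToDate = '9999-12-31'
"""

remapped = conn.execute(REMAP_SQL + " ORDER BY ft.FactSkey").fetchall()

# Aggregate the remapped facts per current member, as a cube slice would.
totals = conn.execute(
    f"SELECT d.Description, SUM(q.Amount) FROM ({REMAP_SQL}) q "
    "INNER JOIN DimTable d ON q.DimSKey = d.SKey "
    "GROUP BY d.Description ORDER BY d.Description"
).fetchall()

print(remapped)  # [(1, 2, 200802, 10.0), (2, 2, 200808, 20.0), (3, 5, 200803, 5.0)]
print(totals)    # [('Member1Ver2', 30.0), ('Member2Ver3', 5.0)]
```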
<p><span style="color: #000000;">With this in place, we can have two dimensions in our cube, and our users can pick the one that better suits their needs:</span></p>
<ul>
<li><span style="color: #000000;"><strong>Dimension</strong>, containing only the current versions, and</span></li>
<li><span style="color: #000000;"><strong>Dimension (Historical)</strong>, where <em>Historical</em> means, in technical terms, an SCD.</span></li>
</ul>
<p><span style="color: #000000;">However, we can also implement this differently and avoid building such logic into a view or our DSV. The trade-off is some disk space and one more column in our fact table. Instead of deriving the current key in SQL, we can simply add the dimension Code to the fact table. Then we build our current dimension again from the latest versions, but instead of using the SKey as the dimension key, we use the Code. The Code is unique across dimension members as long as we filter out the non-current versions, so the query for the dimension is exactly the same as the one we used before. However, we need to change our fact table design and add a DimCode column:</span></p>
<p><a href="http://2.bp.blogspot.com/_BMa6MDrkUyA/SaNtMypllOI/AAAAAAAABtM/K3jEz9WC1s4/s1600-h/table7.PNG" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"><img id="BLOGGER_PHOTO_ID_5306204852494636258" style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 10px; margin-left: auto; text-align: center; cursor: pointer; width: 400px; height: 53px;" src="http://2.bp.blogspot.com/_BMa6MDrkUyA/SaNtMypllOI/AAAAAAAABtM/K3jEz9WC1s4/s400/table7.PNG" border="0" alt="" /></a></p>
<p><span style="color: #000000;">Then we create two dimensions again, but we link the Historical dimension through the DimSKey column and the current one through the DimCode column. The result of slicing the cube by the current dimension is exactly the same as before. The trade-off is disk space (the extra DimCode column in the fact table) versus the processing time and CPU usage of the join in the fact Named Query. It is up to the developer to choose the more appropriate approach for the solution.</span></p>
<p><span style="color: #000000;">So far I have discussed two ways of exposing our SCD and the current version of the dimension as separate dimensions in our cube. There is, however, a way to combine both in the same dimension. To do this, we need two levels in the dimension: a parent level containing the current version of each dimension member, and a child level containing all of its versions, including the current one. For example:</span></p>
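<p><span style="color: #000000;">Here is a sketch of the DimCode approach, again in Python with sqlite3 and hypothetical data. The fact table stores both DimSKey and DimCode. The historical slice joins on DimSKey, while the current slice joins on DimCode against the current-versions filter, modelled here as a hypothetical view named DimCurrent. No remapping query is needed, and the current totals come out the same as with the remapping approach on the same data.</span></p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical sample data; sqlite3 stands in for SQL Server.
    CREATE TABLE DimTable (SKey INTEGER PRIMARY KEY, Code INTEGER,
                           Description TEXT, FromDate TEXT, ToDate TEXT);
    INSERT INTO DimTable VALUES
        (1, 1, 'Member1Ver1', '2008-01-01', '2008-06-30'),
        (2, 1, 'Member1Ver2', '2008-07-01', '9999-12-31'),
        (3, 2, 'Member2Ver1', '2008-01-01', '2008-03-31'),
        (4, 2, 'Member2Ver2', '2008-04-01', '2008-09-30'),
        (5, 2, 'Member2Ver3', '2008-10-01', '9999-12-31');

    -- The fact table stores the member Code next to the version SKey.
    CREATE TABLE FactTable (FactSkey INTEGER, DimSKey INTEGER, DimCode INTEGER,
                            TimeKey INTEGER, Amount REAL);
    INSERT INTO FactTable VALUES
        (1, 1, 1, 200802, 10.0),
        (2, 2, 1, 200808, 20.0),
        (3, 3, 2, 200803, 5.0);

    -- Hypothetical view: the current dimension keyed on Code, which is unique
    -- once the non-current versions are filtered out.
    CREATE VIEW DimCurrent AS
        SELECT Code, Description FROM DimTable WHERE ToDate = '9999-12-31';
    """
)

# Historical slice: join on the version surrogate key.
historical = conn.execute(
    "SELECT d.Description, SUM(f.Amount) FROM FactTable f "
    "INNER JOIN DimTable d ON f.DimSKey = d.SKey "
    "GROUP BY d.Description ORDER BY d.Description"
).fetchall()

# Current slice: join on the business code, no remapping query required.
current = conn.execute(
    "SELECT d.Description, SUM(f.Amount) FROM FactTable f "
    "INNER JOIN DimCurrent d ON f.DimCode = d.Code "
    "GROUP BY d.Description ORDER BY d.Description"
).fetchall()

print(historical)  # [('Member1Ver1', 10.0), ('Member1Ver2', 20.0), ('Member2Ver1', 5.0)]
print(current)     # [('Member1Ver2', 30.0), ('Member2Ver3', 5.0)]
```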
<blockquote><p><span style="color: #330099;"><span style="color: #993300;">Member1Ver2<br />
<span style="white-space: pre;"> </span>Member1Ver1<br />
<span style="white-space: pre;"> </span>Member1Ver2<br />
Member2Ver3<br />
<span style="white-space: pre;"> </span>Member2Ver1<br />
<span style="white-space: pre;"> </span>Member2Ver2<br />
<span style="white-space: pre;"> </span>Member2Ver3</span></span></p></blockquote>
<p><span style="color: #000000;">This way, the historical versions aggregate up to the current version, and we can use either level depending on what we want to achieve. To build this, we can use our existing dimension table and derive the parent level in SQL, so we do not need to update all stored records whenever a new version arrives:</span></p>
<blockquote><p><span style="color: #663300;"><span style="font-family: 'courier new';">SELECT</span><span style="white-space: pre;"><span style="font-family: 'courier new';"> </span></span><span style="font-family: 'courier new';">dt.SKey<br />
</span><span style="white-space: pre;"><span style="font-family: 'courier new';"> </span></span><span style="font-family: 'courier new';">, dt.Code<br />
</span><span style="white-space: pre;"><span style="font-family: 'courier new';"> </span></span><span style="font-family: 'courier new';">, dt.Description<br />
</span><span style="white-space: pre;"><span style="font-family: 'courier new';"> </span></span><span style="font-family: 'courier new';">, dt_p.SKey AS ParentSKey<br />
FROM DimTable dt<br />
<span style="white-space: pre;"> </span>INNER JOIN DimTable dt_p<br />
<span style="white-space: pre;"> </span>ON dt.Code = dt_p.Code<br />
WHERE dt_p.ToDate = '9999-12-31'</span></span></p></blockquote>
<p><span style="color: #000000;">The result is:</span></p>
<p><span style="color: #330099;"><span style="color: #000000;"><a href="http://1.bp.blogspot.com/_BMa6MDrkUyA/SaNrp4rE5nI/AAAAAAAABr0/lUaCRnu58yc/s1600-h/table1.PNG" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"><span style="color: #000000;"><img id="BLOGGER_PHOTO_ID_5306203153304446578" style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 10px; margin-left: auto; text-align: center; cursor: pointer; width: 400px; height: 101px;" src="http://1.bp.blogspot.com/_BMa6MDrkUyA/SaNrp4rE5nI/AAAAAAAABr0/lUaCRnu58yc/s400/table1.PNG" border="0" alt="" /></span></a></span></span></p>
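<p><span style="color: #000000;">The parent query can be checked the same way. This sketch uses Python and sqlite3 with hypothetical sample rows. It runs the query above with straight quotes and prints the resulting two-level structure. Every version, including the current one, receives the SKey of its member's current version as ParentSKey, which reproduces the indented example earlier in the post.</span></p>

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical sample data; sqlite3 stands in for SQL Server.
    CREATE TABLE DimTable (SKey INTEGER PRIMARY KEY, Code INTEGER,
                           Description TEXT, FromDate TEXT, ToDate TEXT);
    INSERT INTO DimTable VALUES
        (1, 1, 'Member1Ver1', '2008-01-01', '2008-06-30'),
        (2, 1, 'Member1Ver2', '2008-07-01', '9999-12-31'),
        (3, 2, 'Member2Ver1', '2008-01-01', '2008-03-31'),
        (4, 2, 'Member2Ver2', '2008-04-01', '2008-09-30'),
        (5, 2, 'Member2Ver3', '2008-10-01', '9999-12-31');
    """
)

# Attach the SKey of the current version of the same Code as ParentSKey.
rows = conn.execute(
    """
    SELECT dt.SKey, dt.Code, dt.Description, dt_p.SKey AS ParentSKey
    FROM DimTable dt
    INNER JOIN DimTable dt_p ON dt.Code = dt_p.Code
    WHERE dt_p.ToDate = '9999-12-31'
    ORDER BY dt.SKey
    """
).fetchall()

# Group every version under its parent, i.e. the current version of its member.
names = {skey: desc for skey, _, desc, _ in rows}
children = defaultdict(list)
for _, _, desc, parent in rows:
    children[parent].append(desc)

for parent in sorted(children):
    print(names[parent])
    for child in children[parent]:
        print("    " + child)
```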
<p><span style="color: #000000;">Then we can build our Parent-Child dimension, using the Parent level when we want the current versions and the Child level when we want the historical ones.</span></p>
<p><span style="color: #000000;">This approach lets us combine the two dimensions into one. It can also be implemented as a regular (non-parent-child) hierarchy, because the hierarchy is not ragged: every version sits exactly one level below the current version of its member.</span></p>
<p><span style="color: #000000;">It is always advisable to make sure we actually need an SCD, and to avoid one whenever possible, because SCDs are not always intuitive for users. Seeing a member's fact data split across multiple rows can surprise them, and understanding how the historical dimension works, with its multiple nodes per member, can be a problem. However, an SCD satisfies a common requirement, so it is important to know how to build one.</span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.bp-msbi.com/2009/02/combining-slowly-changing-dimensions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

