Archive for the ‘SSAS’ Category

Ad-Hoc Ranges in SSAS

September 2nd, 2010

We can easily build ranges in MDX with the range operator “:”. We can also easily create a Range dimension in SSAS and use it to slice our data. This post is not about either of those. I would like to discuss the scenario where we need to restrict an ad-hoc query (e.g. PivotTable in Excel) by a range of values. Usually, we would tell our users to just select the values they need. This works. However, if our users do not want to be selecting/deselecting many values, we can provide an easier way to do this.

Let’s assume we have an Age dimension for a Kindergarten cube. The Age dimension contains the ages of the children, which can be from 1 to 10. Our requirement is to be able to select low and high limits of ages for a Pivot Table in Excel, so that the data in the Pivot Table is sliced for the range of ages between those limits.

To implement this in practice, we can build two extra dimensions – Age – Low Limit and Age – High Limit – which contain the same members as the Age dimension, and then use them to slice our cube. Because the data is the same for all three dimensions, we can use a couple of database views on top of the Age dimension table, thus ensuring that all three are in sync:

After that, we build two bridging tables: BridgeLowLimit, between Age and Age – Low Limit, and BridgeHighLimit, between Age and Age – High Limit:

The data in these Bridging tables maps each Low and High limit to all Age members which are either lower (for High limit) or higher (for Low Limit) than the limit members:
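The mapping held in the two Bridging tables can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual table definitions: the function names are made up, and it assumes the limits are inclusive (a limit member also maps to the Age member equal to it).

```python
def build_bridges(ages):
    """Return (low_bridge, high_bridge) as lists of (limit, age) rows.

    Assumption: limits are inclusive, so limit 3 maps to age 3 as well.
    """
    ages = list(ages)
    # Each Low Limit maps to every age at or above it.
    low_bridge = [(limit, age) for limit in ages for age in ages if age >= limit]
    # Each High Limit maps to every age at or below it.
    high_bridge = [(limit, age) for limit in ages for age in ages if age <= limit]
    return low_bridge, high_bridge


def ages_in_range(ages, low_limits, high_limits):
    """Ages that survive slicing by the selected low and high limit members."""
    low_bridge, high_bridge = build_bridges(ages)
    via_low = {age for limit, age in low_bridge if limit in low_limits}
    via_high = {age for limit, age in high_bridge if limit in high_limits}
    # Slicing by both limit dimensions keeps only ages reachable through both.
    return sorted(via_low & via_high)
```

With ages 1 to 10, a low limit of 3 and a high limit of 6 leave ages 3, 4, 5 and 6. Selecting low limits {3, 5} and high limits {4, 6} gives the same result, because only the lowest low limit and the highest high limit matter.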

 

Now, we can define many-to-many relationships between the FactAmount (our fact table), through the Age dimension and the Bridging tables to our limit dimensions as follows:

After this, we can hide the two measure groups for the Bridge tables from the users:

Now, we are ready to process our SSAS database. After that, we get the following in Excel:

If we place the Low and High limits in the Report Filter, Age on rows and our Amount measure on columns, we can limit the Age members displayed by changing the filter members. Note that only the lowest member in the Age – Low Limit dimension and the highest in the Age – High Limit dimension matter – everything in between those (in case of multi-selects) effectively gets ignored.

There are certain problems with this solution. If we place the limit dimensions on rows and we select multiple members from each dimension, we get the following:

This can be confusing for the users if they want to get distinct ranges like 1-3, 3-6, 6-10. Unfortunately, it is not possible to build a cube calculation which hides the irrelevant members, as we do not know what the users have selected in Excel. Without that, we cannot determine which members are in the query scope, and therefore we cannot pick only the ones we need (e.g. the ones with the lowest distance between the two limit members).

If we place the two dimensions on Rows and Columns, we can get a nice matrix and this makes a bit more sense:

 

Also, for large dimensions (e.g. Date), this can be quite slow, as the number of rows in the Bridge tables will grow. For example, if we have 10 years in our Date dimension and we map them the way I just showed, we will end up with approximately 6-7 million rows in each Bridge table, which can be quite prohibitive from a performance point of view. However, for smaller dimensions this is workable: in my opinion everything under 1000 members would be OK, as it would generate up to approximately 500,000 rows in each Bridge table. Therefore, if our users insist on this functionality – especially when they have a flat list of 100-1000 members, and they frequently select ranges out of this list – we have a way of answering their need.
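The row counts quoted above can be checked with a small calculation. The sketch below assumes inclusive limits (each limit member also maps to itself) and a Date dimension at day grain with 365 days per year; the function name is purely illustrative.

```python
def bridge_row_count(member_count):
    """Rows in one Bridge table for a dimension with member_count members.

    With inclusive limits, the k-th member maps to k members,
    so the total is 1 + 2 + ... + n = n * (n + 1) / 2.
    """
    return member_count * (member_count + 1) // 2


ten_years_of_days = 10 * 365  # assumption: day grain, leap days ignored
date_bridge_rows = bridge_row_count(ten_years_of_days)
small_bridge_rows = bridge_row_count(1000)
```

Under these assumptions a 10-year Date dimension produces 6,663,075 rows per Bridge table, and a 1000-member dimension produces 500,500, which matches the estimates in the text. The growth is quadratic, which is why the approach only suits smaller dimensions.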

SSAS

 

How to use ITEM and when ITEM(0).ITEM(0) is redundant

August 30th, 2010

In MDX we have the Item function which can be used in a number of ways. It is important to understand how it works and how it can be used to our advantage.

As a start, we can call Item over a set or over a tuple:

{set}.Item(0) or (tuple).Item(0)

It may be important to note that when we call Item over a set, we get a tuple out of it (sets are collections of tuples), while if we call it over a tuple we get a member.

If we use Item with a tuple, we must specify as an argument the integer position of the member within the tuple which we want. However, when we work with sets, we can either do the same, or specify a number of strings which identify a specific tuple. Some examples:

Item with a tuple:

(a,b).Item(0) = a

SELECT
{
 [Measures].[Internet Sales Amount]
} ON 0,
{
 ([Product].[Category].&[4], [Date].[Calendar].[Calendar Year].&[2008]).Item(0)
} ON 1
FROM [Adventure Works]

(a,b).Item(1) = b

SELECT
{
 [Measures].[Internet Sales Amount]
} ON 0,
{
 ([Product].[Category].&[4], [Date].[Calendar].[Calendar Year].&[2008]).Item(1)
} ON 1
FROM [Adventure Works]

Item with a set:

{a,b,c}.Item(0) = a

WITH
SET SET1 AS
 { ([Product].[Category].&[4],[Date].[Calendar].[Calendar Year].&[2008]),
  ([Product].[Category].&[1],[Date].[Calendar].[Calendar Year].&[2008]),
  ([Product].[Category].&[3],[Date].[Calendar].[Calendar Year].&[2008])
 }
SELECT
{
 [Measures].[Internet Sales Amount]
} ON 0,
{
 SET1.Item(0)
} ON 1
FROM [Adventure Works]

{a,b,c}.Item("a") = a

WITH
SET SET1 AS
 { ([Product].[Category].&[4],[Date].[Calendar].[Calendar Year].&[2008]),
  ([Product].[Category].&[1],[Date].[Calendar].[Calendar Year].&[2008]),
  ([Product].[Category].&[3],[Date].[Calendar].[Calendar Year].&[2008])
 }
SELECT
{
 [Measures].[Internet Sales Amount]
} ON 0,
{
 SET1.Item("([Product].[Category].&[4],
             [Date].[Calendar].[Calendar Year].&[2008])")
} ON 1
FROM [Adventure Works]

{(a1,b1),(a2,b2),(a3,b3)}.Item("a1","b1") = (a1,b1)

WITH
SET SET1 AS
 { ([Product].[Category].&[4],[Date].[Calendar].[Calendar Year].&[2008]),
  ([Product].[Category].&[1],[Date].[Calendar].[Calendar Year].&[2008]),
  ([Product].[Category].&[3],[Date].[Calendar].[Calendar Year].&[2008])
 }
SELECT
{
 [Measures].[Internet Sales Amount]
} ON 0,
{
 SET1.Item("[Product].[Category].&[4]",
           "[Date].[Calendar].[Calendar Year].&[2008]")
} ON 1
FROM [Adventure Works]

When we specify a number of strings as arguments, we get the tuple which is defined by these strings/coordinates.

Now, let’s see what happens when we have a set of tuples and we use Item on it with a single argument:

{(a1,b1),(a2,b2),(a3,b3)}.Item(0) = (a1,b1)

WITH
SET SET1 AS
 { ([Product].[Category].&[4],[Date].[Calendar].[Calendar Year].&[2008]),
  ([Product].[Category].&[1],[Date].[Calendar].[Calendar Year].&[2008]),
  ([Product].[Category].&[3],[Date].[Calendar].[Calendar Year].&[2008])
 }
SELECT
{
 [Measures].[Internet Sales Amount]
} ON 0,
{
 SET1.Item(0)
} ON 1
FROM [Adventure Works]

We get a tuple back. Therefore, if we use a second Item function over the first one, we will get the member on that position from the tuple:

{(a1,b1),(a2,b2),(a3,b3)}.Item(0).Item(0) = (a1,b1).Item(0) = a1

To illustrate the concept:

WITH
SET SET1 AS
 { ([Product].[Category].&[4],[Date].[Calendar].[Calendar Year].&[2008]),
  ([Product].[Category].&[1],[Date].[Calendar].[Calendar Year].&[2008]),
  ([Product].[Category].&[3],[Date].[Calendar].[Calendar Year].&[2008])
 }
SELECT
{
 [Measures].[Internet Sales Amount]
} ON 0,
{
 SET1.Item(0).Item(0)
} ON 1
FROM [Adventure Works]

This gives us the whole amount for Accessories, while:

WITH
SET SET1 AS
 { ([Product].[Category].&[4],[Date].[Calendar].[Calendar Year].&[2008]),
  ([Product].[Category].&[1],[Date].[Calendar].[Calendar Year].&[2008]),
  ([Product].[Category].&[3],[Date].[Calendar].[Calendar Year].&[2008])
 }
SELECT
{
 [Measures].[Internet Sales Amount]
} ON 0,
{
 SET1.Item(0).Item(1)
} ON 1
FROM [Adventure Works]

gives us the total amount for 2008.

Even if we do:

WITH
SET SET1 AS
 { ([Product].[Category].&[4],[Date].[Calendar].[Calendar Year].&[2008]),
  ([Product].[Category].&[1],[Date].[Calendar].[Calendar Year].&[2008]),
  ([Product].[Category].&[3],[Date].[Calendar].[Calendar Year].&[2008])
 }
SELECT
{
 [Measures].[Internet Sales Amount]
} ON 0,
{
 SET1.Item(0).Item(0).Item(0).Item(0).Item(0).Item(0).Item(0).Item(0)
} ON 1
FROM [Adventure Works]

we still get the amount for Accessories.

What happens here is:

  • With the first call of Item(0) over SET1 we get the first tuple from the set (in our case it is ([Product].[Category].&[4],[Date].[Calendar].[Calendar Year].&[2008])).
  • Then with the second call, we get the first member of this tuple – [Product].[Category].&[4].
  • Now, with the third call of Item(0), the member from the previous step is implicitly converted to a single-member tuple, and we pull out its first member, which is again [Product].[Category].&[4].
  • From here onwards we flip between a tuple and a member as a result every time we call Item(0).

But if we do:

WITH
SET SET1 AS
 { ([Product].[Category].&[4],[Date].[Calendar].[Calendar Year].&[2008]),
  ([Product].[Category].&[1],[Date].[Calendar].[Calendar Year].&[2008]),
  ([Product].[Category].&[3],[Date].[Calendar].[Calendar Year].&[2008])
 }
SELECT
{
 [Measures].[Internet Sales Amount]
} ON 0,
{
 SET1.Item(0).Item(1).Item(1).Item(0).Item(0).Item(0).Item(0).Item(0)
} ON 1
FROM [Adventure Works]

we get nothing back. This is because there is no element on the second position/coordinate of the tuple ([Date].[Calendar].[Calendar Year].&[2008]).
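The behaviour described above can be mimicked with a toy Python model. It is not how SSAS implements Item; it only reproduces the rules from this post. A set is modelled as a list of tuples, a member as a string, a member is treated as a one-member tuple when Item is called on it, and "nothing back" is modelled as None. All names are illustrative.

```python
from functools import reduce


def item(value, index):
    """Toy model of MDX Item over a set (list), a tuple, or a member (str)."""
    if value is None:
        return None  # nothing to index into any more
    if isinstance(value, str):
        value = (value,)  # a member is implicitly treated as a one-member tuple
    if index >= len(value):
        return None  # no element at that position
    return value[index]


def chain(value, *indexes):
    """Apply Item repeatedly, e.g. chain(s, 0, 0) models s.Item(0).Item(0)."""
    return reduce(item, indexes, value)


ACCESSORIES = "[Product].[Category].&[4]"
YEAR_2008 = "[Date].[Calendar].[Calendar Year].&[2008]"
SET1 = [
    (ACCESSORIES, YEAR_2008),
    ("[Product].[Category].&[1]", YEAR_2008),
    ("[Product].[Category].&[3]", YEAR_2008),
]
```

In this model `chain(SET1, 0, 0)` and `chain(SET1, 0, 0, 0, 0, 0, 0, 0, 0)` both return the Accessories member, while `chain(SET1, 0, 1, 1, 0, 0, 0, 0, 0)` returns None, because the single-member tuple holding the 2008 member has no second position.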

Therefore, calling Item(0) after another Item(0) is rarely necessary and should be done only if we need it, because we could either get wrong results or possibly hurt our query performance.

SSAS

 

Set Operations in MDX – UNION and EXCEPT

August 1st, 2010

I just read an article MDX: Except written by Vincent Rainardi.

It shows set subtraction by using the EXCEPT function (as you could derive from the title, no doubt). I have always been a fan of using the “-” and “+” operators instead of EXCEPT and UNION where possible because in my opinion they give us better visibility of the intention we are putting behind our MDX expressions. However, EXCEPT and UNION have an advantage over “-” and “+” – the third parameter ALL.

In BOL we can see that both of these functions can be used like this: UNION/EXCEPT(set1, set2, ALL). If we skip the ALL keyword, we would get exactly what we would get with +/-. Some examples:

SET1: {a, b, c}
SET2: {c}

SET3 = UNION(SET1, SET2) = SET1+SET2 = {a, b, c}+{c} = {a, b, c}

But if we use ALL, we would get duplicates in our result set:

SET4 = UNION(SET1, SET2, ALL) = UNION({a, b, c}, {c}, ALL) = {a, b, c, c}

The difference here is the duplicates, which get preserved in SET4 because of ALL. And this is also where EXCEPT is different to “-”:

SET5 = EXCEPT(SET4, {a}) = SET4-{a} = {a, b, c, c} – {a} = {b, c}

while

SET6 = EXCEPT(SET4, {a}, ALL) = {a, b, c, c} – {a} = {b, c, c}

As you can see, unlike in set math where a set cannot have duplicates, in MDX we can. Therefore, if we are in a situation where we need to preserve these, we have the option of using the UNION and EXCEPT functions with the ALL parameter.

I am using these concepts on an everyday basis and I have found that mastering them gives me a very powerful way of solving many MDX problems. I hope that the examples are suitable and easy to understand – when I read the article after I have just written it I sound a bit like my math lecturer from uni (I wish he knew and taught MDX), who was a good guy, so I guess there is nothing wrong with that :)

SSAS
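For readers who prefer to experiment outside of SSAS, the duplicate handling described in this post can be imitated in Python with ordered lists. This is a simplified model of the examples above, not of every detail of the MDX functions; the function names and the keep_all flag (standing in for ALL) are illustrative.

```python
def dedupe(members):
    """Drop duplicates while keeping the first occurrence of each member."""
    return list(dict.fromkeys(members))


def union(set1, set2, keep_all=False):
    """Model of UNION(set1, set2[, ALL]); without ALL it behaves like '+'."""
    combined = list(set1) + list(set2)
    return combined if keep_all else dedupe(combined)


def except_(set1, set2, keep_all=False):
    """Model of EXCEPT(set1, set2[, ALL]); without ALL it behaves like '-'."""
    remaining = [member for member in set1 if member not in set2]
    return remaining if keep_all else dedupe(remaining)


SET1 = ["a", "b", "c"]
SET2 = ["c"]
SET4 = union(SET1, SET2, keep_all=True)      # duplicates preserved
SET5 = except_(SET4, ["a"])                  # duplicates removed
SET6 = except_(SET4, ["a"], keep_all=True)   # duplicates preserved
```

Here SET4 is ['a', 'b', 'c', 'c'], SET5 is ['b', 'c'] and SET6 is ['b', 'c', 'c'], mirroring the examples in the post.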

 

Using the DSV to its Full Potential

July 22nd, 2010

The Data Source View in Analysis Services is a very powerful abstraction of the data source and it can help us overcome some scenarios in an easy and clean way. Many times we look for MDX or programmatic solutions to problems, which can be tackled best in our data. While for complex tasks we would be better off extending the ETL process, some simple ones can and should be implemented in the DSV.

 

As an introduction to the topic I would like to explain briefly what the DSV actually is. It can be conceptualised as a database view on top of the data source. By default all tables which we need for building the Analysis Services database (typically dimensions and facts) appear in the DSV as table bindings (exactly as if we do a SELECT * FROM Table). If we have no foreign keys defined in our database, SSAS will not show us the relationships in the DSV. However, we can define logical relationships in the DSV, thus connecting the tables on related columns, which are then used for automatically determining dimension relationships to the measure groups.

 

There are two important ways to modify the DSV, which allow us to add more columns to the existing tables and to modify the way the existing columns are shown:

Named Queries

If we right-click on a table in the DSV, we can select to replace the table with a Named Query. A Named Query is essentially a T-SQL statement, which is equivalent to a database view definition. By utilising Named Queries we can alter the way we see the tables and their columns in SSAS. For example, we could concatenate columns, implement CASE logic, etc.

 

Named Calculations

 

A Named Calculation is a SQL expression which adds a column to a table without modifying the table binding. It gives us an easy way to define a new column without changing the whole query. The expression defining the column is in T-SQL and it behaves the same way as a new column in a Named Query (or a SELECT statement). If we just want to add one more column (e.g. Display Order, Code+Description concatenation, etc.), we can simply define a Named Calculation. Also, as the name suggests, Named Calculations are commonly used for defining a leaf-level calculation without modifying a large fact table’s SELECT statement in a Named Query.

 

The column we define here appears in both the DSV table and in the Dimension Designer window:

 

These two DSV functions can be used in many scenarios. Most importantly, there are a few when they yield better performance, faster development and easier maintenance:

Leaf-level calculations

If we have the common requirement to perform leaf-level calculations and then aggregate this up the hierarchy, as opposed to aggregating and then calculating, the best way to do this is in a SQL statement on the fact table. Alternatively, we can do this in an MDX statement:

 SUM(DESCENDANTS(Dim.CurrentMember,,LEAVES), MeasureCalc)

However, it comes at a price. Since SSAS would have to do the calculation for each leaf and then sum this up the hierarchy, this could take a long time to perform. Also, SSAS would not be able to use pre-processed aggregations and the calculations will be done at execution time. To avoid this we could add a new column to the fact table and do the calculation there (in SQL), using the column as a new measure in the cube, which can then be aggregated by SSAS as any other measure. The performance gain is usually substantial and using a Named Query or a Named Calculation should always be the preferred option.
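The difference between calculating at leaf level and then aggregating, versus aggregating and then calculating, is easy to see on a tiny data set. The Python sketch below uses made-up fact rows with hypothetical quantity and unit_price columns; the per-row product plays the role of the extra column added in the Named Query or Named Calculation.

```python
# Hypothetical fact rows; column names are for illustration only.
fact_rows = [
    {"product": "A", "quantity": 2, "unit_price": 10.0},
    {"product": "A", "quantity": 3, "unit_price": 20.0},
    {"product": "B", "quantity": 1, "unit_price": 5.0},
]


def leaf_then_aggregate(rows):
    """Calculate per row (as a new fact column would), then sum the results."""
    return sum(row["quantity"] * row["unit_price"] for row in rows)


def aggregate_then_calculate(rows):
    """Sum each column first, then calculate on the totals."""
    total_quantity = sum(row["quantity"] for row in rows)
    total_price = sum(row["unit_price"] for row in rows)
    return total_quantity * total_price
```

For these rows the leaf-level approach gives 85.0, while aggregating first gives 210.0. Once the per-row value is stored as its own column, SSAS only has to sum it like any other measure.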

Description Attributes

Often we need to perform a concatenation between different dimension attributes, which we can use as a Description attribute while slicing the cube, or when providing reports from the SSAS database. A very easy way to achieve such a requirement is to use our DSV and concatenate the columns we need in a new column in the dimension table, which we can expose as a new attribute in the dimension. A task such as concatenating an Account Code and Account Description into an Account Long Description (i.e. [Account Code] + '-' + [Account Description]) becomes very easy to implement within the DSV without modifying the ETL or any tables.

Composite Keys

Sometimes we need to build unique keys for attribute columns in a dimension. A good example is a Date dimension, which does not have unique keys for non-leaf levels such as Month. Often developers use Month Key values of 1 to 12. This does not make a good Month key in SSAS, as it is not unique across higher levels such as Year, Quarter, etc. There are a number of ways to tackle this common scenario. While the recommended approach would be to build a concatenation between Year-Quarter-Month as a Month Key in the dimension table, we can also achieve this by selecting all of the columns as key columns for the attribute in the dimension attribute properties. However, this would give us a concatenated key in MDX and this could sometimes be undesirable. A yet simpler and cleaner solution is to concatenate the relevant columns in the DSV by using a Named Query. Instead of the typical

SELECT col1, col2,.., MonthKey, colx, coly, colz FROM DimDate

we can write

SELECT col1, col2,…,YearKey+QuarterKey+MonthKey AS MonthKey, colx, coly, colz FROM DimDate

This way we can use the MonthKey column directly as a key for our Month attribute. Note that if the key columns are numeric, they need to be cast to a character type first; otherwise the T-SQL + operator adds the values instead of concatenating them.

While this is useful for a Date dimension, it can also be useful for any other composite key definition in our dimensions.
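Why the composite key has to be a concatenation rather than a sum can be shown with two sample months. The Python sketch below is only an illustration: the function names are made up, and the zero-padding of the month part is an extra assumption used to keep the keys fixed-width, not something the Named Query above does.

```python
# (year, quarter, month) for two different months.
february_2010 = (2010, 1, 2)
january_2011 = (2011, 1, 1)


def added_key(year, quarter, month):
    """What '+' produces when the key columns are numeric: a plain sum."""
    return year + quarter + month


def concatenated_key(year, quarter, month):
    """Character concatenation; the month is zero-padded (an assumption)."""
    return f"{year}{quarter}{month:02d}"
```

Adding the numeric parts gives 2013 for both months, so the key is not unique, whereas the concatenated keys are "2010102" and "2011101".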

Other possible applications of DSV Named Queries and Named Calculations are the implementation of

  • Sort Order attribute, in cases when we need custom sort of the dimension attributes
  • Restricting the data which comes into the cube dynamically based on a certain condition (think of a Date dimension, which includes only relevant periods)
  • Combining tables – by a SQL join
  • Replacing 0s with NULLs (the opposite can be done automatically in SSAS) for our measures

Basically, in a DSV we can “correct” our data to make it suitable for our cube without changing the ETL.

Last but not least, we can also transform tables to conform to a star-schema-like design. If we want to show a proof of concept on top of a normalized OLTP database, we could avoid the ETL complexities, as well as building a datamart, and use SQL to join/split tables in dimension and fact tables, which are suitable for cube development. While this could work in post-POC scenarios, it would be better to take a cautious approach to it as there are many scenarios when it would either not work, or will be too slow.

And a word of warning – your DSV could become slow because of over-use of complex Named Queries. This could be painful when minimising cube processing time is crucial, or when the DSV starts timing out and queries take hours to execute. Luckily, in most cases we can simply move these large queries forward – to the ETL where we have more time and better tools (e.g. SSIS).

SSAS

 

Do Business Analysts make good dimensional modellers??

May 26th, 2010

Recently I had the (dis)pleasure of working with Business Analysts who also thought that they were good at dimensional modelling. So, I had to implement BI solutions (including cubes) on top of their database design. I will show an example (about 95% the same as the actual design), where the idea of letting BAs go into dev territory does not yield the best results:

 

This “dimensional model” was created by an experienced BA. Some “features” are missing here:
1. The fact table had EffectiveFrom and EffectiveTo dates
2. The relationships between some Dim Tables were 1-1 ?!
3. The Time dim (the only one properly implemented on its own – on the bottom of my example) had columns like: DateTimeName nvarchar(100), DateTimeKey nvarchar(100), YearName nvarchar(100), etc..
4. The Some Tables on the top had nothing to do with the rest (in fact a colleague of mine reckons they are there to fill in the white space on the top of the A3 printout)

Another design, which is better, but still pretty bad showed up after my training on Dimensional Modelling (1hr to go through EVERYTHING, including M2M relationships, Parent-Child hierarchies, Type 2 dimensions, etc):

Obviously, the designer (a developer actually) did grasp some concepts. However, my explanation of a star schema must have been not too clear..

Hope that you had some fun with these two diagrams..and I am sure many developers get in a similar situation, especially when someone else designs their databases. But two points:

1. Ask the BAs to analyse the business and their requirements – not to design the database
2. 1 hour of training on dimensional modelling will not make you an expert

SSAS, T-SQL