Archive
 
Spreading Non-Transactional Data Along Time
In some cases we need to be able to analyse non-transactional data for discrete periods along a time dimension. An example of such a case is a collection of invoices, which have start and end dates for a period, but are not otherwise connected to a time axis. We may have such invoices with these properties:
Invoice Id
Start Date
End Date
Amount
One of the invoices may be:
Invoice Id: 34821432
Start Date: 2008-10-15
End Date: 2009-03-14
Amount: 15,000.00
and another one:
Invoice Id: 34934221
Start Date: 2008-12-01
End Date: 2009-05-30
Amount: 6,500.00
If the company we are building this for is daily deducting a fee for its services (e.g. funds management, software maintenance, etc.), we may have to be able to spread the amount in smaller periods, like months or days and then aggregate the smaller amounts along a time dimension.
To do this we have to first store the data in a relational table and then write some SQL to do the trick for us.
First, we should create a table valued function which returns all the dates at a specified granularity, such as days, from the Start to the End dates and the count of all the periods in between (in our case is is a count of days):
CREATE FUNCTION udf_Create_Daily_Date_Spread
(
@Start_Date datetime
, @End_Date datetime
)
RETURNS @Daily_Spread TABLE (
Date_Id datetime
, Count_Of_Days int
)
AS
BEGIN
DECLARE @Count int
SET @Count = 0IF @Start_Date >= @End_Date
RETURNWHILE @Start_Date <= @End_Date
BEGIN
INSERT INTO @Daily_Spread(Date_Id)
SELECT @Start_DateSET @Start_Date = DATEADD(d, 1,@Start_Date)
SET @Count = @Count + 1
ENDUPDATE @Daily_Spread
SET Count_Of_Days = @CountRETURN
END
After having created these functions, we can use the CROSS APPLY statement to create the even spread:
SELECT Invoice_Id
,Start_Date
,End_Date
,cdds.Date_Id
,Amount/cdds.Count_Of_Days
FROM Invoice_Source inv
CROSS APPLY udf_Create_Daily_Date_Spread(inv.Start_Date, inv.End_Date) cdds
After running the sample data through this code, we will get an even spread for both invoices and we will be able to attach a time dimension to them.
Even though the data size may explode after such a manipulation, Analysis Services provides an excellent way of handling even the largest sets of data. If storage is a problem, we can always choose to break down our data in less periods – instead of days, weeks or months.
 
Pre-Aggregated Data and Changing Dimensions
Normally when working with data we assume that we have it on the lowest possible grain and we are challenged by the need to aggregate it along various hierarchies. By the nature of BI, we get to work with large sets of detailed data collected by an existing system.
But what happens when we are given the aggregations and we need to work with these instead? Recently Nick Barclay and I were involved in such an implementation. Nick designed a Health and Safety Dashboard and after that I built it with Reporting Services, PerformancePoint, Analysis Services and SQL Server 2005 relational databases. We were told in the beginning of the project that the aggregation business rules are so complicated that rebuilding them for the purposes of the dashboard was way out of scope. I had not had experience with pre-aggregated data and I did not foresee a major problem, which became apparent after a few months of development when the Business Unit hierarchy changed significantly.
Aggregation Rules
In contrast with typical business scenarios when an SSAS function like Sum and some custom roll-ups works perfectly well, the Health and Safety data needed to be aggregated in a complex way and the organisation we built the dashboard for had already invested in a system managing these aggregations. In example, a simple rule would be – if we have more than 1 major incidents in a Business Unit, it gets an Amber score for Safety Compliance. If it has more than 4, it becomes Red. In turn, its parent is the same – if the sum of all major incidents for its descendants is greater than 1, it becomes Amber and with more than 4 – Red. There were also quite a few dependencies between various dimensions and reluctantly we agreed to work with the data the way it was. The following diagram shows the way an aggregate for Major Incidents works:
Japan doing badly on Major Incidents also makes Asia and Global look bad.
The problem
The actual problem comes from the fact that our dimension hierarchy can change and if it does the aggregations do not make sense historically. We could have a change in our hierarchy:
and we may need to generate the following simple report for before and after the change:
The historical data suggests that something was wrong with China as the only child of Asia before the change in hierarchy, while in fact, the culprit then (and now) was Japan. As we would not be able to see how the hierarchy looked before, we would not be able to analyse the data we have accordingly. Also, we cannot compare data for Asia after the change to data for Asia from before the change along the new hierarchy.
Possible solutions
In my project, the client actually agreed that it is alright to have this handicap in the system and a possible solution was a complete historical rebuild of the aggregations followed by a complete reload of the system data after every hierarchy change. Lucky for us, the data set was not too big and was not expected to grow too much.
Another solution would be to use a Slowly Changing Dimension, so we can show the hierarchy as it was when the aggregations were made – this would improve the analytical value of the data because business users would be able to see why a Business Unit like Asia was Red in the context of an outdated hierarchy.
The best solution would be to build our own aggregations and work with those, as then we are gaining the ultimate flexibility to represent data in the exact way the business users need it, but unfortunately sometimes it is very hard to convince them that spending twice the funds on the same problem is a good thing.
 
Filtering Unneeded Dimension Members in PerformancePoint Filters
Published on SQLServerCentral on the 2008-11-14:
http://www.sqlservercentral.com/articles/PerformancePoint/64565/
Sometimes we need to utilise dynamic dimension security in an Analysis Services solution and we also need to display dynamically the allowed members in a PerformancePoint filter.
In case our hierarchy is multi-level and in case we can expect to have security on multiple levels, PerformancePoint will display the full path upwards to the root member of the dimension. So, in the case where in a Business Unit hierarchy we have access to some third level members, in PerformancePoint we will see all their parents. In example if our hierarchy looks like this:
All Business Units
-Europe
–UK
—-France
—-Bulgaria
-North America
—-USA
—-Canada
and we give someone access to France, they will in fact see:
All Business Units
-Europe
—-France
Indeed, when they select All Business Units or Europe they will still see only France data but this may be confusing. To eliminate the top levels we need to change the way we create our PerformancePoint filter.
To achieve this, first we need to create a MDX Query filter. For more information about PerformancePoint filters and dynamic dimension security you can read the following brilliant post on Nick Barclay’s blog: PPS Data Connection Security with CustomData. Nick explains how to set up PerformancePoint to work with Analysis Services dynamic dimension security and related topics. I will now concentrate on actually filtering the members of the already set-up MDX Query filter.
Instead of requesting all dimension members with a simple statement like:
DESCENDANTS([Business].[Business Hierarchy].Members,, SELF_AND_AFTER)
we can write some fairly simple MDX which means:
Get me the descendants of all dimension members whose ascendants (excluding themselves) have no more than one child.
And the MDX code is:
DESCENDANTS( FILTER([Business].[Business Hierarchy].Members AS a, ((FILTER(ASCENDANTS(a.CurrentMember) AS a_asc, a_asc.CurrentMember.CHILDREN.Count > 1).Count = 1) And a.CurrentMember.Children.Count > 1) Or ((FILTER(ASCENDANTS(a.CurrentMember) AS a_asc, a_asc.CurrentMember.CHILDREN.Count > 1).Count = 0) And a.CurrentMember.Children.Count = 0)),,SELF_AND_AFTER)
The result will be France being displayed as the only available member in the PerformancePoint drop-down. Also, if we had two different allowed members, the code above would show us the top common parent of the two. Therefore, if we had France and Canada as the allowed set of dimension members, the drop-down would consist of the following hierarchy:
All Business Units
-Europe
—-France
-North America
—-Canada
thus satisfying our requirements.
 
Excel and MDX – Report Authoring Tips
Excel is a powerful BI tool which is often overlooked as an inferior alternative to Reporting Services for distributing reports because of its potential for causing chaotic mess in organisations. As a number of speakers on the Australian Tech.Ed 2008 pointed out, replacing Excel is often the goal of many organisations when implementing BI solutions and using Excel as a main reporting tool can be often misunderstood by prospective clients. SharePoint Excel services go a long way in helping organisations control the distribution of Excel files and these in combination with a centralised Data Warehouse and Analysis Services cubes on top of it will no doubt be a competitive solution framework, especially when considering the familiarity of business users with the Microsoft Office suite.
During a recent implementation of a BI solution I was approached with the request to provide certain reports to a small set of business users. As the project budget did not allow for a full Reporting Services solution to be built for them, Excel was appointed as the end-user interface to the OLAP cube. The users were quite impressed by the opportunities that direct OLAP access present to them but were quite unimpressed by the performance they were getting when trying to create complex reports.
After investigating the problem I noticed a pattern – the more dimensions they put on a single axis the slower the reports were generated. Profiling the SSAS server showed that Excel generates quite bad MDX queries. In the case where we put one measure on rows and multiple dimensions on columns the Excel-generated MDX query looks like this:
SELECT {[Measures].[Amount]} ON ROWS,
{NON_EMPTY([Time].[Month].ALLMEMBERS*
[Business].[Business Name].ALLMEMBERS*
…
[Product].[Product Name].ALLMEMBERS)} ON ROWS
FROM OLAP_Cube
What this means is that Excel performs a full cross-join of all dimension members and then applies the NON_EMPTY function on top of this set. If our dimensions are large, this could cause significant issues with the performance of the reports.
Unfortunately it is not possible to replace the query in Excel as it is not exposed to us, and even if we could replace it, it would be pointless as changing the user selections of the dimensions to be displayed would cause it to fail. There are some Excel add-ons available for changing the query string but issues such as distribution of these add-ons and the inability of business users to edit MDX queries diminish the benefits of using them.
While waiting for an optimised query generator in Excel, we can advise business users of ways to optimise their reports themselves. In my case these were:
- Consider using report parameters instead of placing all dimensions on rows or columns in a pivot table. This will place them in the WHERE clause of the query instead of the SELECT clause and will not burden the cross-join part of it.
- Spread the dimensions as evenly as possible between rows and columns. Having 6 dimensions on one row is worse than having 3 on rows and 3 on columns as the cross-joins will generate smaller sets for NON_EMPTY to filter and ultimately will improve performance.
- Consider using less dimensions – if possible split the reports on multiple sheets. This is not always possible but it is better for the users to keep their report-making simple.
In Reporting Services we can write our own query which can be optimised by “cleaning” the cross-join set of empty members in the following manner:
NONEMPTY(NONEMPTY(DimA * DimB) * DimC))
As we cannot do this in Excel, the best way to improve performance is to advise the users against overcomplicating their reports.
 
