What Exists and What is Empty in MDX

February 12th, 2012

After reading my chapter “Managing Context in MDX” in MVP Deep Dives Vol. 2, I noticed that I should probably have discussed one extra topic – the difference between cells which cannot exist and cells which can exist but are empty.

The basic idea is that in SSAS cubes we have the notion of empty space. However, there is an important difference between intersections which are possible but return nulls when queried, and “impossible” intersections between hierarchies in the same dimension.

If we look at the Date dimension in Adventure Works we can see that we have month and year attributes. Months in 2007 appear only with the year 2007 in the dimension. Therefore, the combination between January 2005 and CY 2007 is not possible and consequently it does not and cannot exist in our cube. In contrast, if we query for Clothing products in 2007 and place the Month attribute on rows, we can see that no Clothing items were sold in the first few months of 2007:
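A query along these lines shows it (the Adventure Works hierarchy and member names here are indicative only – I am using the Calendar user hierarchy’s Month level for the months):

SELECT
    { [Measures].[Internet Sales Amount] } ON 0,
    -- the slicer below auto-exists this set down to the 12 months of CY 2007
    [Date].[Calendar].[Month].Members ON 1
FROM [Adventure Works]
WHERE (
    [Date].[Calendar Year].&[2007],
    [Product].[Product Categories].[Category].[Clothing]
)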

Here we are dealing with possible, but empty cells – the January 2007 to June 2007 rows show empty intersections in the cube.

Why is this important? Well, it means that if we try to get the number of months with Clothing sales in 2007 with a query like:
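For example, a first attempt along these lines, which relies on EXISTING to restrict the months to the current (CY 2007, Clothing) context (again, the names are indicative):

WITH MEMBER [Measures].[Month Count] AS
    -- EXISTING keeps only the months which exist with the current context (CY 2007),
    -- but it does not check whether the cells behind them are empty
    Count( EXISTING [Date].[Calendar].[Month].Members )
SELECT
    { [Measures].[Month Count] } ON 0
FROM [Adventure Works]
WHERE (
    [Date].[Calendar Year].&[2007],
    [Product].[Product Categories].[Category].[Clothing]
)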

we wrongly get 12, not 6. We need a function which can “detect” empty cells, not one which unnecessarily enforces the current context. An excellent function for this purpose is NonEmpty (or Exists with a measure group specified):
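A sketch of the corrected calculation – NonEmpty removes the months which are empty for the measure in the current context:

WITH MEMBER [Measures].[Month Count] AS
    Count(
        -- first restrict the months to CY 2007, then keep only the ones
        -- with an Internet Sales Amount against them in the current context
        NonEmpty( EXISTING [Date].[Calendar].[Month].Members,
                  [Measures].[Internet Sales Amount] )
    )
SELECT
    { [Measures].[Month Count] } ON 0
FROM [Adventure Works]
WHERE (
    [Date].[Calendar Year].&[2007],
    [Product].[Product Categories].[Category].[Clothing]
)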

Here we get the expected number – 6 (since only 6 months in 2007 have an Internet Sales Amount for Clothing against them).

A similar example can be shown the other way around. The following query returns 37, which is the total number of members of the month attribute with data against them:
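The query looked something like this – the same calculation, but without EXISTING and with no Product slice (names indicative again):

WITH MEMBER [Measures].[Month Count] AS
    -- no EXISTING here: NonEmpty works over all months in the dimension,
    -- not just the ones in CY 2007
    Count(
        NonEmpty( [Date].[Calendar].[Month].Members,
                  [Measures].[Internet Sales Amount] )
    )
SELECT
    { [Measures].[Month Count] } ON 0
FROM [Adventure Works]
WHERE ( [Date].[Calendar Year].&[2007] )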

This is because the NonEmpty function does not enforce the current context on its first argument – it gives us all the members with data against them, regardless of the 2007 slicer (omitting NonEmpty altogether results in 39, because we have two months with no sales whatsoever). Existing, on the other hand, does enforce the current context, so if we add Existing to NonEmpty we get the expected count of 12 (as every month in 2007 has had a sale if we take all categories into account):
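In other words, a sketch like this one:

WITH MEMBER [Measures].[Month Count] AS
    Count(
        -- EXISTING enforces the CY 2007 context on the set of months,
        -- NonEmpty then removes the (in this case zero) empty ones
        NonEmpty( EXISTING [Date].[Calendar].[Month].Members,
                  [Measures].[Internet Sales Amount] )
    )
SELECT
    { [Measures].[Month Count] } ON 0
FROM [Adventure Works]
WHERE ( [Date].[Calendar Year].&[2007] )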

Here we eliminated the impossible intersections between months not in 2007 and year 2007.


 

SSAS Locale Identifier Bug

January 29th, 2012

These days I have two little problems with SSAS. One is really small – I really don’t like how the Process dialog in BIDS (2008 R2) stays blank during the process operation and how I need to click on “Stop” after it finishes. The fact that such a bug got into the product and has survived for so long when it happens 80-90% of the time doesn’t speak very well for the QA process at Microsoft. However, I can understand that the impact of this bug would have been deemed very insignificant, as it affects developers only and does not prevent them from doing their job. By the way, if you are also annoyed by the blank process dialog in the latest version of SQL Server, you will have to wait until the next release (2012) for a fix:

http://connect.microsoft.com/SQLServer/feedback/details/536543/ssas-process-progress-window-blank-no-details

The other bug is far more significant. It not only impairs developers’ ability to build some features, but is also highly visible to all Excel users. However, it is hard(er) to reproduce:

http://connect.microsoft.com/SQLServer/feedback/details/484146/getting-an-error-when-trying-to-browse-a-cube

Still, it seems like developers from {[World].[Countries].Members}-{[World].[Countries].[USA]} hit it all the time. I am speaking about the Locale Identifier bug. The most common occurrence is when drilling through to detail in Excel. After the drill-through action has been initiated, Excel shows a message box talking about the XML parser and how the Locale Identifier cannot be overwritten:

“XML for Analysis parser: The LocaleIdentifier property is not overwritable and cannot be assigned a new value.”

The cause is simple (as confirmed by the SSAS team at Microsoft, and Akshai Mirchandani in particular): once we open a session we cannot overwrite the locale. The mystery is why Excel attempts to do that in the first place. Note that Excel is not the only offender – I have also seen the same error message thrown by SQL Server Profiler, and Chris Webb has seen it with the Cube Browser in BIDS:

http://connect.microsoft.com/SQLServer/feedback/details/484146/getting-an-error-when-trying-to-browse-a-cube

Since the bug has been reported to Microsoft, we are now hopeful that a fix will appear at some point. Until then you can try the following workaround, courtesy of my friends and former colleagues Matthew Ward and Paul Hales:

- Switch the Windows Region and Language Format in Control Panel to English (United States)

- Switch it back to whatever it was before

Now the “bug” should be fixed for that machine. For example, I had my new Windows 7 workstation configured to use Australian formats. After I got the Locale Identifier error message in Profiler, I switched the formats to USA. The bug disappeared and I could profile SSAS. After that I switched Windows back to the original English (Australia) format…and nothing broke. I could still use Profiler and drill-through in Excel.

On another note, Greg Galloway has released a new version (0.7.4) of the OLAP PivotTable Extensions add-in for Excel. In case you have been experiencing a Locale Identifier problem in SSAS while using older versions of the add-in, please download the new one and let Greg know (e.g. on the discussions page on CodePlex) if the new release fixes your problem.

Thanks to everyone involved in confirming and testing different fixes. Ideally, I would like to see Microsoft fix this on the server side of things, which would allow us to easily patch all existing systems exhibiting the problem.

Since all Microsoft Connect items related to this bug have been closed, I opened a new one, so you can vote to help prioritise the issue:

https://connect.microsoft.com/SQLServer/feedback/details/721372/locale-identifier-bug

Note (2012-03-21): The issue seems to be with Windows Vista+ as confirmed by Michael Kaplan – http://blogs.msdn.com/b/michkap/archive/2010/03/19/9980203.aspx


 

Load Testing BI Solutions – When?

December 24th, 2011

This year I came across two very different BI projects which had the common non-functional requirement to prove that they would handle an expected spike in the report generation load. Funnily enough, in both cases the project teams got very concerned and came up with wildly inaccurate predictions of how many concurrent users we should be testing for. In the first case the problem was with the perception of “thousands of users”, while in the second, the team interpreted “monthly users” as “concurrent users”. The annoying part was that in the first case the team planned on building an ultra-massively overcomplicated queuing system to handle those spikes, and in the second case they were thinking of completely scrapping the ad-hoc functionality in the solution and resorting to report extracts distributed by email. The unreasonable expectations of the load led to bad design choices – this is why it is important to remain calm and first check whether there is a problem at all.

Firstly, let’s agree that we are measuring report requests. To begin, we should know how many requests we get per period of time (e.g. a month), and then how long it takes to generate a report. A typical scenario would be:

  • 1,000,000 report requests per month
  • 2 seconds to generate a report on average

What we need to do now is apply a bit of math:

1,000,000 / 20 = 50,000 requests per day on average (assuming 20 working days in a month)

50,000 / 8 = 6,250 requests per hour (8 hours in a working day)

Since a report takes 2 seconds to generate, we can generate 3,600 / 2 = 1,800 reports in one hour. Therefore, with 6,250 requests per hour, we would have 6,250 / 1,800 = 3.47 average concurrent users. Of course, this would only be the case if the load were perfectly uniform. In reality this would not happen – instead, we will have peaks and dips in usage. A moderate peak is typically around 3x the average, while a heavy one would be around 6x the average. To ensure that we can handle such peak periods, we should multiply our average concurrent users by 3 or by 6 depending on our load analysis. Let’s assume we have a very high peak load of 3.47 * 6 = 20.82, or approximately 21 concurrent users. This is the number we need to test for in our case. Note that we had 1,000,000 report requests per month, yet in our highest peak we expect only 21 concurrent users. I have not actually had a project where we expected such a load (in both cases which prompted me to write this post we had between 2,000 and 10,000 users per month).

The moral of the story – don’t panic. In most reporting projects the user load is not high enough to warrant a full-scale load testing exercise; next time you hear talk of something like that, instead of rushing to cover unreasonable scenarios, try to calculate and confirm the need first.


 

DataMarket Updates: Speed, Portal and DateStream

December 8th, 2011

It has been an eventful week for the Azure DataMarket. We had three new and exciting (for geeks like me) things happening in that corner of the Microsoft universe:

1. Speed!

There was an update to the Azure DataMarket a few days ago. It was, in my opinion, the best thing Microsoft could have done to their offering – tremendously increasing its performance. While the DataMarket was previously plagued by unacceptably slow download speeds, it is now blazingly fast by feed standards. For comparison’s sake, I used to wait for more than 40 minutes when downloading an approximately 70k-row feed from the DataMarket prior to the update. Now, it is on my machine in around 5 minutes – an 8-fold increase in performance! Rumour has it that on faster-than-my-home-ADSL2+ networks we will be experiencing up to 20x better performance. It would be good to hear whether this is actually correct for developers on such networks (please comment).

Next, range queries, hopefully…

2. Portal

Until a couple of days ago, anyone who wanted to publish data on the DataMarket had to contact the Microsoft team via email and ask how to get it done. We have now moved into the self-service space with a new portal which allows publishers to create and manage their feeds. The link to this new portal is:

https://publish.marketplace.windowsazure.com/

And, you can find some very helpful documentation about it here:

http://msdn.microsoft.com/en-us/library/windowsazure/hh563871.aspx

3. DateStream

Finally, I am proud to announce that the great DateStream feed got translated into four more languages:

- Hebrew and Danish – thanks to Rafi Asraf

- German

- Bulgarian

The Italian translation (thanks to Marco Russo) is coming soon too, but missed this release unfortunately.

Feel free to explore them and let me know if anything needs to be changed to make them more correct/useful.


 

SSAS: Multiple SQL Queries in ROLAP Mode

November 28th, 2011

Just recently I was working on a project where I had to build an SSAS ROLAP cube on top of a badly built data mart – one with multiple referential integrity (RI) issues. Most importantly, the designers had ignored the very basic principle that every dimension key in a fact row must be present in the respective dimension table. In MOLAP mode, SSAS checks for such mismatches during processing. However, when a partition is in ROLAP storage mode, we don’t get a notification that anything is wrong and the cube processing operation succeeds. This has consequences at query time, which I will try to illustrate in this post, along with a solution. Before I begin, I must say that if it wasn’t for the help of Akshai Mirchandani (from the Microsoft SSAS dev team) and Greg Galloway, I would have probably spent quite some time figuring out what was happening. Thanks to them the problem got solved quickly and I got to understand the reason behind it.

In terms of set-up, I created two tables in SQL Server: Dim and Fact. The Dim table contained two members, A and B, with keys of 1 and 2. Initially, the Fact table had two rows referencing the Dim table – Dim keys of 1 and 2, and a measure column called Amount with 1.0 and 2.0 as the amounts corresponding to A and B. No issues here. After that I created an SSAS solution corresponding to this simple dimensional model, switched the partition storage for the cube to ROLAP, and processed the SSAS database. I then ran the following query, which I used for all subsequent examples:

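The query was a trivial one along these lines (the cube and hierarchy names below are illustrative, matching the Dim/Fact tables described above):

SELECT
    { [Measures].[Amount] } ON 0,
    -- the two dimension members, A and B, on rows
    [Dim].[Dim].[Dim].Members ON 1
FROM [ROLAP Test]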

The result was as expected – one row each for A and B, with amounts of 1.0 and 2.0 against them.

At the same time I had a SQL Server Profiler trace running, which showed that SSAS had executed a single SQL query retrieving the data from the fact table. Nothing unusual thus far.

To spoil the party, I added one more row to the fact table with a dimension key of 3 and an Amount of 3. Since I did not add a matching row to the dimension table, this broke the rules – had there been a foreign key constraint between the fact and the dimension tables, I would not have been able to do it. After clearing the SSAS cache, I ran my query again. This time, instead of amounts, the cells came back with errors.

The actual error was, of course, a missing key. I was not surprised when I saw this on my original project. However, looking at Profiler we see a “weird” sequence of events:


SSAS runs multiple SQL queries which all result in errors – in this case we can see four ExecuteSQL events, each followed by an error in a ReadData event. Here the impact is small, but in the real world this scenario can get many times worse (in my case we saw 4,667 queries run against the relational database in a few minutes), leading to a really significant drop in performance.

So, what is happening? According to Akshai, SSAS encounters an error while dealing with the results from the initial SQL query and tries to recover by sending more queries. In some cases this can result in only some of the cells in the result set showing the error.

Luckily, there is an easy way out of this situation (thanks to Greg for providing the tips). SSAS can automatically create an “unknown bucket” for each dimension and assign to it all measure values which do not correspond to a dimension member. To get this result, we must ensure that each affected partition has a custom error configuration set up for this purpose.


The important property here is KeyErrorAction, which must be set to ConvertToUnknown rather than DiscardRecord (the alternative). This must also be coupled with setting the UnknownMember property of each “incomplete” dimension, so that the dimension gets an Unknown member.


It does not matter whether the UnknownMember is Visible or Hidden, as long as it is not None.

Back to our scenario. After setting these properties on the dimension and the partition, I processed the SSAS database again and executed the query. This time the result came back without errors – the amounts for A and B as before, with the orphaned amount now assigned to the dimension’s Unknown member. The Profiler trace showed a single SQL query against the fact table.

As we can see we eliminated the multiple queries. If we do not want to see the Unknown amount in the cube we can use a scope assignment:

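For example, something along these lines in the cube’s MDX script would do (assuming the [Dim].[Dim] attribute hierarchy from the example above):

SCOPE ( [Dim].[Dim].UnknownMember );
    -- blank out the values collected under the Unknown member
    THIS = NULL;
END SCOPE;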

Coupled with making the UnknownMember Hidden, we can completely obliterate traces of our underlying RI issues. Unless our users check the numbers, but then we can blame whoever designed the datamart! :)
