Archive

Author Archive

Quick Wins and Quick Losses

April 27th, 2010
Comments Off

I have been wondering of late what a “Quick Win” actually implies and means. From my experience with BI projects, companies very often deliver quick and dirty solutions because of fairly uneducated target consumers and fierce competition, hoping to attract attention and then sell more services. This practice is often referred to as a “Quick Win”. Of course, the intention itself is not bad, but when poorly executed it first wastes the clients’ money and time and then also discourages them from pursuing a BI solution any further. In the case of a failure, another term – a “Quick Loss” – is more appropriate but never used.

 So, what determines the outcome?

 1. Scope

Managing the scope is absolutely essential in a Quick Win scenario. We must convince the client that all the advanced functionality can be safely pushed back to the next full-blown release, when we will have the time and money to build it properly. If we extend our Quick Win to build Dynamic Dimension Security, partition our cube, clean up the data, build dimension management capabilities (MDS comes to mind), etc., we will most likely fail or at least jeopardise our chances of success. In this first crucial phase we need to concentrate on the core – building a simple and robust system. Instead of allowing the usual scope creep, we should actually push for the opposite – scope cuts. Of course, this has to be carefully balanced against the actual needs, as cutting too much will leave us with an unusable result.

 2. Quality

In my opinion, if we deliver a poor quality solution it will fail, and no attempt to resuscitate it later will have any decent chance of success. So, when we are scoping out our project, we must make sure we have the time to build it well. Shortcuts would quite likely force us to scrap it altogether at a later point in time and rebuild it properly. Also, if we build an OLAP solution which is slow and buggy, we will hardly be able to convince our client that the next phase of the project will be any better.

 3. Analysis and Design

Yes, it is a Quick Win, and yes, it is a BI solution, but even these (contrary to some opinions) do need analysis and design. Spending a bit of time with the business users, the source system and the server engineers can greatly improve the development experience. Without a design phase, it is hard to maintain a strict scope and attain high quality. A brief design document also helps with remembering why we have done something the way we have, and with decoupling us (as developers) from the solution.

 4. Task Management

I am not a project manager. However, when working alone on a small project I find it very useful to track my progress and objectives by building a basic spreadsheet showing Tasks, Description, Time Allocated, etc. This way I can easily comprehend and explain how my development is going, and ask for more time before I hit a deadline if required. Also, a task sheet helps me to switch between tasks, or to allocate them to other developers.

 5. Managing Client Expectations

I have heard this phrase many times before, and it has usually been misused. Managing client expectations does not mean lying to the clients, nor does it mean promising too much. In my opinion, managing client expectations means exactly what it sounds like – do not get your client too excited about what you cannot deliver, and make them expect exactly what you can. It is good to keep the clients happy and optimistic about the future, but making them enthusiastic and then crushing their enthusiasm with a dud solution is unprofessional.

This issue has been haunting me for a while. I have definitely not exhausted the topic, and I am sure many developers can add their own thoughts to this list, but I just hope I can spare some trouble for, or offer some hints to, the less experienced readers of this blog.

Other

 

Passing unCONSTRAINED Set and Member parameters between reports in Reporting Services

April 15th, 2010

By default, the MDX queries that SSRS generates use the StrToMember and StrToSet functions with a CONSTRAINED flag. However, many developers do not quite know why it is there or what it actually does. Books Online contains these statements:

StrToMember

  • When the CONSTRAINED flag is used, the member name must be directly resolvable to a qualified or unqualified member name. This flag is used to reduce the risk of injection attacks via the specified string. If a string is provided that is not directly resolvable to a qualified or unqualified member name, the following error appears: “The restrictions imposed by the CONSTRAINED flag in the STRTOMEMBER function were violated.”
  • When the CONSTRAINED flag is not used, the specified member can resolve either directly to a member name or can resolve to an MDX expression that resolves to a name.

StrToSet

  • When the CONSTRAINED flag is used, the set specification must contain qualified or unqualified member names or a set of tuples containing qualified or unqualified member names enclosed by braces {}. This flag is used to reduce the risk of injection attacks via the specified string. If a string is provided that is not directly resolvable to qualified or unqualified member names, the following error appears: “The restrictions imposed by the CONSTRAINED flag in the STRTOSET function were violated.”
  • When the CONSTRAINED flag is not used, the specified set specification can resolve to a valid Multidimensional Expressions (MDX) expression that returns a set.

Therefore, if you have a CONSTRAINED flag you have to specify exact members or sets (e.g. [Date].[Year].[2009], or {[Date].[Year].[2009],[Date].[Year].[2010]}). If you omit the flag, you can pass to StrToMember an expression which evaluates to a member (e.g. [Date].[Year].[Year].Members.Item(0)), and to StrToSet an expression which evaluates to a set (e.g. NONEMPTY([Date].[Year].[Year].Members, [Measures].[Amount])).
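
To illustrate the difference, here is a minimal sketch of a typical SSRS dataset query. The [Measures].[Amount] and [Date].[Year] names come from the examples above; the cube name, the Product hierarchy and the @DateYear report parameter are placeholders.

    -- With CONSTRAINED the parameter must hold an explicit member name,
    -- e.g. @DateYear = "[Date].[Year].[2009]"
    SELECT [Measures].[Amount] ON COLUMNS,
           [Product].[Category].[Category].Members ON ROWS
    FROM   [Adventure Works]
    WHERE  StrToMember(@DateYear, CONSTRAINED)

    -- Without CONSTRAINED the parameter may hold any expression that resolves
    -- to a member, e.g. @DateYear = "[Date].[Year].[Year].Members.Item(0)"
    SELECT [Measures].[Amount] ON COLUMNS,
           [Product].[Category].[Category].Members ON ROWS
    FROM   [Adventure Works]
    WHERE  StrToMember(@DateYear)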

The flexibility which removing CONSTRAINED offers can be quite powerful when passing parameters between reports. For example, we may want to pass a parameter to a drill-through report from two different summary reports, where each of them uses a different subset of dimension members, which in turn can be derived by different set expressions.

The major drawbacks of this approach are the performance hit it can lead to, as well as a possible “MDX injection” vulnerability. Since in most cases we would be using the passed parameters in a subcube expression or on the slicer axis (the WHERE clause), performance should not suffer as badly as it would if we used them inside a calculation. However, when we need to use a parameter directly in a calculated measure, we are better off avoiding an unCONSTRAINED function.

Therefore, we may instead use SetToStr on the summary reports and pass a string parameter to a CONSTRAINED StrToSet function in the drill-through report. This way we resolve the set expression once and pass it on to the underlying report as a string. We could do that in a calculated measure returning a string, which is then passed on as a field to the drill-through parameter. However, in the rare case where many rows travel from the SSAS server to the SSRS server, this could be slow.
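
A rough sketch of this approach, with the same placeholder names as above and a hypothetical @DateYearSet parameter on the drill-through report:

    -- On the summary report: resolve the set once and expose it as a string field.
    WITH MEMBER [Measures].[Drillthrough Set] AS
        SetToStr(NONEMPTY([Date].[Year].[Year].Members, [Measures].[Amount]))
    SELECT { [Measures].[Amount], [Measures].[Drillthrough Set] } ON COLUMNS,
           [Product].[Category].[Category].Members ON ROWS
    FROM   [Adventure Works]

    -- On the drill-through report the field is mapped to the @DateYearSet
    -- parameter, and its dataset can keep the safer CONSTRAINED form:
    -- WHERE StrToSet(@DateYearSet, CONSTRAINED)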

So, whether we use StrToSet without the CONSTRAINED flag or a string parameter constructed by a SetToStr function depends on the actual scenario, but it is good to have both options in our arsenal of tools and techniques when we need to implement some not-quite-standard piece of functionality.

SSAS, SSRS

     

    Custom Dates for an SSIS SCD Task

    April 12th, 2010
    Comments Off

Just last weekend I implemented a number of Slowly Changing Dimensions in a SQL Server 2005 based project. For the large ones I wrote some SQL code, but for the smaller dimensions I decided to just use the SSIS SCD task. Since the wizard does most of the work, I had not done much beyond it with that component in the past. This time, though, we decided to use a custom default EffectiveTo date for the dimensions – 9999-12-31 instead of the SCD task’s default of NULL. The wizard, however, is not customisable and some manual tweaks need to be made before it can handle custom dates. So, I decided to share these, since there is not much around on this topic (or at least I could not find any particular references). There is a customisable component on CodePlex – the Kimball Method SCD Component – but I could not use it, as custom components could not be used on the project for reasons unknown to me.

I created a quick mock-up of a dimension table for demonstration purposes:
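
The screenshot of the mock-up table has not survived in this archive, so the script below is only a guess at its shape – the EffectiveFromDateId and EffectiveToDateId column names come from the post, while everything else (including the data types) is assumed:

    CREATE TABLE dbo.DimCustomer
    (
        CustomerSK          int IDENTITY(1,1) NOT NULL PRIMARY KEY,
        CustomerCode        nvarchar(20)  NOT NULL,  -- business key
        CustomerName        nvarchar(100) NOT NULL,  -- changing attribute
        Suburb              nvarchar(100) NOT NULL,  -- historical attribute
        EffectiveFromDateId datetime      NOT NULL,
        EffectiveToDateId   datetime      NOT NULL   -- '9999-12-31' for current rows
    );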

     

Then, I created an SCD task in SSIS with one historical and one changing attribute. For the Start and End dates I used my EffectiveFromDateId and EffectiveToDateId columns and got them populated with [System::StartTime]. Unfortunately, the SSIS task does not allow specifying a custom value for the default To date and uses NULL instead. To change this, we have to modify three data flow components: the Slowly Changing Dimension transformation itself and the two OLE DB Command transformations behind the Changing Attribute Updates and Historical Attribute Inserts outputs.

We can modify these through the Advanced Editor (right-click on each component). For the actual Slowly Changing Dimension task we have to make the following change:
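
The screenshot with the exact change is missing from this archive. Judging by the properties the Slowly Changing Dimension transformation exposes in the Advanced Editor, the change is most likely to the CurrentRowWhere custom property, which tells the component how to recognise the current row:

    CurrentRowWhere (default):   [EffectiveToDateId] IS NULL
    CurrentRowWhere (modified):  [EffectiveToDateId] = '9999-12-31'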

Then we also have to modify the SQL script for the two OLE DB commands (again through the Advanced Editor). For the Changing Attribute Updates Output:
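
The original screenshot is not available either, so the statements below are an approximate reconstruction of what the wizard typically generates, using the hypothetical table from above – the only change needed is replacing the IS NULL test with the custom date:

    -- Wizard-generated statement (approximate):
    UPDATE [dbo].[DimCustomer]
    SET    [CustomerName] = ?
    WHERE  [CustomerCode] = ?
           AND [EffectiveToDateId] IS NULL;

    -- Modified for the custom default EffectiveTo date:
    UPDATE [dbo].[DimCustomer]
    SET    [CustomerName] = ?
    WHERE  [CustomerCode] = ?
           AND [EffectiveToDateId] = '9999-12-31';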

And a similar change to the Historical Attribute Inserts Output:
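
Again reconstructed rather than copied from the post, the statement that expires the previous version of a row gets the same treatment:

    UPDATE [dbo].[DimCustomer]
    SET    [EffectiveToDateId] = ?
    WHERE  [CustomerCode] = ?
           AND [EffectiveToDateId] = '9999-12-31';  -- was: AND [EffectiveToDateId] IS NULL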

After applying these three changes, we are ready to run the task.

As we have three new rows coming in, they get inserted into the target dimension table. Since they are all active, their EffectiveTo dates are set to the default value of 9999-12-31.

Of course, if we decide to change anything through the SSIS SCD wizard later, all of these changes will be lost and we will have to redo them once again…

SSIS

     

    When Not To Write MDX and When Not To Use Analysis Services

    March 21st, 2010

MDX is a great way to achieve some objectives easily and efficiently. However, there are some things that are better done in other ways. I will first discuss three distinct mistakes that designers and developers tend to make when working on a typical BI solution.

    1. Leaf-Level Calculations

This is by far the most common one. Both on the MSDN SQL Server forums and in practice, developers try building calculations in MDX at the leaf level of some dimensions and usually hit severe performance problems. While it is usually possible to build an MDX expression to achieve the goal, it is usually much simpler and much better for performance to just do the calculation either in the ETL or in the DSV (either as a Named Calculation or as part of a Named Query). This avoids the need for the query engine to perform a large number of calculations every time we request an aggregate.
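
As a trivial illustration (with made-up table and column names), a line-level amount belongs in the relational layer, where it is computed once per row and then simply aggregated by the cube:

    -- A DSV Named Calculation, a column in a Named Query, or an ETL step:
    SELECT  f.OrderQuantity,
            f.UnitPrice,
            f.OrderQuantity * f.UnitPrice AS SalesAmount  -- computed per fact row
    FROM    dbo.FactSales AS f;
    -- The equivalent MDX calculated member would have to be evaluated over the
    -- leaf cells at query time, which is where the performance problems start.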

    2. Mocking Joins

I have discussed this in a previous post, where I explain how we can access a measure group through a dimension which is not directly related to it (but rather related to it through another measure group and another dimension). Well, instead of doing this, we can simply build a many-to-many relationship between the two and avoid the whole MDX bit.

    3. Business Logic over a large dimension

MDX is brilliant for implementing business logic – unless it needs to operate over millions of dimension members every time a calculation is requested. For example, I recently tried building a bit of business logic which needed to order a dimension over a measure and get the member with the largest value for each member of another dimension with hundreds of thousands of members. On top of that, there were other calculations doing similar logic, and the end result was not quite what was expected. Even though the MDX was really neat and achieved the purpose in 3-4 lines, I moved the logic back to the ETL (which was quite a bit more complex) because of performance. So, in general, I would advise against using MDX when the query engine needs to go through a lot of cells to retrieve the result (in my case quite a few million), especially when ordering is required.
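
Purely as an illustration of the pattern (the dimension, hierarchy and measure names are made up), this kind of logic tends to look deceptively simple in MDX:

    WITH MEMBER [Measures].[Top Product] AS
        TopCount(
            [Product].[Product].[Product].Members,
            1,
            [Measures].[Amount]
        ).Item(0).Item(0).Name
    SELECT [Measures].[Top Product] ON COLUMNS,
           [Customer].[Customer].[Customer].Members ON ROWS
    FROM   [Adventure Works]
    -- For every one of the hundreds of thousands of customers on rows, the engine
    -- has to scan and order the whole Product dimension, which is what kills the
    -- performance despite the neat 3-4 lines of MDX.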

A more fundamental mistake is using Analysis Services in a solution that does not really need it. Two severe and, in my opinion, common mistakes are:

    1. Data Dumps

Why build a cube when the sole purpose of the project is to allow users to access the underlying data? Unfortunately, Analysis Services is sometimes seen as a silver bullet for everything. If the end report contains a massive amount of data and a key requirement is for it to export nicely to CSV, why bother – just export the data to CSV, zip it up and let the users download it. As far as I know, this can be achieved very easily in a number of other ways – especially considering the amount of effort and skills needed to build and maintain an SSAS solution.

    2. No Aggregations

Another way SSAS gets misused is when a lot of textual data gets stored in a large number of big dimensions, which then get linked through a “fact table”. I have previously worked on a solution where there were no measure columns in the fact table at all, and the cube was used to retrieve information about members of the largest dimension, called “Member”, which contained 4-5 million customers. The rest were dimensions like “Sign Up Date”, “Date Suspended”, “Country of Birth”, “Age Band”, etc. In the end, the main report consisted of information about the members, and no data was aggregated apart from a simple count. The entire OLAP solution could have been replaced by a SQL query with a WHERE clause and an index.
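
In other words, something along these lines (the table and column names are guesses based on the dimensions listed above), backed by an appropriate index, would have served the main report just as well:

    SELECT  m.MemberId, m.SignUpDate, m.DateSuspended, m.CountryOfBirth, m.AgeBand
    FROM    dbo.Member AS m
    WHERE   m.CountryOfBirth = 'Australia'  -- whatever filters the report needs
            AND m.AgeBand = '25-34';
    -- plus a simple COUNT(*) with GROUP BY wherever an aggregate is actually required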

    I am sure that there are many other cases when SSAS gets misused. A solution utilising SSAS and MDX properly can be very powerful and elegant. However, sometimes because of poor design choices it gets discarded as inadequate. Don’t blame the tools and the technology if your cube is slow – it is most likely a problem with either your design or the way you have developed your solution.

SSAS

     

    Obtaining Microsoft BI Certification

    February 24th, 2010

I have been a little busy recently getting certified with Microsoft. I am quite happy to announce that I have passed both the 70-448 and 70-452 exams in the last couple of weeks without reading a single page of preparation material. Now I am an MCP, MCTS and MCITP in SQL Server 2008 BI.

I have always wondered how important and relevant these certificates are. The common opinion among industry professionals seems to be that experience is what really counts, and that certifications are for people without experience who are trying to get into the profession. I did it the other way around and I am not the least bit sorry for it. Here is an overview of my experience with the two certifications:

    MCTS: Microsoft SQL Server 2008, Business Intelligence Development and Maintenance
    Exam: 70-448 Microsoft SQL Server 2008, Business Intelligence Development and Maintenance

I passed this one with 857 (out of 1000). Since it was my first attempt at certification, I did go through a few sample questions from a prep book based on MeasureUp, and a friend also flicked my way some really poor quality braindumps, which contained some dubious terminology and a fair bit of nonsense. On the MeasureUp tests I scored between 65% and 85%, so I decided to get a Free Second Shot voucher from Prometric and just go with no further preparation. As expected, my Data Mining skills did not quite meet the expectations and I had a few glitches around SSIS maintenance, but I got more than 90% on all the development components and around 80% on all the administration/maintenance ones. After all, I am a developer and I am not that experienced with administering BI solutions. So, in general, my impression is that the test can be passed by fairly inexperienced developers who have read some preparation materials.

    MCITP: Business Intelligence Developer 2008
    Exam: 70-452 PRO Designing a Business Intelligence Infrastructure Using Microsoft SQL Server 2008

This one was harder, but I passed it with 92% after doing one sample test of thirty-odd questions before the attempt. I had another Prometric Free Second Shot voucher, so I was not stressing about whether I would pass or fail. The test had double the number of questions of the MCTS one and it took me a fair while to go through all of them. Again, Data Mining was my weakest part (less than 50% right), while I managed to score 100% on the SSRS, SSAS and “Designing the BI Architecture” parts. SSIS was almost perfect too. Now, if a fairly inexperienced developer can pass the MCTS exam, this test will definitely cause them a lot more headaches. The questions are much more practical, and a much higher degree of relevant experience (or much more reading) is required to pass.

In general, neither of the tests is easy, and I do believe that if one can pass them without preparation (from experience alone), he is quite prepared to tackle the design, implementation and administration of Microsoft BI solutions. The Free Second Shot vouchers are also great for eliminating stress to some degree. However, on the not-so-good side, the tests can be passed with no experience, and because the questions follow a certain pattern and do not allow much freedom in choosing the correct answer (short-answer questions could be better), I think there is a moderate chance that inexperienced and not particularly knowledgeable people will study, pass and then immediately forget the subject matter.

Still, I have no idea how much my career will benefit from these certifications; however, I did get a few PDF certificates signed by Steven A. Ballmer to wave at suitable occasions :)

Other