3×3 Worst Development Practices 2010 with SSIS, SSAS and SSRS

December 12th, 2010

There are some practices which are outright bad and some which are only sometimes bad. I will try to list the top three outright worst practices for each of the main components – SSIS, SSAS and SSRS – which have annoyed me most throughout 2010 (and previous years). I am sure there are many more, but these are the ones I have witnessed often enough to deserve some attention.

SSIS

Using default names for all components in a control flow and a data flow

I really dislike opening projects and seeing nothing but generic names. I have left defaults in place in the past because of insufficient time to document everything, but I hated creating packages like that and hated leaving them like that even more. I have also seen developers with plenty of time on their hands carelessly skipping the renaming altogether. The worst practice, though, is to rename the components to something equally unintelligible, even if marginally better. Many out-of-the-box ETL tools actually do exactly that and make subsequent maintenance much harder.

Performing joins with Merge Join

Chaining five Merge Joins with Sort components between them does work that SQL Server performs faster and with far fewer resources. Doing the joins in the database is therefore a no-brainer, even if your preference lies with a fully SSIS-based process.
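
As a rough illustration (the table and column names below are made up), a single source query along these lines replaces a whole chain of Sort and Merge Join components:

-- Hypothetical OLE DB Source query: the joins run in SQL Server,
-- so no Sort or Merge Join components are needed in the data flow.
SELECT o.OrderID,
       o.OrderDate,
       c.CustomerName,
       p.ProductName,
       d.Quantity,
       d.UnitPrice
FROM dbo.Orders AS o
JOIN dbo.OrderDetails AS d ON d.OrderID = o.OrderID
JOIN dbo.Customers AS c ON c.CustomerID = o.CustomerID
JOIN dbo.Products AS p ON p.ProductID = d.ProductID;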

Not using configurations

Luckily, in Denali we are getting a completely revamped project configuration approach. Even if imperfect, Package Configurations are a must for every SSIS project. Being able to migrate projects, change databases and set variables through configurations is just too good to miss.
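
As a sketch of the SQL Server configuration type: the table below follows the default structure the Package Configurations wizard generates, while the filter, connection string and package path are made-up examples. Moving a project to another environment then becomes a matter of updating rows rather than editing packages.

CREATE TABLE dbo.[SSIS Configurations]
(
    ConfigurationFilter NVARCHAR(255) NOT NULL,
    ConfiguredValue     NVARCHAR(255) NULL,
    PackagePath         NVARCHAR(255) NOT NULL,
    ConfiguredValueType NVARCHAR(20)  NOT NULL
);

INSERT INTO dbo.[SSIS Configurations]
    (ConfigurationFilter, ConfiguredValue, PackagePath, ConfiguredValueType)
VALUES
    ('LoadDW',
     'Data Source=PRODSQL01;Initial Catalog=DW;Provider=SQLNCLI10.1;Integrated Security=SSPI;',
     '\Package.Connections[DW].Properties[ConnectionString]',
     'String');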

SSAS

Using MDX for everything

MDX has the aura of a language which allows you to do everything much faster than SQL. Well, this is not entirely true, and in the majority of cases a change in the design will allow us to avoid MDX altogether, which in turn removes a whole layer of complexity and yields tremendous improvements over any programmatic approach.
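
A trivial example of such a design change, assuming a fact table with Quantity and UnitPrice columns (both names invented): instead of an MDX calculated measure multiplying the two at query time, the amount can be materialised in the relational layer and exposed as a plain SUM measure.

-- Hypothetical: push the calculation into the fact table (or a view in the DSV)
-- and let SSAS aggregate it as a regular measure instead of evaluating MDX cell by cell.
ALTER TABLE dbo.FactSales
ADD SalesAmount AS (Quantity * UnitPrice) PERSISTED;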

Importing too much dimensional data

Often developers underestimate the impact of unnecessarily large dimensions with many useless attributes. The best approach here is to add attributes as you need them, rather than importing everything just because we can. Cubes are smaller and less complex with fewer attributes, processing is faster, and most importantly, the users actually like having less clutter.

Leaving NULLs in the relational database

SSAS does not like NULLs. I have not seen a case where NULLs are necessary in dimension tables. While I would leave a NULL for a measure when there is genuinely no data for it, too many of these can be quite harmful and may also be a sign of a bloated measure group.
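
A minimal sketch of what I mean, with invented table and column names – replace dimension NULLs with an explicit 'Unknown' (or another sensible default) as part of the dimension load:

-- Hypothetical dimension load: NULL attribute values never reach SSAS.
SELECT CustomerKey,
       ISNULL(City, 'Unknown')            AS City,
       ISNULL(PostalCode, 'N/A')          AS PostalCode,
       ISNULL(CustomerSegment, 'Unknown') AS CustomerSegment
FROM staging.Customer;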

SSRS

Storing business logic in report queries

While we can, we should never embed SQL and MDX logic in reports when it can be implemented in stored procedures or calculated measures instead. Maintaining queries stored inside SSRS reports is much harder and less transparent. There are performance implications around caching, too.
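
As an illustration (the procedure, parameters and tables below are invented), the report data set then becomes a one-line call, and the logic lives where it can be tuned, reused and changed without touching the report:

-- Hypothetical: the logic sits in a stored procedure; the SSRS data set just executes it.
CREATE PROCEDURE dbo.rpt_SalesByRegion
    @FromDate DATE,
    @ToDate   DATE
AS
BEGIN
    SET NOCOUNT ON;

    SELECT r.RegionName,
           SUM(f.SalesAmount) AS SalesAmount
    FROM dbo.FactSales AS f
    JOIN dbo.DimRegion AS r ON r.RegionKey = f.RegionKey
    WHERE f.OrderDate >= @FromDate
      AND f.OrderDate <  DATEADD(DAY, 1, @ToDate)
    GROUP BY r.RegionName;
END;

-- The report data set is then simply: EXEC dbo.rpt_SalesByRegion @FromDate, @ToDate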

Merging data sets in SSRS

This is hard to do, and when it is done, the result is always bad. Data sets should be merged in the queries – not in the reports.

Building massive reports

In many organisations reports are seen as a transport mechanism to Excel. Users generate reports with hundreds of thousands of rows, a large number of columns and sometimes even basic calculations. All of this can be done simply and easily at the database level without going through SSRS.

SSAS, SSIS, SSRS

 

Why Choose PowerPivot?

December 11th, 2010

If you are in the market for self-service or in-memory BI tools, you have some options. You have to consider functionality, cost and the future. If you are a SQL Server and/or SharePoint and/or Microsoft Office user, PowerPivot should be a top-of-the-list contender. I will discuss a few points to show why.

Cost

First and foremost, it is cheap. To utilise PowerPivot you need Excel and possibly SharePoint. If you need to empower your users with the capability to extend Excel and do heavy analytics on a workstation PC, PowerPivot for Excel is essentially free. If you want to let them collaborate and share their work, then SharePoint comes into the mix. If you already have SharePoint Enterprise in your organisation, then PowerPivot is, again, free. A free self-service BI platform – not a bad option, is it? Surely the cheapest.

Functionality

Functionality-wise, the outstanding “feature” is the integration with Excel. How many other self-service BI tools out there offer such seamless integration with Excel? When Excel users become PowerPivot users, they have all the capabilities of Excel, plus PowerPivot. They can pick Top/Bottom 10, flick to a percentage representation of values, use the charting functionality of Excel the way they are used to, and utilise the rest of the Excel functionality they love. With a little bit of DAX knowledge they can build new calculations on top of massive data sets. Writing [Quantity]*[Price] gives us [Sales Amount]. Simple, isn’t it? Furthermore, if you prepare a nice, well-referenced datamart for them, you do not need to worry about their lack of SQL knowledge – all modelling gets done in Excel, in an environment very familiar to users: spreadsheets with rows and columns. Data can be previewed, filtered and ordered; new columns can be added with Excel-like syntax – a paradise for moderately Excel-savvy users.

Once the model is ready, users who want to share their work can simply publish it to SharePoint. From there other users can either browse the workbooks (if they have a browser – right…), or, if they are interested in more on-the-fly analysis, they can connect to the workbooks through Excel and slice/dice the data just the way they do with SSAS cubes. No need for client installations and no need for powerful workstations. In fact, to connect to published models they only need Excel 2003 and Windows XP. The minimum hardware requirements for those are, well… minimum.

As for IT Services departments – they can still manage the situation. They can monitor, advise and service – precisely their purpose. While a standalone, isolated and incompatible server could be a problem, the sort of manageability and visibility BISM and PowerPivot offer will, no doubt, appeal to ITS.

The Future

Let’s zap to the future. Microsoft has made a strong commitment to a multitude of future enhancements. The models will be available in SQL Server Analysis Services and DAX will get massively enhanced. There will be numerous enhancements on the modelling side, the querying side, the engine side, etc – all in all, BISM and PowerPivot have a great future. In fact, from what I can sense, BISM and DAX will become more and more powerful, and if you commit to spending your money on another product I can guarantee that you will look back and regret that step, especially if you like Excel and SQL Server. Just think back to where SSAS was in 2000 and where it is now. By the look of it, we will be in a similar situation after Denali and beyond. The integration between the components in the Microsoft BI stack will also be better than ever. With Crescent and SSRS reading BISM models more easily, the pieces will fit together even more seamlessly.

Am I advertising Microsoft BI? Yes. Am I objective – maybe not, but allow me to be excited about it. From the poll at the top right of this blog, where I asked how you feel about the recent BISM announcements, I noticed that many people are either “Angry” or “Excited” about the new developments around SSAS. I wonder what the reactions will be when Denali ships and more people get hands-on experience with BISM. Will there be as many “Angry” people out there? I doubt it. Will everyone get ecstatic – well, maybe not, but I believe that a lot of users will become more Excited/Happy about it.

PowerPivot

 

Examining SSAS Write-Back – Updates and Transactions

December 1st, 2010

Write-back is, in my view, a very under-used feature of SSAS. It can be very powerful when we have a good understanding of its capabilities. A good first step is to explore how it works through some MDX.

For the purpose of this article, I am using a very simple cube, which has two dimensions and one measure. I have enabled MOLAP write-back for its only partition and I have processed it, thus creating its Write Table. I will explore what happens when we issue different transaction management and UPDATE CUBE statements in varying order.

The first thing we should understand is that SSAS utilises transaction management for the write-back operations. There are three statements which we can use for this:

BEGIN TRAN[SACTION]

COMMIT TRAN[SACTION]

ROLLBACK TRAN[SACTION]

The [SACTION] part is optional and we can freely omit it.

To perform the actual updates, we use an UPDATE statement. You can find its definition here:

http://technet.microsoft.com/en-us/library/ms145488.aspx

It is worth noting that the BEGIN TRAN statement is not strictly necessary, since UPDATE issues a BEGIN TRAN implicitly. However, to complete a transaction we must still execute either a COMMIT or a ROLLBACK statement. Another important point is that the transaction scope is the session: an uncommitted transaction is visible only within its own session. Once COMMIT is called, the transaction gets written to the Write Table and becomes visible to all cube statements (i.e. to all sessions).

So, let’s work with an MDX sample statement:

BEGIN TRANSACTION

UPDATE CUBE [Service Request] SET
([Service Start Date].[FiscalYear].[Fiscal Period].&[2011]&[2011-Q4]&[2011-13],
[Service End Date].[FiscalYear].[Fiscal Period].&[2011]&[2011-Q4]&[2011-13],
[Measures].[Measure 1 Sum])=10
USE_EQUAL_ALLOCATION

SELECT
{
([Service Start Date].[FiscalYear].[Fiscal Period].&[2011]&[2011-Q4]&[2011-13],
[Service End Date].[FiscalYear].[Fiscal Period].&[2011]&[2011-Q4]&[2011-13],
[Measures].[Measure 1 Sum])
} ON 0
FROM [Service Request]

ROLLBACK TRAN

COMMIT TRAN

Note that I am running this in an SSMS query window, which cannot execute multiple statements at once. Instead, we can highlight each statement and execute it in turn. Since we stay in the same session, each statement retains the effect of the previously executed one.

First, I run the BEGIN TRANSACTION statement. It executes successfully:

Then, I run the UPDATE CUBE statement. It also executes successfully. Now, when I verify its success with the SELECT, I get 10 as the value for the measure. After this, we can check if anything got to the Write Table:

 

 

It is empty, because the transaction has not yet been committed. Also, if we browse the cube in BIDS we will not see the value:

This shows us that the value is only there in the current session, where our transaction is active.

If we then call the COMMIT TRAN statement, we get a slight delay and are notified that the statement has executed successfully. I also ran SQL Server Profiler and captured a trace of what happened. As we can see (because the write-back partition is MOLAP), the change got written to our table and the write-back partition got processed. If we have another look at the Write Table we will notice that we now have one row there. We can also see the value for the measure in BIDS:

 

 

 

ROLLBACK is less interesting – it just discards the UPDATEd values and the cube goes back to the same state it was in before we started manipulating it.

Another interesting thing which I already mentioned: if we run UPDATE before we have issued a BEGIN TRANSACTION (note that I am using TRAN and TRANSACTION interchangeably in my script and text), a transaction is implicitly started but not committed. Therefore, we cannot just call UPDATE and hope to see the changes in the Write Table. We must also call COMMIT.

This behaviour explains how Excel 2010 does write-back. When we change values it issues UPDATE statements, and once we “Publish” the changes it issues a COMMIT TRAN. This is where we have the largest wait overhead in MOLAP mode. The comparison between MOLAP and ROLAP write-back is really a topic for another article, which I hope to have the time to write soon.

SSAS

 

PowerPivot Books

November 25th, 2010

Teo Lachev just published a review of Marco Russo’s and Alberto Ferrari’s book PowerPivot for Excel 2010: Give Your Data Meaning. Instead of a full-blown review, I will add a visual one:

 

Yes, I ordered three copies from Amazon. Ok, they are not all mine, but you can see what I think about this book…

PowerPivot

 

Analysing Twitter Trends

November 22nd, 2010

Recently, I built a pilot for a demo at Avanade analysing Twitter trends, entirely using Microsoft SQL Server 2008 R2 and Microsoft Office. Since we are hearing about “Social BI” and other “Social” aspects of software, this may help introduce an approach to the topic from a Microsoft BI point of view.

The basic idea is: 

  1. Connect to search.twitter.com and get a feed which contains the results from a search
  2. Do #1 every now and then (in my case – every 10 seconds)
  3. Pick the last 15 feed items (tweets) and compare them to the previously stored 15 items to determine whether there are any new tweets
  4. Store all new tweets in SQL Server
  5. Build a few dimensions (author, time, etc.)
  6. Build a categorisation dimension. In my case I am using Full-Text search to find certain keywords around my search strings (e.g. Windows – bad/good/excellent/terrible), so I can analyse the feedback the search term is receiving
  7. Build a SSAS cube on top of these dimensions and the feed history
  8. Build an Excel pivot-based report on top of the SSAS cube

Now, for the first four points, we can use SSIS. We cannot natively connect to RSS/ATOM feeds in SSIS, but we can easily build a Script Component acting as a data source, which connects to the feed and pulls the items into a data flow. I found a very useful C# reference here:

http://blogs.msdn.com/b/mattm/archive/2009/02/19/read-an-rss-feed-from-ssis.aspx 

I customised it a bit (99% is the same as in the example above) so that it works with Twitter and does not include irrelevant info. Of course, depending on what we need to achieve, we can change the code to support all RSS/ATOM features. A sample of my script is below:

/* Microsoft SQL Server Integration Services Script Component
 * Write scripts using Microsoft Visual C# 2008.
 * ScriptMain is the entry point class of the script. */

using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
using System.ServiceModel.Web;
using System.ServiceModel.Syndication;
using System.Xml;
using System.Text;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    private string url = string.Empty;
    private SyndicationFeed feed = null;
    private XmlReader reader = null;

    public override void PreExecute()
    {
        //base.PreExecute();
        // Here we are using an SSIS variable for the feed address, which we can populate
        // from a database table, thus providing an easily customisable interface to our
        // package.
        reader = XmlReader.Create(Variables.Feed1);
        feed = SyndicationFeed.Load(reader);
    }

    public override void PostExecute()
    {
        base.PostExecute();
        reader.Close();
    }

    public override void CreateNewOutputRows()
    {
        if (feed != null)
        {
            // Emit one output row per feed item (tweet).
            foreach (var item in feed.Items)
            {
                Output0Buffer.AddRow();
                Output0Buffer.id = item.Id;
                Output0Buffer.published = item.PublishDate.DateTime;
                Output0Buffer.title = item.Title.Text;
                Output0Buffer.authorname = item.Authors[0].Name;
            }
            Output0Buffer.SetEndOfRowset();
        }
    }
}

Once we can connect to our source, everything becomes pretty much run-of-the-mill ETL. My package at present supports only five feeds and they get pulled simultaneously. This is of course just a demo and it is easy to extend as required. A natural way to improve the solution is to implement some looping logic (i.e. a looping sequence container), so that more feeds can be harvested. A particularly convenient property of the feed items is their ID, which is very useful for comparing items. Apart from that, another important property of our tweets is that they may contain more than one of the keywords we are searching for. Therefore, we want to count them once for each matched term (sometimes resulting in the same feed item getting counted three or more times). A good example of this: if we are searching for Microsoft, Vista and Windows7 and we receive something like “Microsoft is doing well with Windows7 – it is so much better than Vista”, we want to know that all three terms got hit and to show that there was a tweet for each (multiplying the total count by 3).
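
A rough sketch of that counting logic (the table and column names are invented, and a plain LIKE stands in for the full-text matching described further down): joining the stored tweets to a table of search terms yields one row per term hit, so a tweet mentioning three terms contributes three rows.

-- Hypothetical: one row per (tweet, search term) match.
SELECT t.TweetID,
       s.SearchTerm
FROM dbo.Tweet      AS t
JOIN dbo.SearchTerm AS s
  ON t.Title LIKE '%' + s.SearchTerm + '%';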

The following is an example of my SSIS items: 

 Control Flow: 

 

Data Flow:

 

I am doing a little trick here to compare the previous and the present feed items – tagging the existing rows as old at the start of the control flow and then deleting the old ones at the end. This is a sort of snapshot processing logic and works just fine when only this package is run many times in a row (which I am doing through SQL Server Agent). The output of this package is one table with all feed items.
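
In T-SQL terms, my reading of that trick amounts to something like the following (the table and flag names are invented), executed around the data flow:

-- At the start of the control flow: tag everything currently in the comparison table as old.
UPDATE dbo.FeedItem SET IsOld = 1;

-- The data flow then inserts the freshly pulled feed items with IsOld = 0;
-- items whose IDs were not already present are treated as new tweets.

-- At the end of the control flow: discard the previous snapshot.
DELETE FROM dbo.FeedItem WHERE IsOld = 1;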

We don’t need much more for a simple implementation such as WardyIT’s #sqlpass Trends:

http://www.wardyit.com/sqlpass/

However, once we decide to progress a bit further, we can achieve much more. As I mentioned earlier, we can use something like Full-Text Search to analyse the content of the feeds. Utilising CONTAINS with the NEAR proximity operator, we can find keywords around the search terms. This can be very useful if we want to make a little bit of sense of the raw tweets. I prepared a table which categorises 20 words as Positive/Negative, with one level below that (Strongly/Moderately/Slightly Positive or Negative), and I am matching these to the tweets. This allows me to see like/dislike trends for each search term.
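
For example (the table and column names are invented, and the query assumes a full-text index exists on the tweet text), a predicate along these lines finds tweets where a sentiment keyword appears near a search term:

-- Hypothetical: requires a full-text index on dbo.Tweet(Title).
SELECT t.TweetID,
       t.Title
FROM dbo.Tweet AS t
WHERE CONTAINS(t.Title, '"Windows7" NEAR "terrible"');

For ad-hoc analysis and easy consumption in Excel, I also created the following SSAS solution: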

 

I am processing my cube every hour, which allows me to see fairly up-to-date results. Alternatively, depending on requirements, you can easily switch it to ROLAP (considering the very low complexity of the cube and the relatively small amount of fact data) and get real-time analytics. Of course, the result matters the most, and this is a little (un)useful dashboard in Excel which showcases a few of the analytical options available from this very simple implementation:

 

Please don’t slam me for the choice of search terms and some visualisation items (e.g. the line chart x-axis) – I know that some are not quite relevant, but it is just a quick demo, right? :)

SSAS, SSIS