Archive

Posts Tagged ‘T-SQL’

Number of Weekdays Between Two Dates

August 4th, 2010

An older post here described some T-SQL code I wrote for finding the number of weekdays between two dates. It worked fine, so if you have implemented it you have not done anything wrong. However, Jeff Moden from SQL Server Central wrote an article some time ago about the same problem, and his implementation is a bit cleaner, so I consider it better than mine. Here is the link:

http://www.sqlservercentral.com/articles/Advanced+Querying/calculatingworkdays/1660/
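
For readers who want a feel for the set-based approach before following the link, here is a minimal sketch of one widely used formula. It is not quoted from the linked article, the variable names are my own, and it assumes an English language setting (because of the DATENAME comparison) and ignores public holidays.

```sql
-- Sketch only: count weekdays (Mon-Fri) between two dates, inclusive.
-- @StartDate and @EndDate are illustrative names, not from the article.
DECLARE @StartDate datetime, @EndDate datetime;
SET @StartDate = '20100802';  -- a Monday
SET @EndDate   = '20100808';  -- the following Sunday

SELECT
      (DATEDIFF(dd, @StartDate, @EndDate) + 1)   -- total days, inclusive
    - (DATEDIFF(wk, @StartDate, @EndDate) * 2)   -- one Saturday and one Sunday per week boundary crossed
    - (CASE WHEN DATENAME(dw, @StartDate) = 'Sunday'   THEN 1 ELSE 0 END)  -- Sunday at the start is not covered by a boundary
    - (CASE WHEN DATENAME(dw, @EndDate)   = 'Saturday' THEN 1 ELSE 0 END)  -- Saturday at the end is not covered by a boundary
    AS Weekdays;  -- 5 for the dates above
```

DATEDIFF with the wk datepart counts Saturday-to-Sunday boundaries regardless of the DATEFIRST setting, which is what makes the weekend subtraction work.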

T-SQL

 

Using the DSV to its Full Potential

July 22nd, 2010

The Data Source View in Analysis Services is a very powerful abstraction of the data source, and it can help us handle some scenarios in an easy and clean way. Many times we look for MDX or programmatic solutions to problems that are best tackled in our data. While for complex tasks we would be better off extending the ETL process, some simple ones can and should be implemented in the DSV.

 

As an introduction to the topic I would like to explain briefly what the DSV actually is. It can be conceptualised as a database view on top of the data source. By default, all tables which we need for building the Analysis Services database (typically dimensions and facts) appear in the DSV as table bindings (exactly as if we did a SELECT * FROM Table). If we have no foreign keys defined in our database, SSAS will not show us any relationships in the DSV. However, we can define logical relationships in the DSV, connecting the tables on related columns; these are then used to determine automatically how dimensions relate to the measure groups.

 

There are two important ways to modify the DSV, which allow us to add more columns to the existing tables and to modify the way the existing columns are shown:

Named Queries

 

 

 

If we right-click on a table in the DSV, we can choose to replace the table with a Named Query. A Named Query is essentially a T-SQL SELECT statement, equivalent to a database view definition. By utilising Named Queries we can alter the way we see the tables and their columns in SSAS. For example, we could concatenate columns, implement CASE logic, etc.
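
As a rough illustration, a Named Query replacing a dimension table binding might look like the sketch below. The table DimAccount and all of its columns are hypothetical names chosen for the example. I have left off the trailing semicolon on the assumption that the query text may end up embedded in a larger statement.

```sql
-- Hypothetical Named Query for an assumed DimAccount table.
-- AccountKey, AccountCode, AccountDescription and AccountType are
-- illustrative column names, not taken from a real schema.
SELECT
    AccountKey,
    AccountCode,
    AccountDescription,
    -- CASE logic turning raw type codes into readable labels
    CASE AccountType
        WHEN 'A' THEN 'Assets'
        WHEN 'L' THEN 'Liabilities'
        ELSE 'Other'
    END AS AccountTypeName
FROM DimAccount
```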

 

Named Calculations

 

A Named Calculation is a SQL expression which adds a column to a table without modifying the table binding. It gives us an easy way to define a new column without changing the whole query. The expression defining the column is written in T-SQL and it behaves the same way as a new column in a Named Query (or a SELECT statement). If we just want to add one more column (e.g. Display Order, Code+Description concatenation, etc.), we can simply define a Named Calculation. Also, as the name suggests, Named Calculations are commonly used for defining a leaf-level calculation without modifying a large fact table’s SELECT statement in a Named Query.

 

The column we define here appears both in the DSV table and in the Dimension Designer window.

 

These two DSV features can be used in many scenarios. Most importantly, there are a few in which they yield better performance, faster development and easier maintenance:

Leaf-level calculations

If we have the common requirement to perform leaf-level calculations and then aggregate the results up the hierarchy, as opposed to aggregating and then calculating, the best way to do this is in a SQL statement on the fact table. Alternatively, we can do this in an MDX statement:

 SUM(DESCENDANTS(Dim.CurrentMember,,LEAVES), MeasureCalc)

However, it comes at a price. Since SSAS would have to do the calculation for each leaf and then sum this up the hierarchy, this could take a long time to perform. Also, SSAS would not be able to use pre-processed aggregations and the calculations will be done at execution time. To avoid this we could add a new column to the fact table and do the calculation there (in SQL), using the column as a new measure in the cube, which can then be aggregated by SSAS as any other measure. The performance gain is usually substantial and using a Named Query or a Named Calculation should always be the preferred option.
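
A minimal sketch of the SQL alternative, assuming a hypothetical fact table FactSales with Quantity and UnitPrice columns (none of these names come from a real project). The product is computed once per fact row, so SSAS can treat the result as an ordinary additive measure.

```sql
-- Hypothetical Named Query over an assumed FactSales table.
SELECT
    DateKey,
    ProductKey,
    Quantity,
    UnitPrice,
    -- Leaf-level calculation: evaluated per row in the relational engine
    Quantity * UnitPrice AS SalesAmount
FROM FactSales
```

The same expression, Quantity * UnitPrice, could instead be entered as a Named Calculation on the fact table. A measure with a Sum aggregation over SalesAmount would then aggregate like any other measure.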

Description Attributes

Often we need to concatenate different dimension attributes into a Description attribute, which we can use while slicing the cube or when providing reports from the SSAS database. A very easy way to achieve this is to use our DSV and concatenate the columns we need in a new column in the dimension table, which we can expose as a new attribute in the dimension. A task such as concatenating an Account Code and Account Description into an Account Long Description (i.e. [Account Code] + '-' + [Account Description]) becomes very easy to implement within the DSV without modifying the ETL or any tables.
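
A sketch of the concatenation written defensively, assuming a hypothetical DimAccount table whose [Account Code] column might be numeric and whose [Account Description] column might contain NULLs. Only the expression itself would go into a Named Calculation; it is wrapped in a SELECT here so that it can be run on its own.

```sql
-- Hypothetical example; DimAccount and its columns are assumed names.
SELECT
    [Account Code],
    [Account Description],
    -- CAST is only needed if the code is stored as a number;
    -- ISNULL stops a NULL description from turning the whole result into NULL
    CAST([Account Code] AS varchar(20)) + '-' + ISNULL([Account Description], '')
        AS [Account Long Description]
FROM DimAccount
```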

Composite Keys

Sometimes we need to build unique keys for attribute columns in a dimension. A good example is a Date dimension, which does not have unique keys for non-leaf levels such as Month. Often developers have a Month Key of 1, 2, 3 … 12. This does not make a good Month key in SSAS, as it is not unique across higher levels such as Year and Quarter. There are a number of ways to tackle this common scenario. While the recommended approach would be to build a Year-Quarter-Month concatenation as the Month Key in the dimension table, we can also achieve this by selecting all of the columns as key columns for the attribute in the dimension attribute properties. However, this would give us a concatenated key in MDX, which is sometimes undesirable. A simpler and cleaner solution is to concatenate the relevant columns in the DSV by using a Named Query. Instead of the typical

SELECT col1, col2, ..., MonthKey, colx, coly, colz FROM DimDate

we can write

SELECT col1, col2, ..., YearKey+QuarterKey+MonthKey AS MonthKey, colx, coly, colz FROM DimDate

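This way we can use the MonthKey column directly as a key for our Month attribute. Note that the + operator only concatenates character columns; if the key columns are integers, + adds them up and the result is not unique, so they have to be cast to a character type first or combined arithmetically (e.g. YearKey * 100 + MonthKey).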

While this is useful for a Date dimension, it can also be useful for any other composite key definition in our dimensions.
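
A minimal sketch of both ways of building such a key, assuming a hypothetical DimDate table with integer columns YearKey (e.g. 2010), QuarterKey (1 to 4) and MonthKey (1 to 12). The output column names are my own.

```sql
-- Hypothetical Named Query; DimDate and its columns are assumed names.
SELECT
    DateKey,
    YearKey,
    QuarterKey,
    -- Arithmetic form: 2010, 3, 8 becomes the integer 2010308
    YearKey * 1000 + QuarterKey * 100 + MonthKey AS YearQuarterMonthKey,
    -- Character form: 2010, 3, 8 becomes the string '2010308'
    CAST(YearKey AS char(4))
        + CAST(QuarterKey AS char(1))
        + RIGHT('0' + CAST(MonthKey AS varchar(2)), 2) AS YearQuarterMonthCode
FROM DimDate
```

Since the quarter follows from the month, YearKey * 100 + MonthKey would be unique as well; including the quarter only makes the key easier to read.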

Other possible applications of DSV Named Queries and Named Calculations are the implementation of

  • Sort Order attribute, in cases when we need custom sort of the dimension attributes
  • Restricting the data which comes into the cube dynamically based on a certain condition (think of a Date dimension, which includes only relevant periods)
  • Combining tables – by a SQL join
  • Replacing 0s with NULLs (the opposite can be done automatically in SSAS) for our measures

Basically, in a DSV we can “correct” our data to make it suitable for our cube without changing the ETL.
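
Two of the items above can be sketched as Named Queries. All table and column names (FactSales, DimScenario and so on) are hypothetical, and the date filter assumes DateKey is a datetime column.

```sql
-- Hypothetical fact Named Query: replace zeros with NULLs and restrict the rows.
SELECT
    DateKey,
    ProductKey,
    -- NULLIF returns NULL when both arguments are equal, so 0 becomes NULL
    NULLIF(SalesAmount, 0) AS SalesAmount
FROM FactSales
-- Keep only a rolling three-year window, evaluated at processing time
WHERE DateKey >= DATEADD(year, -3, GETDATE())

-- Hypothetical dimension Named Query: a custom sort order column.
SELECT
    ScenarioKey,
    ScenarioName,
    CASE ScenarioName
        WHEN 'Actual' THEN 1
        WHEN 'Budget' THEN 2
        ELSE 3
    END AS ScenarioSortOrder
FROM DimScenario
```

The ScenarioSortOrder column could then back an attribute that the Scenario attribute is ordered by.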

Last but not least, we can also transform tables to conform to a star-schema-like design. If we want to show a proof of concept on top of a normalized OLTP database, we could avoid the ETL complexities, as well as building a datamart, and use SQL to join or split tables into dimension and fact tables which are suitable for cube development. While this could also work beyond the POC stage, it is better to take a cautious approach, as there are many scenarios in which it would either not work or be too slow.
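
For instance, a product dimension could be flattened out of three normalized tables with a Named Query along these lines. This is a sketch only: Product, ProductSubcategory, ProductCategory and their columns are assumed names, not a reference to any specific database.

```sql
-- Hypothetical Named Query flattening a snowflaked product hierarchy
-- into one star-schema-style dimension table.
SELECT
    p.ProductID                 AS ProductKey,
    p.Name                      AS ProductName,
    -- LEFT JOINs keep products with no subcategory; label them explicitly
    ISNULL(s.Name, 'Unknown')   AS SubcategoryName,
    ISNULL(c.Name, 'Unknown')   AS CategoryName
FROM Product AS p
LEFT JOIN ProductSubcategory AS s
    ON s.ProductSubcategoryID = p.ProductSubcategoryID
LEFT JOIN ProductCategory AS c
    ON c.ProductCategoryID = s.ProductCategoryID
```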

And a word of warning – your DSV could become slow because of over-use of complex Named Queries. This could be painful when minimising cube processing time is crucial, or when the DSV starts timing out and queries take hours to execute. Luckily, in most cases we can simply move these large queries forward – to the ETL where we have more time and better tools (e.g. SSIS).

SSAS

 

Passing database names to SSIS stored procedures

June 29th, 2010

In the rare cases when we use dynamic SQL and need a database name in our code, we are better off not hard-coding it. Unfortunately, I could not find an easy way to access a connection manager’s database name, and on my current project the catalog name is not in the SSIS configurations XML file. Therefore, I had to resort to a little trick to pull the database name out and pass it to a stored procedure. In brief, we can do the following:

1. Create a user variable database_name

2. Create an Execute SQL Task using the connection manager we want to get the database name from, which does:

 SELECT db_name() AS database_name
 
3. Map the Single Row result set to our database_name variable 

4. Place the task created in the previous step before any components which would be using the variable.

5. Pass the variable to our dynamic SQL stored procedure
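
On the receiving side, the stored procedure could look something like the sketch below. The procedure name, the table dbo.SomeTable and the query itself are hypothetical; the point is only that the database name arrives as a parameter and is validated and quoted before it is used in dynamic SQL.

```sql
-- Hypothetical procedure; usp_Count_Rows_In_Database and dbo.SomeTable
-- are illustrative names only.
CREATE PROCEDURE dbo.usp_Count_Rows_In_Database
    @database_name sysname
AS
BEGIN
    SET NOCOUNT ON;

    -- Reject names that do not correspond to a database on this server
    IF DB_ID(@database_name) IS NULL
    BEGIN
        RAISERROR('Database %s does not exist.', 16, 1, @database_name);
        RETURN;
    END;

    DECLARE @sql nvarchar(max);

    -- QUOTENAME wraps the name in brackets and escapes any ] inside it
    SET @sql = N'SELECT COUNT(*) AS row_count FROM '
             + QUOTENAME(@database_name) + N'.dbo.SomeTable;';

    EXEC sys.sp_executesql @sql;
END;
```

With an OLE DB connection, the Execute SQL Task that calls it would typically use a statement such as EXEC dbo.usp_Count_Rows_In_Database ? with the database_name variable mapped to parameter 0.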

 

 

There we go: the stored procedure now takes its database name from whatever the connection manager is configured to use in the SSIS package configurations, which is a bit better than just hard-coding the name.

SSIS

 

SQL Server DBMS Top 1 Wish List

December 8th, 2009

As an addition to Teo Lachev’s Top 10 Wishlists (SSAS and SSRS), I would like to contribute only 1 item to a possible SQL Server DBMS wishlist:

1. Source Control.

Not SourceSafe source control, but rather an automated version out-of-the-box, not relying on developers to check in/out. Rather, it should track the changes to the code as they are made, and a full version history should be available directly in the DBMS. It should not be too hard. After all, there is a nice database available, which can store code with its version numbers just like anything else.

This would make a lot of developers’ lives a bit less frustrating.

OK, a SQL code “beautifier” would also be nice, but it is not all that important…

Vote on Connect

T-SQL

 

Moving writeback data in the Fact tables and avoiding problems with changing column names

January 12th, 2009

While writeback functionality in SQL Server Analysis Services 2008 has changed significantly and writeback values can be stored in MOLAP, in SSAS 2005 the writeback values are stored in a relational table on the same server as the fact tables. When writeback is enabled for a partition, a new table is automatically created, whose name is prefixed with WriteTable. Its structure is fairly simple: it contains a column for each measure, a column for each dimension key and two audit fields.

The ROLAP nature of the writeback table makes it inefficient for storage of a large number of writeback records, and it is sometimes required to consolidate the data it contains with the fact table.

Normally we can write a stored procedure which does this for us. Because the values in the WriteTable are deltas, there is a new row for each user change. For example, if we change 0 to 5, there will be one row in the writeback table, which shows 5 as a measure value. If we then change the new value of 5 to 2, there will be a new row with a measure value of -3. Therefore, it could be more efficient to perform a quick aggregation of the values in the WriteTable while moving them into the fact table. On the other hand, aggregating would be contrary to our requirements if we want to be able to trace all data changes.
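
A minimal sketch of such an aggregation, using the table and column names from the example definitions shown further down in this post. The suffixed WriteTable column names are taken as they appear there and may differ on another system.

```sql
-- Sketch: collapse the delta rows into one net row per key combination
-- while copying them into the fact table.
-- For the example above, the rows 5 and -3 are summed into a single row of 2.
INSERT INTO [Fact_IndicatorAmount]
    ([High], [Low], [Amount], [Indicator_Id], [Region_Id], [Scenario_Id], [Date_Id])
SELECT
    SUM([High_0]),
    SUM([Low_1]),
    SUM([Amount_2]),
    [Indicator_Id_3],
    [Region_Id_4],
    [Scenario_Id_5],
    [Date_Id_6]
FROM [WriteTable_Indicator Amount]
GROUP BY
    [Indicator_Id_3],
    [Region_Id_4],
    [Scenario_Id_5],
    [Date_Id_6];
```

The audit columns are dropped by the GROUP BY, which is exactly why this variant is unsuitable when every individual change has to remain traceable.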

In either case, we end up with a number of new rows which we can insert into our fact table, after which we can truncate our WriteTable and process our cube. There is a potential pitfall here. If we do not set up the processing settings properly, we could destroy our WriteTable and have it re-created, which in turn introduces another pitfall: SSAS may change our column suffixes. For example, if we have a fact table with the following definition:

CREATE TABLE [Fact_IndicatorAmount](
[Fact_IndicatorAmount_Id] [int],
[ETL_Date] [timestamp],
[Indicator_Id] [int],
[Region_Id] [int],
[Scenario_Id] [int],
[Date_Id] [datetime],
[High] [float],
[Low] [float],
[Amount] [float]
)

The WriteTable may be created like this:

CREATE TABLE [WriteTable_Indicator Amount](
[High_0] [float],
[Low_1] [float],
[Amount_2] [float],
[Indicator_Id_3] [int],
[Region_Id_4] [int],
[Scenario_Id_5] [int],
[Date_Id_6] [datetime],
[MS_AUDIT_TIME_8] [datetime],
[MS_AUDIT_USER_9] [nvarchar](255)
)

Note how the column names are the same as the fact table column names, but are suffixed with _0, _1, _2, etc. Unfortunately, these suffixes may change when the WriteTable is re-created, as SSAS does not seem to assign them in a predictable order. If that happens, our consolidation stored procedures will break.

The obvious step to avoid this is to set up our cube processing correctly, making sure that the WriteTable does not get re-created. To do this, we can select Use Existing writeback table in the Change Settings… dialog, which allows us to change cube processing settings.

We can also script this action and use it in our automated cube processing SQL Server job.

Even though this is a relatively intuitive and simple solution, I have always had problems with it because of manual cube processing performed by power users, who destroy the writeback data together with the WriteTable structure and, as a consequence, break the code in my stored procedures.

Through the utilisation of some dynamic SQL and SQL Server system tables information, we can write a stored procedure which does not depend on the suffixes of the column names in the writeback table:

CREATE PROCEDURE [usp_Consolidate_WriteBack_to_Facts]
AS
BEGIN
SET NOCOUNT ON;

-- Actual (suffixed) column names currently used by the WriteTable
DECLARE @Column_High nvarchar(128),
@Column_Low nvarchar(128),
@Column_Amount nvarchar(128),
@Column_Indicator nvarchar(128),
@Column_Region nvarchar(128),
@Column_Scenario nvarchar(128),
@Column_Time nvarchar(128)

-- Look up each column by its prefix, since the numeric suffix may change
SET @Column_High = (
SELECT syscolumns.name
FROM sysobjects
INNER JOIN syscolumns
ON sysobjects.id = syscolumns.id
WHERE sysobjects.xtype = 'U'
AND sysobjects.name = 'WriteTable_Indicator Amount'
AND syscolumns.name LIKE 'High%'
)

SET @Column_Low = (
SELECT syscolumns.name
FROM sysobjects
INNER JOIN syscolumns
ON sysobjects.id = syscolumns.id
WHERE sysobjects.xtype = 'U'
AND sysobjects.name = 'WriteTable_Indicator Amount'
AND syscolumns.name LIKE 'Low%'
)

SET @Column_Amount = (
SELECT syscolumns.name
FROM sysobjects
INNER JOIN syscolumns
ON sysobjects.id = syscolumns.id
WHERE sysobjects.xtype = 'U'
AND sysobjects.name = 'WriteTable_Indicator Amount'
AND syscolumns.name LIKE 'Amount%'
)

SET @Column_Indicator = (
SELECT syscolumns.name
FROM sysobjects
INNER JOIN syscolumns
ON sysobjects.id = syscolumns.id
WHERE sysobjects.xtype = 'U'
AND sysobjects.name = 'WriteTable_Indicator Amount'
AND syscolumns.name LIKE 'Indicator%'
)

SET @Column_Region = (
SELECT syscolumns.name
FROM sysobjects
INNER JOIN syscolumns
ON sysobjects.id = syscolumns.id
WHERE sysobjects.xtype = 'U'
AND sysobjects.name = 'WriteTable_Indicator Amount'
AND syscolumns.name LIKE 'Region%'
)

SET @Column_Scenario = (
SELECT syscolumns.name
FROM sysobjects
INNER JOIN syscolumns
ON sysobjects.id = syscolumns.id
WHERE sysobjects.xtype = 'U'
AND sysobjects.name = 'WriteTable_Indicator Amount'
AND syscolumns.name LIKE 'Scenario%'
)

SET @Column_Time = (
SELECT syscolumns.name
FROM sysobjects
INNER JOIN syscolumns
ON sysobjects.id = syscolumns.id
WHERE sysobjects.xtype = 'U'
AND sysobjects.name = 'WriteTable_Indicator Amount'
AND syscolumns.name LIKE 'Date%'
)

-- Build the INSERT with the column names found above
DECLARE @SQL_Command nvarchar(4000)
SET @SQL_Command = N'
INSERT INTO [Fact_IndicatorAmount]
([High]
,[Low]
,[Amount]
,[Indicator_Id]
,[Region_Id]
,[Scenario_Id]
,[Date_Id])
SELECT ' + QUOTENAME(@Column_High) + N'
,' + QUOTENAME(@Column_Low) + N'
,' + QUOTENAME(@Column_Amount) + N'
,' + QUOTENAME(@Column_Indicator) + N'
,' + QUOTENAME(@Column_Region) + N'
,' + QUOTENAME(@Column_Scenario) + N'
,' + QUOTENAME(@Column_Time) + N'
FROM [WriteTable_Indicator Amount]'

-- A missing column makes the whole string NULL; stop before truncating anything
IF @SQL_Command IS NULL
BEGIN
RAISERROR('Could not resolve all WriteTable column names.', 16, 1)
RETURN
END

-- Copy and truncate in one transaction so a failed INSERT does not lose rows
BEGIN TRY
BEGIN TRANSACTION
EXEC (@SQL_Command)
TRUNCATE TABLE [WriteTable_Indicator Amount]
COMMIT TRANSACTION
END TRY
BEGIN CATCH
IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION
DECLARE @Error nvarchar(2048)
SET @Error = ERROR_MESSAGE()
RAISERROR('%s', 16, 1, @Error)
END CATCH
END

What we are effectively doing here is getting the column names from the WriteTable and then constructing an INSERT statement based on them. It is dangerous to automate this further with a WHILE loop over the columns, as the actual column names in the WriteTable can differ from the ones in the fact table. This could happen if the dimension table key names are different to the fact table key names.

Moving writeback rows through this stored procedure ensures that even if the WriteTable for a partition is re-created for some reason our code can handle it.

SSAS

 
