November 21st, 2008
Recently I had to build a function, which returned dimension members from a parent-child relational table. I needed to be able to order the members hierarchically and I decided to use a recursive CTE call.
Usually, we have a parent-child dimension table with structure similar to this:
Business_Skey intParent_Business_Skey int
with a root node with Skey of -3 and Parent_Business_Skey of -3.
Let’s assume we have a hierarchy like this:
All (Id = -3, P_ID = -3)-GLOBAL (Id = 1, P_ID = -3)–Europe (Id = 2, P_ID = 1)—UK (Id = 3, P_ID = 2)—France (Id = 4, P_ID = 2)—Spain (Id = 5, P_ID = 2)–North America (Id = 6, P_ID = 1)—USA (Id = 7, P_ID = 6)—Canada (Id = 8, P_ID = 6)—Mexico (Id = 9, P_ID = 6)
If we do something like:
SELECT Business_SkeyFROM Dim_BusinessORDER BY Parent_Business_Skey ASC, Business_Skey ASC
We will get:
-3 (All)1 (GLOBAL)2 (Europe)6 (North America)3 (UK)4 (France)5 (Spain)7 (USA)8 (Canada)9 (Mexico)
Obviously, this hierarchy is incorrect, because we want to see the leaf nodes under their respective parents.
We can recursively create order, which concatenates the parent ids:
WITH b_ord(bid, bord)AS(SELECT Business_Skey AS bid, CONVERT(nvarchar(1000), Business_Skey) AS bordFROM Dim_BusinessWHERE Business_Skey = -3
SELECT Business_Skey AS bid, CONVERT(nvarchar(1000), bord + ‘|’ + CONVERT(nvarchar, Business_Skey))FROM Dim_Business dbINNER JOIN b_ord boON db.Parent_Business_Skey = bo.bidWHERE db.Business_Skey <> bo.bid)SELECT *FROM b_ordORDER BY bord ASC
The result of the CTE query is:
-3 -3 (All)1 -3|1 (GLOBAL)2 -3|1|2 (Europe)3 -3|1|2|3 (UK)4 -3|1|2|4 (France)5 -3|1|2|5 (Spain)6 -3|1|6 (North America)7 -3|1|6|7 (USA)8 -3|1|6|8 (Canada)9 -3|1|6|9 (Mexico)
and the order is correct.
Because the code needed to go in a function, invoked by a number of stored procedures, .NET application and various reports, I needed the code to be quick and
light. As some dimensions had a large number of members (50000+), which could grow with time, the code needed to implemented in a careful way. So, I decided
to compare the recursive CTE function to a WHILE loop and a temporary table implementation:
DECLARE @c intDECLARE @num_of_nodes int
SET @num_of_nodes = (SELECT COUNT(*) FROM Dim_Business)
CREATE TABLE #order(skey int, ord nvarchar(1000), lvl int)
INSERT INTO #orderSELECT Business_Skey, CONVERT(nvarchar(1000), Business_Skey), 1FROM Dim_BusinessWHERE Business_Skey = -3
SET @c = 2
WHILE @c > 0BEGININSERT INTO #orderSELECT Business_Skey, CONVERT(nvarchar(1000), ord + ‘|’ + CONVERT(nvarchar, Business_Skey)), @cFROM Dim_Business dbINNER JOIN #order oON db.Parent_Business_Skey = o.skeyAND o.lvl = @c – 1WHERE db.Business_Skey <> o.skey
SET @c = @c + 1IF (SELECT COUNT(*) FROM #order) = @num_of_nodesSET @c = 0END
SELECT skey AS bid, ord AS bordFROM #orderORDER BY ord ASC
DROP TABLE #order
After comparing the results in Client Statistics and the IO reads, the 1st recursive query performs more than 20% worse than the WHILE loop query. It also trails the non-recursive query in the count of logical reads,read-ahead reads and scan counts.
It seems like in SQL Server 2005 calling WITH recursively does not work as good as coding set-based operations through WHILE loops.