Recently I had to build a function, which returned dimension members from a parent-child relational table. I needed to be able to order the members hierarchically and I decided to use a recursive CTE call.
Usually, we have a parent-child dimension table with structure similar to this:
Dim_Business
Business_Skey int
Parent_Business_Skey int
with a root node with Skey of -3 and Parent_Business_Skey of -3.
Let’s assume we have a hierarchy like this:
All (Id = -3, P_ID = -3)
-GLOBAL (Id = 1, P_ID = -3)
–Europe (Id = 2, P_ID = 1)
—UK (Id = 3, P_ID = 2)
—France (Id = 4, P_ID = 2)
—Spain (Id = 5, P_ID = 2)
–North America (Id = 6, P_ID = 1)
—USA (Id = 7, P_ID = 6)
—Canada (Id = 8, P_ID = 6)
—Mexico (Id = 9, P_ID = 6)
If we do something like:
SELECT Business_Skey
FROM Dim_Business
ORDER BY Parent_Business_Skey ASC, Business_Skey ASC
We will get:
-3 (All)
1 (GLOBAL)
2 (Europe)
6 (North America)
3 (UK)
4 (France)
5 (Spain)
7 (USA)
8 (Canada)
9 (Mexico)
Obviously, this hierarchy is incorrect, because we want to see the leaf nodes under their respective parents.
We can recursively create order, which concatenates the parent ids:
WITH b_ord(bid, bord)
AS
(
SELECT Business_Skey AS bid
, CONVERT(nvarchar(1000), Business_Skey) AS bord
FROM Dim_Business
WHERE Business_Skey = -3
UNION ALL
SELECT Business_Skey AS bid
, CONVERT(nvarchar(1000), bord + ‘|’ + CONVERT(nvarchar, Business_Skey))
FROM Dim_Business db
INNER JOIN b_ord bo
ON db.Parent_Business_Skey = bo.bid
WHERE db.Business_Skey <> bo.bid
)
SELECT *
FROM b_ord
ORDER BY bord ASC
The result of the CTE query is:
-3 -3 (All)
1 -3|1 (GLOBAL)
2 -3|1|2 (Europe)
3 -3|1|2|3 (UK)
4 -3|1|2|4 (France)
5 -3|1|2|5 (Spain)
6 -3|1|6 (North America)
7 -3|1|6|7 (USA)
8 -3|1|6|8 (Canada)
9 -3|1|6|9 (Mexico)
and the order is correct.
Because the code needed to go in a function, invoked by a number of stored procedures, .NET application and various reports, I needed the code to be quick and
light. As some dimensions had a large number of members (50000+), which could grow with time, the code needed to implemented in a careful way. So, I decided
to compare the recursive CTE function to a WHILE loop and a temporary table implementation:
DECLARE @c int
DECLARE @num_of_nodes int
SET @num_of_nodes = (SELECT COUNT(*) FROM Dim_Business)
CREATE TABLE #order(
skey int
, ord nvarchar(1000)
, lvl int
)
INSERT INTO #order
SELECT Business_Skey
, CONVERT(nvarchar(1000), Business_Skey)
, 1
FROM Dim_Business
WHERE Business_Skey = -3
SET @c = 2
WHILE @c > 0
BEGIN
INSERT INTO #order
SELECT Business_Skey
, CONVERT(nvarchar(1000), ord + ‘|’ + CONVERT(nvarchar, Business_Skey))
, @c
FROM Dim_Business db
INNER JOIN #order o
ON db.Parent_Business_Skey = o.skey
AND o.lvl = @c – 1
WHERE db.Business_Skey <> o.skey
SET @c = @c + 1
IF (SELECT COUNT(*) FROM #order) = @num_of_nodes
SET @c = 0
END
SELECT skey AS bid
, ord AS bord
FROM #order
ORDER BY ord ASC
DROP TABLE #order
After comparing the results in Client Statistics and the IO reads, the 1st recursive query performs more than 20% worse than the WHILE loop query. It also trails the non-recursive query in the count of logical reads,read-ahead reads and scan counts.
It seems like in SQL Server 2005 calling WITH recursively does not work as good as coding set-based operations through WHILE loops.
T-SQL
recursion vs loops, stored procedures, T-SQL