Exploring Recursive CTEs by Example

Question

Post reply

Exploring Recursive CTEs by Example

Dwain Camps

SSC Guru

Points: 86978
More actions
July 16, 2012 at 10:05 pm

#259563

Comments posted to this topic are about the item Exploring Recursive CTEs by Example
My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Viewing 15 posts - 1 through 15 (of 43 total)

You must be logged in to reply to this topic. Login to reply

Basit Farooq Ten Centuries Points: 1151 More actions · Answer 1

Thanks for this excellent article.

Regards,

Basit A. Farooq (MSC Computing, MCITP SQL Server 2005 & 2008, MCDBA SQL Server 2000)

http://basitaalishan.com

David McKinney SSChampion Points: 10472 More actions · Answer 2

I love it! Especially the conclusion.

It's time we stopped being afraid to whisper the words 'Recursive CTE' for fear of being heard by the RBAR police! 😉

Have to go....I hear a knock at the door :w00t:

rodjkidd SSCoach Points: 15973 More actions · Answer 3

Excellent article Dwain.

Will have to read a few times for it all to sink in though!

Rodders...

nem.schlecht SSC Rookie Points: 39 More actions · Answer 4

Great article!

I noticed a little mistake, though. The 2nd example under "Geometric and Arithmetic Sequences and Progressions", you have:

"Another example: 1^1 + 1/2^2 + 1/3^3 + 1/4^4:"

But the very first line of the code is:

"-- Generate first 99 of Power Series: 1+1/2^2+1/3^2+1/4^2 etc."

Note the difference: the first is to the power of the current iteration, the latter (code example) always just squares the current quotient.

Not sure which one you wanted, but I assume the latter as that's what the code and example generate. 🙂

Alex Friedman Right there with Babe Points: 755 More actions · Answer 5

Alex Friedman

Right there with Babe

Points: 755

July 17, 2012 at 9:06 am

#1513658

Very interesting, thanks!

DavidP-340734 Right there with Babe Points: 737 More actions · Answer 6

One big caveat is the size of the datasets being used by CTEs. Be very cautions for two reasons: recursive CTE can blow up your temp db when large datasets can’t be written to memory and bad query plans are cached. Query analyzer is not good and building the best query plans with CTEs. I have had to rewrite numerous stored procedures replacing CTE with #temp tables to improve performance and get better results from query analyzer to add missing indexes.

Steven Willis SSCrazy Eights Points: 9893 More actions · Answer 7

Just yesterday by coincidence I needed a query to parse dynamic XML with different nodes and properties on each run and came up with this procedure using CTEs which is very fast. I'm sure it can be improved and if anyone has suggestions please offer them.

CREATE PROCEDURE dbo.Test_ParseXML

@doc NVARCHAR(MAX)

,@rootnode NVARCHAR(255)

AS

BEGIN

SET NOCOUNT ON

DECLARE

@idoc INT

,@id INT

,@parentid INT

SET @parentid = NULL

SET @id = 1

IF OBJECT_ID('tempdb..#ChildList') IS NOT NULL

DROP TABLE #ChildList

CREATE TABLE #ChildList (

[RowNum] INT IDENTITY(1,1) NOT NULL,

[parentid] INT NULL,

[id] INT NULL,

PRIMARY KEY (RowNum),

UNIQUE (RowNum))

IF OBJECT_ID('tempdb..#NodeList') IS NOT NULL

DROP TABLE #NodeList

CREATE TABLE #NodeList (

[RowNum] INT NOT NULL,

[id] INT NULL,

[parentid] INT NULL,

[nodetype] INT NULL,

[localname] NVARCHAR(MAX) NULL,

[text] NVARCHAR(MAX) NULL,

PRIMARY KEY (RowNum),

UNIQUE (RowNum))

EXEC sp_xml_preparedocument @idoc OUTPUT, @doc

;WITH cte

AS (

SELECT

CAST(p1.parentid AS INT) AS parentid

,CAST(p1.id AS INT) AS id

FROM

OPENXML (@idoc,@rootnode,2) AS p1

UNION ALL

SELECT

CAST(p2.parentid AS INT) AS parentid

,CAST(p2.id AS INT) AS id

FROM

OPENXML (@idoc,@rootnode,2) AS p2

JOIN

cte

ON CAST(cte.id AS INT) = CAST(p2.ParentID AS INT)

WHERE

CAST(p2.parentid AS INT) = @parentid

UNION ALL

SELECT

CAST(p3.parentid AS INT) AS parentid

,CAST(p3.id AS INT) AS id

FROM

OPENXML (@idoc,@rootnode,2) AS p3

JOIN

cte

ON CAST(cte.id AS INT) = CAST(p3.ParentID AS INT)

WHERE

CAST(p3.parentid AS INT) = @parentid

UNION ALL

SELECT

CAST(p4.parentid AS INT) AS parentid

,CAST(p4.id AS INT) AS id

FROM

OPENXML (@idoc,@rootnode,2) AS p4

JOIN

cte

ON CAST(cte.id AS INT) = CAST(p4.ParentID AS INT)

WHERE

CAST(p4.parentid AS INT) = @parentid

)

INSERT INTO #ChildList

SELECT *

FROM cte

INSERT INTO #NodeList

SELECT

#ChildList.RowNum

,xmllist.id

,xmllist.parentid

,xmllist.nodetype

,xmllist.localname

,CAST(xmllist.[text] AS NVARCHAR(MAX)) AS [text]

FROM #ChildList

INNER JOIN

OPENXML (@idoc,@rootnode,2) AS xmllist

ON #ChildList.id = xmllist.id

WHERE

#ChildList.RowNum > 0

;WITH RecursiveNodes(RowNum,id,parentid,nodepath,localname,[text],nodetype)

AS (

SELECT

#NodeList.RowNum

,#NodeList.id

,#NodeList.parentid

,CAST('/' + REPLACE(REPLACE(REPLACE(REPLACE(#NodeList.localname,'&',''),'?',''),' ',''),'.','') AS NVARCHAR(255)) AS nodepath

,#NodeList.localname

,CAST(#NodeList.[text] AS NVARCHAR(MAX)) AS [text]

,0 AS nodetype

FROM #ChildList

INNER JOIN

#NodeList

ON #ChildList.id = #NodeList.id

WHERE

#NodeList.parentid IS NULL

AND #ChildList.RowNum > 0

AND #NodeList.RowNum > 0

UNION ALL

SELECT

n.RowNum

,n.id

,n.parentid

,CAST(r.nodepath + '/'+ REPLACE(REPLACE(REPLACE(REPLACE(n.localname,'&',''),'?',''),' ',''),'.','') AS NVARCHAR(255)) AS nodepath

,n.localname

,n.[text]

,n.nodetype

FROM #NodeList AS n

INNER JOIN

RecursiveNodes AS r

ON n.parentid = r.id

WHERE

n.RowNum > 0

AND r.RowNum > 0

AND n.parentid >= 0

)

SELECT

ROW_NUMBER() OVER (ORDER BY Result.RowNum) AS RowNum

,Result.id

,Result.parentid

,Result.nodepath

,Result.nodename

,Result.property

,Result.value

,Result.nodecontents

FROM

(

SELECT

rn.RowNum

,rn.id

,rn.parentid

,rn.nodepath

,(CASE

WHEN rn.nodetype = 0 THEN rn.localname

WHEN rn.nodetype = 1 THEN rn.localname

ELSE NULL

END) AS nodename

,(CASE

WHEN rn.nodetype = 2 THEN rn.localname

ELSE NULL

END) AS property

,(CASE

WHEN rn.nodetype = 2 THEN (SELECT [text] FROM RecursiveNodes WHERE parentid = rn.id)

ELSE NULL

END) AS value

,(CASE

WHEN rn.nodetype = 1 THEN (SELECT TOP(1) [text] FROM RecursiveNodes WHERE parentid = rn.id)

ELSE NULL

END) AS nodecontents

FROM

RecursiveNodes AS rn

WHERE

rn.localname <> '#text'

) AS Result

WHERE

Result.id >= 0

/*

EXEC dbo.Test_ParseXML

'<ReportPackage type="report">

<ReferenceId>XYZ-123</ReferenceId>

<Reports>

<ReportId>123</ReportId>

<Categories>

<Category>Real Estate</Category>

<Category>Restaurants</Category>

</Categories>

<Transactions>

<Transaction name="TxId">987654</Transaction>

</Transactions>

</Reports>

<CompanyData>

<CompanyCategory type="real estate">

<CompanyName>ABC Realty</CompanyName>

</CompanyCategory>

<DemographicDetail>

<StateID issuingAuthority="NC">123445555</StateID>

<DateExpires>2014-12-31</DateExpires>

</DemographicDetail>

</CompanyData>

<ReviewStatus>

<ReviewLevel>4.7</ReviewLevel>

<NumberReviews>1234</NumberReviews>

<ReviewStatus>Recommended</ReviewStatus>

</ReviewStatus>

</ReportPackage>',

'/ReportPackage'

*/

END

Peter H SSC Enthusiast Points: 127 More actions · Answer 8

Great article, thanks. I really enjoyed going through that.

I am looking for CTE's that will give me formulae for different predictive analysis routines.

here is a slight coding problem, the first select in the following code should not have the "Union".

-- Add an additional origin and destination node

INSERT INTO #Edges

UNION ALL SELECT '0', 'N', 15 -- ABS(CHECKSUM(NEWID())) % 100 + 25

MCTSQL Valued Member Points: 70 More actions · Answer 9

FINALLY!!! Someone who loves burning up the CPU as much as I do! 😀

As a token of gratitude, feel free to add my recursive string parser to your excursions into recursions. 🙂

[font="Courier New"]create function [sql].[StringToTable](

@string nvarchar(max),

@delimiter nvarchar(1))

returns table

/* t@sqlPROCtologist.com

Usage:

declare @str nvarchar(max)='a,bc,def,ghij,klmno,pqrstu,vwxyz,0123,456,78,9';

select * from [sql].[StringToTable](@str,',') option(maxrecursion 32767);

*/

as return(

with cte as(

select

convert(int,1) as [start],

charindex(@delimiter, @string+@delimiter,1) as [end],

substring(@string+@delimiter,1,charindex(@delimiter,@string+@delimiter,1)) as [string]

union all select

convert(int,[end]+1) as [start],

charindex(@delimiter,@string+@delimiter,[end]+1) as [end],

substring(@string+@delimiter,[end]+1,charindex(@delimiter,@string+@delimiter,[end]+1)-[end]) as [string]

from cte where [end]+1<=len(@string+@delimiter)

)

select left(string,len(string)-1) as [String] from cte);

go

[/font]

r.mitchell Old Hand Points: 368 More actions · Answer 10

Nice article. I consider myself recursively challenged, not because I'm bad at math, just from inexperience I suppose. It's nice to see several different uses in one place that could actually be useful. Most of the time I see them and just shake my head asking why...

Dwain Camps SSC Guru Points: 86978 More actions · Answer 11

Wow! So many interesting responses, so little time.

First of all, thanks to Basit, David, rodjkidb, Alex and r.mitchell for your interest and feedback. I hope that something in the article will shout at you to "use me" someday.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Dwain Camps SSC Guru Points: 86978 More actions · Answer 12

nem.schlecht (7/17/2012)
Great article!
I noticed a little mistake, though. The 2nd example under "Geometric and Arithmetic Sequences and Progressions", you have:
"Another example: 1^1 + 1/2^2 + 1/3^3 + 1/4^4:"
But the very first line of the code is:
"-- Generate first 99 of Power Series: 1+1/2^2+1/3^2+1/4^2 etc."
Note the difference: the first is to the power of the current iteration, the latter (code example) always just squares the current quotient.
Not sure which one you wanted, but I assume the latter as that's what the code and example generate. 🙂

Nice catch Nem. It just goes to show, no matter how often you review articles like this something small is bound to slip through the cracks.

In this case, my intention probably doesn't matter. Both sequences are power series that will (probably) converge to different values. My intention was to show how to generate a power series using an rCTE and both cases do that.

Fortunately, I didn't design that particular rCTE to calculate the stresses on a bridge!

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Dwain Camps SSC Guru Points: 86978 More actions · Answer 13

DavidP-340734 (7/17/2012)
One big caveat is the size of the datasets being used by CTEs. Be very cautions for two reasons: recursive CTE can blow up your temp db when large datasets can’t be written to memory and bad query plans are cached. Query analyzer is not good and building the best query plans with CTEs. I have had to rewrite numerous stored procedures replacing CTE with #temp tables to improve performance and get better results from query analyzer to add missing indexes.

I personally haven't experienced this particular problem with ordinary CTEs (not saying of course that it isn't a potential problem), however with rCTEs the potential is most certainly there.

The UNIQUEnTuples rCTE I used to generate the solution to the knapsack problem is a case in point. That bad boy generates explosive numbers of rows, so without some method to prune back the combinations sets it produces, it becomes massively unwieldy for any but the smallest of problems.

Nonetheless, I was able to apply it to a couple of medium size problems using some pruning techniques I tailored to the problem. This thread shows those two problems, with the latter one analyzed in detail.

http://www.sqlservercentral.com/Forums/Topic1318149-392-1.aspx

The fun part about those two, besides of course the sheer challenge of a complex problem, was that there were one or two other posters that thought this wasn't a job for SQL.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Dwain Camps SSC Guru Points: 86978 More actions · Answer 14

Steven Willis (7/17/2012)
Just yesterday by coincidence I needed a query to parse dynamic XML with different nodes and properties on each run and came up with this procedure using CTEs which is very fast. I'm sure it can be improved and if anyone has suggestions please offer them.
CREATE PROCEDURE dbo.Test_ParseXML
@doc NVARCHAR(MAX)
,@rootnode NVARCHAR(255)
AS
BEGIN
SET NOCOUNT ON
DECLARE
@idoc INT
,@id INT
,@parentid INT
SET @parentid = NULL
SET @id = 1
IF OBJECT_ID('tempdb..#ChildList') IS NOT NULL
DROP TABLE #ChildList
CREATE TABLE #ChildList (
[RowNum] INT IDENTITY(1,1) NOT NULL,
[parentid] INT NULL,
[id] INT NULL,
PRIMARY KEY (RowNum),
UNIQUE (RowNum))
IF OBJECT_ID('tempdb..#NodeList') IS NOT NULL
DROP TABLE #NodeList
CREATE TABLE #NodeList (
[RowNum] INT NOT NULL,
[id] INT NULL,
[parentid] INT NULL,
[nodetype] INT NULL,
[localname] NVARCHAR(MAX) NULL,
[text] NVARCHAR(MAX) NULL,
PRIMARY KEY (RowNum),
UNIQUE (RowNum))
EXEC sp_xml_preparedocument @idoc OUTPUT, @doc
;WITH cte
AS (
SELECT
CAST(p1.parentid AS INT) AS parentid
,CAST(p1.id AS INT) AS id
FROM
OPENXML (@idoc,@rootnode,2) AS p1
UNION ALL
SELECT
CAST(p2.parentid AS INT) AS parentid
,CAST(p2.id AS INT) AS id
FROM
OPENXML (@idoc,@rootnode,2) AS p2
JOIN
cte
ON CAST(cte.id AS INT) = CAST(p2.ParentID AS INT)
WHERE
CAST(p2.parentid AS INT) = @parentid
UNION ALL
SELECT
CAST(p3.parentid AS INT) AS parentid
,CAST(p3.id AS INT) AS id
FROM
OPENXML (@idoc,@rootnode,2) AS p3
JOIN
cte
ON CAST(cte.id AS INT) = CAST(p3.ParentID AS INT)
WHERE
CAST(p3.parentid AS INT) = @parentid
UNION ALL
SELECT
CAST(p4.parentid AS INT) AS parentid
,CAST(p4.id AS INT) AS id
FROM
OPENXML (@idoc,@rootnode,2) AS p4
JOIN
cte
ON CAST(cte.id AS INT) = CAST(p4.ParentID AS INT)
WHERE
CAST(p4.parentid AS INT) = @parentid
)
INSERT INTO #ChildList
SELECT *
FROM cte
INSERT INTO #NodeList
SELECT
#ChildList.RowNum
,xmllist.id
,xmllist.parentid
,xmllist.nodetype
,xmllist.localname
,CAST(xmllist.[text] AS NVARCHAR(MAX)) AS [text]
FROM #ChildList
INNER JOIN
OPENXML (@idoc,@rootnode,2) AS xmllist
ON #ChildList.id = xmllist.id
WHERE
#ChildList.RowNum > 0
;WITH RecursiveNodes(RowNum,id,parentid,nodepath,localname,[text],nodetype)
AS (
SELECT
#NodeList.RowNum
,#NodeList.id
,#NodeList.parentid
,CAST('/' + REPLACE(REPLACE(REPLACE(REPLACE(#NodeList.localname,'&',''),'?',''),' ',''),'.','') AS NVARCHAR(255)) AS nodepath
,#NodeList.localname
,CAST(#NodeList.[text] AS NVARCHAR(MAX)) AS [text]
,0 AS nodetype
FROM #ChildList
INNER JOIN
#NodeList
ON #ChildList.id = #NodeList.id
WHERE
#NodeList.parentid IS NULL
AND #ChildList.RowNum > 0
AND #NodeList.RowNum > 0
UNION ALL
SELECT
n.RowNum
,n.id
,n.parentid
,CAST(r.nodepath + '/'+ REPLACE(REPLACE(REPLACE(REPLACE(n.localname,'&',''),'?',''),' ',''),'.','') AS NVARCHAR(255)) AS nodepath
,n.localname
,n.[text]
,n.nodetype
FROM #NodeList AS n
INNER JOIN
RecursiveNodes AS r
ON n.parentid = r.id
WHERE
n.RowNum > 0
AND r.RowNum > 0
AND n.parentid >= 0
)
SELECT
ROW_NUMBER() OVER (ORDER BY Result.RowNum) AS RowNum
,Result.id
,Result.parentid
,Result.nodepath
,Result.nodename
,Result.property
,Result.value
,Result.nodecontents
FROM
(
SELECT
rn.RowNum
,rn.id
,rn.parentid
,rn.nodepath
,(CASE
WHEN rn.nodetype = 0 THEN rn.localname
WHEN rn.nodetype = 1 THEN rn.localname
ELSE NULL
END) AS nodename
,(CASE
WHEN rn.nodetype = 2 THEN rn.localname
ELSE NULL
END) AS property
,(CASE
WHEN rn.nodetype = 2 THEN (SELECT [text] FROM RecursiveNodes WHERE parentid = rn.id)
ELSE NULL
END) AS value
,(CASE
WHEN rn.nodetype = 1 THEN (SELECT TOP(1) [text] FROM RecursiveNodes WHERE parentid = rn.id)
ELSE NULL
END) AS nodecontents
FROM
RecursiveNodes AS rn
WHERE
rn.localname <> '#text'
) AS Result
WHERE
Result.id >= 0
/*
EXEC dbo.Test_ParseXML
'<ReportPackage type="report">
<ReferenceId>XYZ-123</ReferenceId>
<Reports>
<ReportId>123</ReportId>
<Categories>
<Category>Real Estate</Category>
<Category>Restaurants</Category>
</Categories>
<Transactions>
<Transaction name="TxId">987654</Transaction>
</Transactions>
</Reports>
<CompanyData>
<CompanyCategory type="real estate">
<CompanyName>ABC Realty</CompanyName>
</CompanyCategory>
<DemographicDetail>
<StateID issuingAuthority="NC">123445555</StateID>
<DateExpires>2014-12-31</DateExpires>
</DemographicDetail>
</CompanyData>
<ReviewStatus>
<ReviewLevel>4.7</ReviewLevel>
<NumberReviews>1234</NumberReviews>
<ReviewStatus>Recommended</ReviewStatus>
</ReviewStatus>
</ReportPackage>',
'/ReportPackage'
*/
END

Steven - Thanks for the great contribution! That's one heck of an impressive rCTE!

I definitely want to take a closer look at what you're doing. I've heard some folks say there are "rare cases" where the rCTE will outperform more traditional solutions, and if this is one of them I want to try and understand why, as well as figure out ways to apply the algorithm employed to other applications.

When I find time of course, between solving my current bin-packing problem. 🙂

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St