# Exploring Recursive CTEs by Example

• Comments posted to this topic are about the item Exploring Recursive CTEs by Example

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

My thought question: Have you ever been told that your query runs too fast?

INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

• Thanks for this excellent article.

Regards,

Basit A. Farooq (MSC Computing, MCITP SQL Server 2005 & 2008, MCDBA SQL Server 2000)

http://basitaalishan.com
• I love it! Especially the conclusion.

It's time we stopped being afraid to whisper the words 'Recursive CTE' for fear of being heard by the RBAR police! π

Have to go....I hear a knock at the door :w00t:

• Excellent article Dwain.

Will have to read a few times for it all to sink in though!

Rodders...

• Great article!

I noticed a little mistake, though. The 2nd example under "Geometric and Arithmetic Sequences and Progressions", you have:

"Another example: 1^1 + 1/2^2 + 1/3^3 + 1/4^4:"

But the very first line of the code is:

"-- Generate first 99 of Power Series: 1+1/2^2+1/3^2+1/4^2 etc."

Note the difference: the first is to the power of the current iteration, the latter (code example) always just squares the current quotient.

Not sure which one you wanted, but I assume the latter as that's what the code and example generate. π

• Very interesting, thanks!

• One big caveat is the size of the datasets being used by CTEs. Be very cautions for two reasons: recursive CTE can blow up your temp db when large datasets canβt be written to memory and bad query plans are cached. Query analyzer is not good and building the best query plans with CTEs. I have had to rewrite numerous stored procedures replacing CTE with #temp tables to improve performance and get better results from query analyzer to add missing indexes.

• Just yesterday by coincidence I needed a query to parse dynamic XML with different nodes and properties on each run and came up with this procedure using CTEs which is very fast. I'm sure it can be improved and if anyone has suggestions please offer them.

`CREATE PROCEDURE dbo.Test_ParseXML`

` @doc NVARCHAR(MAX)`

`,@rootnode NVARCHAR(255)`

`AS`

`BEGIN`

` SET NOCOUNT ON`

` DECLARE`

` @idoc INT`

` ,@id INT`

` ,@parentid INT`

`SET @parentid = NULL`

`SET @id = 1`

`IF OBJECT_ID('tempdb..#ChildList') IS NOT NULL`

`DROP TABLE #ChildList`

`CREATE TABLE #ChildList (`

`[RowNum] INT IDENTITY(1,1) NOT NULL,`

`[parentid] INT NULL,`

`[id] INT NULL,`

`PRIMARY KEY (RowNum),`

`UNIQUE (RowNum))`

`IF OBJECT_ID('tempdb..#NodeList') IS NOT NULL`

`DROP TABLE #NodeList`

`CREATE TABLE #NodeList (`

`[RowNum] INT NOT NULL,`

`[id] INT NULL,`

`[parentid] INT NULL,`

`[nodetype] INT NULL,`

`[localname] NVARCHAR(MAX) NULL,`

`[text] NVARCHAR(MAX) NULL,`

`PRIMARY KEY (RowNum),`

`UNIQUE (RowNum))`

`EXEC sp_xml_preparedocument @idoc OUTPUT, @doc`

`;WITH cte`

`AS (`

`SELECT`

` CAST(p1.parentid AS INT) AS parentid`

`,CAST(p1.id AS INT) AS id`

`FROM`

`OPENXML (@idoc,@rootnode,2) AS p1`

`UNION ALL`

`SELECT`

` CAST(p2.parentid AS INT) AS parentid`

`,CAST(p2.id AS INT) AS id`

`FROM`

`OPENXML (@idoc,@rootnode,2) AS p2`

`JOIN`

`cte`

`ON CAST(cte.id AS INT) = CAST(p2.ParentID AS INT)`

`WHERE`

`CAST(p2.parentid AS INT) = @parentid`

`UNION ALL`

`SELECT`

` CAST(p3.parentid AS INT) AS parentid`

`,CAST(p3.id AS INT) AS id`

`FROM`

`OPENXML (@idoc,@rootnode,2) AS p3`

`JOIN`

`cte`

`ON CAST(cte.id AS INT) = CAST(p3.ParentID AS INT)`

`WHERE`

`CAST(p3.parentid AS INT) = @parentid`

`UNION ALL`

`SELECT`

` CAST(p4.parentid AS INT) AS parentid`

`,CAST(p4.id AS INT) AS id`

`FROM`

`OPENXML (@idoc,@rootnode,2) AS p4`

`JOIN`

`cte`

`ON CAST(cte.id AS INT) = CAST(p4.ParentID AS INT)`

`WHERE`

`CAST(p4.parentid AS INT) = @parentid`

`)`

`INSERT INTO #ChildList`

`SELECT *`

`FROM cte`

`INSERT INTO #NodeList`

`SELECT`

` #ChildList.RowNum`

`,xmllist.id`

`,xmllist.parentid`

`,xmllist.nodetype`

`,xmllist.localname`

`,CAST(xmllist.[text] AS NVARCHAR(MAX)) AS [text]`

`FROM #ChildList`

`INNER JOIN`

`OPENXML (@idoc,@rootnode,2) AS xmllist`

`ON #ChildList.id = xmllist.id`

`WHERE`

`#ChildList.RowNum > 0`

`;WITH RecursiveNodes(RowNum,id,parentid,nodepath,localname,[text],nodetype)`

`AS (`

`SELECT`

` #NodeList.RowNum`

`,#NodeList.id`

`,#NodeList.parentid`

`,CAST('/' + REPLACE(REPLACE(REPLACE(REPLACE(#NodeList.localname,'&',''),'?',''),' ',''),'.','') AS NVARCHAR(255)) AS nodepath`

`,#NodeList.localname`

`,CAST(#NodeList.[text] AS NVARCHAR(MAX)) AS [text]`

`,0 AS nodetype`

`FROM #ChildList`

`INNER JOIN`

`#NodeList`

`ON #ChildList.id = #NodeList.id`

`WHERE`

`#NodeList.parentid IS NULL`

`AND #ChildList.RowNum > 0`

`AND #NodeList.RowNum > 0`

`UNION ALL`

`SELECT`

` n.RowNum`

`,n.id`

`,n.parentid`

`,CAST(r.nodepath + '/'+ REPLACE(REPLACE(REPLACE(REPLACE(n.localname,'&',''),'?',''),' ',''),'.','') AS NVARCHAR(255)) AS nodepath`

`,n.localname`

`,n.[text]`

`,n.nodetype`

`FROM #NodeList AS n`

`INNER JOIN`

`RecursiveNodes AS r`

`ON n.parentid = r.id`

`WHERE`

`n.RowNum > 0`

`AND r.RowNum > 0`

`AND n.parentid >= 0`

` )`

`SELECT`

`ROW_NUMBER() OVER (ORDER BY Result.RowNum) AS RowNum`

` ,Result.id`

` ,Result.parentid`

` ,Result.nodepath`

` ,Result.nodename`

` ,Result.property`

` ,Result.value`

` ,Result.nodecontents`

`FROM`

`(`

`SELECT`

`rn.RowNum`

` ,rn.id`

` ,rn.parentid`

` ,rn.nodepath`

` ,(CASE`

`WHEN rn.nodetype = 0 THEN rn.localname`

`WHEN rn.nodetype = 1 THEN rn.localname`

`ELSE NULL`

` END) AS nodename`

` ,(CASE`

`WHEN rn.nodetype = 2 THEN rn.localname`

`ELSE NULL`

` END) AS property`

` ,(CASE`

`WHEN rn.nodetype = 2 THEN (SELECT [text] FROM RecursiveNodes WHERE parentid = rn.id)`

`ELSE NULL`

` END) AS value`

` ,(CASE`

`WHEN rn.nodetype = 1 THEN (SELECT TOP(1) [text] FROM RecursiveNodes WHERE parentid = rn.id)`

`ELSE NULL`

` END) AS nodecontents`

`FROM`

`RecursiveNodes AS rn`

`WHERE`

`rn.localname <> '#text'`

`) AS Result`

`WHERE`

`Result.id >= 0`

`/*`

`EXEC dbo.Test_ParseXML`

` '<ReportPackage type="report">`

`<ReferenceId>XYZ-123</ReferenceId>`

`<Reports>`

` <ReportId>123</ReportId>`

` <Categories>`

`<Category>Real Estate</Category>`

`<Category>Restaurants</Category>`

` </Categories>`

` <Transactions>`

`<Transaction name="TxId">987654</Transaction>`

` </Transactions>`

`</Reports>`

`<CompanyData>`

` <CompanyCategory type="real estate">`

`<CompanyName>ABC Realty</CompanyName>`

` </CompanyCategory>`

` <DemographicDetail>`

`<StateID issuingAuthority="NC">123445555</StateID>`

`<DateExpires>2014-12-31</DateExpires>`

` </DemographicDetail>`

`</CompanyData>`

`<ReviewStatus>`

` <ReviewLevel>4.7</ReviewLevel>`

` <NumberReviews>1234</NumberReviews>`

` <ReviewStatus>Recommended</ReviewStatus>`

`</ReviewStatus>`

` </ReportPackage>',`

` '/ReportPackage'`

`*/`

`END`

• Great article, thanks. I really enjoyed going through that.

I am looking for CTE's that will give me formulae for different predictive analysis routines.

here is a slight coding problem, the first select in the following code should not have the "Union".

`-- Add an additional origin and destination node`

`INSERT INTO #Edges`

`UNION ALL SELECT '0', 'N', 15 -- ABS(CHECKSUM(NEWID())) % 100 + 25`

• FINALLY!!! Someone who loves burning up the CPU as much as I do! π

As a token of gratitude, feel free to add my recursive string parser to your excursions into recursions. π

[font="Courier New"]`create function [sql].[StringToTable](`

` @string nvarchar(max),`

` @delimiter nvarchar(1))`

` returns table`

`/* t@sqlPROCtologist.com`

` Usage:`

` declare @STR nvarchar(max)='a,bc,def,ghij,klmno,pqrstu,vwxyz,0123,456,78,9';`

` select * from [sql].[StringToTable](@str,',') option(maxrecursion 32767);`

`*/`

`as return(`

` with cte as(`

` select`

` convert(int,1) as [start],`

` charindex(@delimiter, @string+@delimiter,1) as [end],`

` substring(@string+@delimiter,1,charindex(@delimiter,@string+@delimiter,1)) as [string]`

` union all select`

` convert(int,[end]+1) as [start],`

` charindex(@delimiter,@string+@delimiter,[end]+1) as [end],`

` substring(@string+@delimiter,[end]+1,charindex(@delimiter,@string+@delimiter,[end]+1)-[end]) as [string]`

` from cte where [end]+1<=len(@string+@delimiter)`

` )`

` select left(string,len(string)-1) as [String] from cte);`

`go`

[/font]

• Nice article. I consider myself recursively challenged, not because I'm bad at math, just from inexperience I suppose. It's nice to see several different uses in one place that could actually be useful. Most of the time I see them and just shake my head asking why...

• Wow! So many interesting responses, so little time.

First of all, thanks to Basit, David, rodjkidb, Alex and r.mitchell for your interest and feedback. I hope that something in the article will shout at you to "use me" someday.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

My thought question: Have you ever been told that your query runs too fast?

INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

• nem.schlecht (7/17/2012)

Great article!

I noticed a little mistake, though. The 2nd example under "Geometric and Arithmetic Sequences and Progressions", you have:

"Another example: 1^1 + 1/2^2 + 1/3^3 + 1/4^4:"

But the very first line of the code is:

"-- Generate first 99 of Power Series: 1+1/2^2+1/3^2+1/4^2 etc."

Note the difference: the first is to the power of the current iteration, the latter (code example) always just squares the current quotient.

Not sure which one you wanted, but I assume the latter as that's what the code and example generate. π

Nice catch Nem. It just goes to show, no matter how often you review articles like this something small is bound to slip through the cracks.

In this case, my intention probably doesn't matter. Both sequences are power series that will (probably) converge to different values. My intention was to show how to generate a power series using an rCTE and both cases do that.

Fortunately, I didn't design that particular rCTE to calculate the stresses on a bridge!

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

My thought question: Have you ever been told that your query runs too fast?

INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

• DavidP-340734 (7/17/2012)

One big caveat is the size of the datasets being used by CTEs. Be very cautions for two reasons: recursive CTE can blow up your temp db when large datasets canβt be written to memory and bad query plans are cached. Query analyzer is not good and building the best query plans with CTEs. I have had to rewrite numerous stored procedures replacing CTE with #temp tables to improve performance and get better results from query analyzer to add missing indexes.

I personally haven't experienced this particular problem with ordinary CTEs (not saying of course that it isn't a potential problem), however with rCTEs the potential is most certainly there.

The UNIQUEnTuples rCTE I used to generate the solution to the knapsack problem is a case in point. That bad boy generates explosive numbers of rows, so without some method to prune back the combinations sets it produces, it becomes massively unwieldy for any but the smallest of problems.

Nonetheless, I was able to apply it to a couple of medium size problems using some pruning techniques I tailored to the problem. This thread shows those two problems, with the latter one analyzed in detail.

http://www.sqlservercentral.com/Forums/Topic1318149-392-1.aspx

The fun part about those two, besides of course the sheer challenge of a complex problem, was that there were one or two other posters that thought this wasn't a job for SQL.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

My thought question: Have you ever been told that your query runs too fast?

INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

• Steven Willis (7/17/2012)

Just yesterday by coincidence I needed a query to parse dynamic XML with different nodes and properties on each run and came up with this procedure using CTEs which is very fast. I'm sure it can be improved and if anyone has suggestions please offer them.

`CREATE PROCEDURE dbo.Test_ParseXML`

` @doc NVARCHAR(MAX)`

`,@rootnode NVARCHAR(255)`

`AS`

`BEGIN`

` SET NOCOUNT ON`

` DECLARE`

` @idoc INT`

` ,@id INT`

` ,@parentid INT`

`SET @parentid = NULL`

`SET @id = 1`

`IF OBJECT_ID('tempdb..#ChildList') IS NOT NULL`

`DROP TABLE #ChildList`

`CREATE TABLE #ChildList (`

`[RowNum] INT IDENTITY(1,1) NOT NULL,`

`[parentid] INT NULL,`

`[id] INT NULL,`

`PRIMARY KEY (RowNum),`

`UNIQUE (RowNum))`

`IF OBJECT_ID('tempdb..#NodeList') IS NOT NULL`

`DROP TABLE #NodeList`

`CREATE TABLE #NodeList (`

`[RowNum] INT NOT NULL,`

`[id] INT NULL,`

`[parentid] INT NULL,`

`[nodetype] INT NULL,`

`[localname] NVARCHAR(MAX) NULL,`

`[text] NVARCHAR(MAX) NULL,`

`PRIMARY KEY (RowNum),`

`UNIQUE (RowNum))`

`EXEC sp_xml_preparedocument @idoc OUTPUT, @doc`

`;WITH cte`

`AS (`

`SELECT`

` CAST(p1.parentid AS INT) AS parentid`

`,CAST(p1.id AS INT) AS id`

`FROM`

`OPENXML (@idoc,@rootnode,2) AS p1`

`UNION ALL`

`SELECT`

` CAST(p2.parentid AS INT) AS parentid`

`,CAST(p2.id AS INT) AS id`

`FROM`

`OPENXML (@idoc,@rootnode,2) AS p2`

`JOIN`

`cte`

`ON CAST(cte.id AS INT) = CAST(p2.ParentID AS INT)`

`WHERE`

`CAST(p2.parentid AS INT) = @parentid`

`UNION ALL`

`SELECT`

` CAST(p3.parentid AS INT) AS parentid`

`,CAST(p3.id AS INT) AS id`

`FROM`

`OPENXML (@idoc,@rootnode,2) AS p3`

`JOIN`

`cte`

`ON CAST(cte.id AS INT) = CAST(p3.ParentID AS INT)`

`WHERE`

`CAST(p3.parentid AS INT) = @parentid`

`UNION ALL`

`SELECT`

` CAST(p4.parentid AS INT) AS parentid`

`,CAST(p4.id AS INT) AS id`

`FROM`

`OPENXML (@idoc,@rootnode,2) AS p4`

`JOIN`

`cte`

`ON CAST(cte.id AS INT) = CAST(p4.ParentID AS INT)`

`WHERE`

`CAST(p4.parentid AS INT) = @parentid`

`)`

`INSERT INTO #ChildList`

`SELECT *`

`FROM cte`

`INSERT INTO #NodeList`

`SELECT`

` #ChildList.RowNum`

`,xmllist.id`

`,xmllist.parentid`

`,xmllist.nodetype`

`,xmllist.localname`

`,CAST(xmllist.[text] AS NVARCHAR(MAX)) AS [text]`

`FROM #ChildList`

`INNER JOIN`

`OPENXML (@idoc,@rootnode,2) AS xmllist`

`ON #ChildList.id = xmllist.id`

`WHERE`

`#ChildList.RowNum > 0`

`;WITH RecursiveNodes(RowNum,id,parentid,nodepath,localname,[text],nodetype)`

`AS (`

`SELECT`

` #NodeList.RowNum`

`,#NodeList.id`

`,#NodeList.parentid`

`,CAST('/' + REPLACE(REPLACE(REPLACE(REPLACE(#NodeList.localname,'&',''),'?',''),' ',''),'.','') AS NVARCHAR(255)) AS nodepath`

`,#NodeList.localname`

`,CAST(#NodeList.[text] AS NVARCHAR(MAX)) AS [text]`

`,0 AS nodetype`

`FROM #ChildList`

`INNER JOIN`

`#NodeList`

`ON #ChildList.id = #NodeList.id`

`WHERE`

`#NodeList.parentid IS NULL`

`AND #ChildList.RowNum > 0`

`AND #NodeList.RowNum > 0`

`UNION ALL`

`SELECT`

` n.RowNum`

`,n.id`

`,n.parentid`

`,CAST(r.nodepath + '/'+ REPLACE(REPLACE(REPLACE(REPLACE(n.localname,'&',''),'?',''),' ',''),'.','') AS NVARCHAR(255)) AS nodepath`

`,n.localname`

`,n.[text]`

`,n.nodetype`

`FROM #NodeList AS n`

`INNER JOIN`

`RecursiveNodes AS r`

`ON n.parentid = r.id`

`WHERE`

`n.RowNum > 0`

`AND r.RowNum > 0`

`AND n.parentid >= 0`

` )`

`SELECT`

`ROW_NUMBER() OVER (ORDER BY Result.RowNum) AS RowNum`

` ,Result.id`

` ,Result.parentid`

` ,Result.nodepath`

` ,Result.nodename`

` ,Result.property`

` ,Result.value`

` ,Result.nodecontents`

`FROM`

`(`

`SELECT`

`rn.RowNum`

` ,rn.id`

` ,rn.parentid`

` ,rn.nodepath`

` ,(CASE`

`WHEN rn.nodetype = 0 THEN rn.localname`

`WHEN rn.nodetype = 1 THEN rn.localname`

`ELSE NULL`

` END) AS nodename`

` ,(CASE`

`WHEN rn.nodetype = 2 THEN rn.localname`

`ELSE NULL`

` END) AS property`

` ,(CASE`

`WHEN rn.nodetype = 2 THEN (SELECT [text] FROM RecursiveNodes WHERE parentid = rn.id)`

`ELSE NULL`

` END) AS value`

` ,(CASE`

`WHEN rn.nodetype = 1 THEN (SELECT TOP(1) [text] FROM RecursiveNodes WHERE parentid = rn.id)`

`ELSE NULL`

` END) AS nodecontents`

`FROM`

`RecursiveNodes AS rn`

`WHERE`

`rn.localname <> '#text'`

`) AS Result`

`WHERE`

`Result.id >= 0`

`/*`

`EXEC dbo.Test_ParseXML`

` '<ReportPackage type="report">`

`<ReferenceId>XYZ-123</ReferenceId>`

`<Reports>`

` <ReportId>123</ReportId>`

` <Categories>`

`<Category>Real Estate</Category>`

`<Category>Restaurants</Category>`

` </Categories>`

` <Transactions>`

`<Transaction name="TxId">987654</Transaction>`

` </Transactions>`

`</Reports>`

`<CompanyData>`

` <CompanyCategory type="real estate">`

`<CompanyName>ABC Realty</CompanyName>`

` </CompanyCategory>`

` <DemographicDetail>`

`<StateID issuingAuthority="NC">123445555</StateID>`

`<DateExpires>2014-12-31</DateExpires>`

` </DemographicDetail>`

`</CompanyData>`

`<ReviewStatus>`

` <ReviewLevel>4.7</ReviewLevel>`

` <NumberReviews>1234</NumberReviews>`

` <ReviewStatus>Recommended</ReviewStatus>`

`</ReviewStatus>`

` </ReportPackage>',`

` '/ReportPackage'`

`*/`

`END`

Steven - Thanks for the great contribution! That's one heck of an impressive rCTE!

I definitely want to take a closer look at what you're doing. I've heard some folks say there are "rare cases" where the rCTE will outperform more traditional solutions, and if this is one of them I want to try and understand why, as well as figure out ways to apply the algorithm employed to other applications.

When I find time of course, between solving my current bin-packing problem. π

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

My thought question: Have you ever been told that your query runs too fast?