Exploring Recursive CTEs by Example

  • Comments posted to this topic are about the item Exploring Recursive CTEs by Example


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Thanks for this excellent article.

    Regards,

    Basit A. Farooq (MSC Computing, MCITP SQL Server 2005 & 2008, MCDBA SQL Server 2000)

    http://basitaalishan.com
  • I love it! Especially the conclusion.

    It's time we stopped being afraid to whisper the words 'Recursive CTE' for fear of being heard by the RBAR police! πŸ˜‰

    Have to go....I hear a knock at the door :w00t:

  • Excellent article Dwain.

    Will have to read a few times for it all to sink in though!

    Rodders...

  • Great article!

    I noticed a little mistake, though. The 2nd example under "Geometric and Arithmetic Sequences and Progressions", you have:

    "Another example: 1^1 + 1/2^2 + 1/3^3 + 1/4^4:"

    But the very first line of the code is:

    "-- Generate first 99 of Power Series: 1+1/2^2+1/3^2+1/4^2 etc."

    Note the difference: the first is to the power of the current iteration, the latter (code example) always just squares the current quotient.

    Not sure which one you wanted, but I assume the latter as that's what the code and example generate. πŸ™‚

  • Very interesting, thanks!

  • One big caveat is the size of the datasets being used by CTEs. Be very cautions for two reasons: recursive CTE can blow up your temp db when large datasets can’t be written to memory and bad query plans are cached. Query analyzer is not good and building the best query plans with CTEs. I have had to rewrite numerous stored procedures replacing CTE with #temp tables to improve performance and get better results from query analyzer to add missing indexes.

  • Just yesterday by coincidence I needed a query to parse dynamic XML with different nodes and properties on each run and came up with this procedure using CTEs which is very fast. I'm sure it can be improved and if anyone has suggestions please offer them.

    CREATE PROCEDURE dbo.Test_ParseXML

    @doc NVARCHAR(MAX)

    ,@rootnode NVARCHAR(255)

    AS

    BEGIN

    SET NOCOUNT ON

    DECLARE

    @idoc INT

    ,@id INT

    ,@parentid INT

    SET @parentid = NULL

    SET @id = 1

    IF OBJECT_ID('tempdb..#ChildList') IS NOT NULL

    DROP TABLE #ChildList

    CREATE TABLE #ChildList (

    [RowNum] INT IDENTITY(1,1) NOT NULL,

    [parentid] INT NULL,

    [id] INT NULL,

    PRIMARY KEY (RowNum),

    UNIQUE (RowNum))

    IF OBJECT_ID('tempdb..#NodeList') IS NOT NULL

    DROP TABLE #NodeList

    CREATE TABLE #NodeList (

    [RowNum] INT NOT NULL,

    [id] INT NULL,

    [parentid] INT NULL,

    [nodetype] INT NULL,

    [localname] NVARCHAR(MAX) NULL,

    [text] NVARCHAR(MAX) NULL,

    PRIMARY KEY (RowNum),

    UNIQUE (RowNum))

    EXEC sp_xml_preparedocument @idoc OUTPUT, @doc

    ;WITH cte

    AS (

    SELECT

    CAST(p1.parentid AS INT) AS parentid

    ,CAST(p1.id AS INT) AS id

    FROM

    OPENXML (@idoc,@rootnode,2) AS p1

    UNION ALL

    SELECT

    CAST(p2.parentid AS INT) AS parentid

    ,CAST(p2.id AS INT) AS id

    FROM

    OPENXML (@idoc,@rootnode,2) AS p2

    JOIN

    cte

    ON CAST(cte.id AS INT) = CAST(p2.ParentID AS INT)

    WHERE

    CAST(p2.parentid AS INT) = @parentid

    UNION ALL

    SELECT

    CAST(p3.parentid AS INT) AS parentid

    ,CAST(p3.id AS INT) AS id

    FROM

    OPENXML (@idoc,@rootnode,2) AS p3

    JOIN

    cte

    ON CAST(cte.id AS INT) = CAST(p3.ParentID AS INT)

    WHERE

    CAST(p3.parentid AS INT) = @parentid

    UNION ALL

    SELECT

    CAST(p4.parentid AS INT) AS parentid

    ,CAST(p4.id AS INT) AS id

    FROM

    OPENXML (@idoc,@rootnode,2) AS p4

    JOIN

    cte

    ON CAST(cte.id AS INT) = CAST(p4.ParentID AS INT)

    WHERE

    CAST(p4.parentid AS INT) = @parentid

    )

    INSERT INTO #ChildList

    SELECT *

    FROM cte

    INSERT INTO #NodeList

    SELECT

    #ChildList.RowNum

    ,xmllist.id

    ,xmllist.parentid

    ,xmllist.nodetype

    ,xmllist.localname

    ,CAST(xmllist.[text] AS NVARCHAR(MAX)) AS [text]

    FROM #ChildList

    INNER JOIN

    OPENXML (@idoc,@rootnode,2) AS xmllist

    ON #ChildList.id = xmllist.id

    WHERE

    #ChildList.RowNum > 0

    ;WITH RecursiveNodes(RowNum,id,parentid,nodepath,localname,[text],nodetype)

    AS (

    SELECT

    #NodeList.RowNum

    ,#NodeList.id

    ,#NodeList.parentid

    ,CAST('/' + REPLACE(REPLACE(REPLACE(REPLACE(#NodeList.localname,'&',''),'?',''),' ',''),'.','') AS NVARCHAR(255)) AS nodepath

    ,#NodeList.localname

    ,CAST(#NodeList.[text] AS NVARCHAR(MAX)) AS [text]

    ,0 AS nodetype

    FROM #ChildList

    INNER JOIN

    #NodeList

    ON #ChildList.id = #NodeList.id

    WHERE

    #NodeList.parentid IS NULL

    AND #ChildList.RowNum > 0

    AND #NodeList.RowNum > 0

    UNION ALL

    SELECT

    n.RowNum

    ,n.id

    ,n.parentid

    ,CAST(r.nodepath + '/'+ REPLACE(REPLACE(REPLACE(REPLACE(n.localname,'&',''),'?',''),' ',''),'.','') AS NVARCHAR(255)) AS nodepath

    ,n.localname

    ,n.[text]

    ,n.nodetype

    FROM #NodeList AS n

    INNER JOIN

    RecursiveNodes AS r

    ON n.parentid = r.id

    WHERE

    n.RowNum > 0

    AND r.RowNum > 0

    AND n.parentid >= 0

    )

    SELECT

    ROW_NUMBER() OVER (ORDER BY Result.RowNum) AS RowNum

    ,Result.id

    ,Result.parentid

    ,Result.nodepath

    ,Result.nodename

    ,Result.property

    ,Result.value

    ,Result.nodecontents

    FROM

    (

    SELECT

    rn.RowNum

    ,rn.id

    ,rn.parentid

    ,rn.nodepath

    ,(CASE

    WHEN rn.nodetype = 0 THEN rn.localname

    WHEN rn.nodetype = 1 THEN rn.localname

    ELSE NULL

    END) AS nodename

    ,(CASE

    WHEN rn.nodetype = 2 THEN rn.localname

    ELSE NULL

    END) AS property

    ,(CASE

    WHEN rn.nodetype = 2 THEN (SELECT [text] FROM RecursiveNodes WHERE parentid = rn.id)

    ELSE NULL

    END) AS value

    ,(CASE

    WHEN rn.nodetype = 1 THEN (SELECT TOP(1) [text] FROM RecursiveNodes WHERE parentid = rn.id)

    ELSE NULL

    END) AS nodecontents

    FROM

    RecursiveNodes AS rn

    WHERE

    rn.localname <> '#text'

    ) AS Result

    WHERE

    Result.id >= 0

    /*

    EXEC dbo.Test_ParseXML

    '<ReportPackage type="report">

    <ReferenceId>XYZ-123</ReferenceId>

    <Reports>

    <ReportId>123</ReportId>

    <Categories>

    <Category>Real Estate</Category>

    <Category>Restaurants</Category>

    </Categories>

    <Transactions>

    <Transaction name="TxId">987654</Transaction>

    </Transactions>

    </Reports>

    <CompanyData>

    <CompanyCategory type="real estate">

    <CompanyName>ABC Realty</CompanyName>

    </CompanyCategory>

    <DemographicDetail>

    <StateID issuingAuthority="NC">123445555</StateID>

    <DateExpires>2014-12-31</DateExpires>

    </DemographicDetail>

    </CompanyData>

    <ReviewStatus>

    <ReviewLevel>4.7</ReviewLevel>

    <NumberReviews>1234</NumberReviews>

    <ReviewStatus>Recommended</ReviewStatus>

    </ReviewStatus>

    </ReportPackage>',

    '/ReportPackage'

    */

    END

  • Great article, thanks. I really enjoyed going through that.

    I am looking for CTE's that will give me formulae for different predictive analysis routines.

    here is a slight coding problem, the first select in the following code should not have the "Union".

    -- Add an additional origin and destination node

    INSERT INTO #Edges

    UNION ALL SELECT '0', 'N', 15 -- ABS(CHECKSUM(NEWID())) % 100 + 25

  • FINALLY!!! Someone who loves burning up the CPU as much as I do! πŸ˜€

    As a token of gratitude, feel free to add my recursive string parser to your excursions into recursions. πŸ™‚

    [font="Courier New"]create function [sql].[StringToTable](

    @string nvarchar(max),

    @delimiter nvarchar(1))

    returns table

    /* t@sqlPROCtologist.com

    Usage:

    declare @STR nvarchar(max)='a,bc,def,ghij,klmno,pqrstu,vwxyz,0123,456,78,9';

    select * from [sql].[StringToTable](@str,',') option(maxrecursion 32767);

    */

    as return(

    with cte as(

    select

    convert(int,1) as [start],

    charindex(@delimiter, @string+@delimiter,1) as [end],

    substring(@string+@delimiter,1,charindex(@delimiter,@string+@delimiter,1)) as [string]

    union all select

    convert(int,[end]+1) as [start],

    charindex(@delimiter,@string+@delimiter,[end]+1) as [end],

    substring(@string+@delimiter,[end]+1,charindex(@delimiter,@string+@delimiter,[end]+1)-[end]) as [string]

    from cte where [end]+1<=len(@string+@delimiter)

    )

    select left(string,len(string)-1) as [String] from cte);

    go

    [/font]

  • Nice article. I consider myself recursively challenged, not because I'm bad at math, just from inexperience I suppose. It's nice to see several different uses in one place that could actually be useful. Most of the time I see them and just shake my head asking why...

  • Wow! So many interesting responses, so little time.

    First of all, thanks to Basit, David, rodjkidb, Alex and r.mitchell for your interest and feedback. I hope that something in the article will shout at you to "use me" someday.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • nem.schlecht (7/17/2012)


    Great article!

    I noticed a little mistake, though. The 2nd example under "Geometric and Arithmetic Sequences and Progressions", you have:

    "Another example: 1^1 + 1/2^2 + 1/3^3 + 1/4^4:"

    But the very first line of the code is:

    "-- Generate first 99 of Power Series: 1+1/2^2+1/3^2+1/4^2 etc."

    Note the difference: the first is to the power of the current iteration, the latter (code example) always just squares the current quotient.

    Not sure which one you wanted, but I assume the latter as that's what the code and example generate. πŸ™‚

    Nice catch Nem. It just goes to show, no matter how often you review articles like this something small is bound to slip through the cracks.

    In this case, my intention probably doesn't matter. Both sequences are power series that will (probably) converge to different values. My intention was to show how to generate a power series using an rCTE and both cases do that.

    Fortunately, I didn't design that particular rCTE to calculate the stresses on a bridge!


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • DavidP-340734 (7/17/2012)


    One big caveat is the size of the datasets being used by CTEs. Be very cautions for two reasons: recursive CTE can blow up your temp db when large datasets can’t be written to memory and bad query plans are cached. Query analyzer is not good and building the best query plans with CTEs. I have had to rewrite numerous stored procedures replacing CTE with #temp tables to improve performance and get better results from query analyzer to add missing indexes.

    I personally haven't experienced this particular problem with ordinary CTEs (not saying of course that it isn't a potential problem), however with rCTEs the potential is most certainly there.

    The UNIQUEnTuples rCTE I used to generate the solution to the knapsack problem is a case in point. That bad boy generates explosive numbers of rows, so without some method to prune back the combinations sets it produces, it becomes massively unwieldy for any but the smallest of problems.

    Nonetheless, I was able to apply it to a couple of medium size problems using some pruning techniques I tailored to the problem. This thread shows those two problems, with the latter one analyzed in detail.

    http://www.sqlservercentral.com/Forums/Topic1318149-392-1.aspx

    The fun part about those two, besides of course the sheer challenge of a complex problem, was that there were one or two other posters that thought this wasn't a job for SQL.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Steven Willis (7/17/2012)


    Just yesterday by coincidence I needed a query to parse dynamic XML with different nodes and properties on each run and came up with this procedure using CTEs which is very fast. I'm sure it can be improved and if anyone has suggestions please offer them.

    CREATE PROCEDURE dbo.Test_ParseXML

    @doc NVARCHAR(MAX)

    ,@rootnode NVARCHAR(255)

    AS

    BEGIN

    SET NOCOUNT ON

    DECLARE

    @idoc INT

    ,@id INT

    ,@parentid INT

    SET @parentid = NULL

    SET @id = 1

    IF OBJECT_ID('tempdb..#ChildList') IS NOT NULL

    DROP TABLE #ChildList

    CREATE TABLE #ChildList (

    [RowNum] INT IDENTITY(1,1) NOT NULL,

    [parentid] INT NULL,

    [id] INT NULL,

    PRIMARY KEY (RowNum),

    UNIQUE (RowNum))

    IF OBJECT_ID('tempdb..#NodeList') IS NOT NULL

    DROP TABLE #NodeList

    CREATE TABLE #NodeList (

    [RowNum] INT NOT NULL,

    [id] INT NULL,

    [parentid] INT NULL,

    [nodetype] INT NULL,

    [localname] NVARCHAR(MAX) NULL,

    [text] NVARCHAR(MAX) NULL,

    PRIMARY KEY (RowNum),

    UNIQUE (RowNum))

    EXEC sp_xml_preparedocument @idoc OUTPUT, @doc

    ;WITH cte

    AS (

    SELECT

    CAST(p1.parentid AS INT) AS parentid

    ,CAST(p1.id AS INT) AS id

    FROM

    OPENXML (@idoc,@rootnode,2) AS p1

    UNION ALL

    SELECT

    CAST(p2.parentid AS INT) AS parentid

    ,CAST(p2.id AS INT) AS id

    FROM

    OPENXML (@idoc,@rootnode,2) AS p2

    JOIN

    cte

    ON CAST(cte.id AS INT) = CAST(p2.ParentID AS INT)

    WHERE

    CAST(p2.parentid AS INT) = @parentid

    UNION ALL

    SELECT

    CAST(p3.parentid AS INT) AS parentid

    ,CAST(p3.id AS INT) AS id

    FROM

    OPENXML (@idoc,@rootnode,2) AS p3

    JOIN

    cte

    ON CAST(cte.id AS INT) = CAST(p3.ParentID AS INT)

    WHERE

    CAST(p3.parentid AS INT) = @parentid

    UNION ALL

    SELECT

    CAST(p4.parentid AS INT) AS parentid

    ,CAST(p4.id AS INT) AS id

    FROM

    OPENXML (@idoc,@rootnode,2) AS p4

    JOIN

    cte

    ON CAST(cte.id AS INT) = CAST(p4.ParentID AS INT)

    WHERE

    CAST(p4.parentid AS INT) = @parentid

    )

    INSERT INTO #ChildList

    SELECT *

    FROM cte

    INSERT INTO #NodeList

    SELECT

    #ChildList.RowNum

    ,xmllist.id

    ,xmllist.parentid

    ,xmllist.nodetype

    ,xmllist.localname

    ,CAST(xmllist.[text] AS NVARCHAR(MAX)) AS [text]

    FROM #ChildList

    INNER JOIN

    OPENXML (@idoc,@rootnode,2) AS xmllist

    ON #ChildList.id = xmllist.id

    WHERE

    #ChildList.RowNum > 0

    ;WITH RecursiveNodes(RowNum,id,parentid,nodepath,localname,[text],nodetype)

    AS (

    SELECT

    #NodeList.RowNum

    ,#NodeList.id

    ,#NodeList.parentid

    ,CAST('/' + REPLACE(REPLACE(REPLACE(REPLACE(#NodeList.localname,'&',''),'?',''),' ',''),'.','') AS NVARCHAR(255)) AS nodepath

    ,#NodeList.localname

    ,CAST(#NodeList.[text] AS NVARCHAR(MAX)) AS [text]

    ,0 AS nodetype

    FROM #ChildList

    INNER JOIN

    #NodeList

    ON #ChildList.id = #NodeList.id

    WHERE

    #NodeList.parentid IS NULL

    AND #ChildList.RowNum > 0

    AND #NodeList.RowNum > 0

    UNION ALL

    SELECT

    n.RowNum

    ,n.id

    ,n.parentid

    ,CAST(r.nodepath + '/'+ REPLACE(REPLACE(REPLACE(REPLACE(n.localname,'&',''),'?',''),' ',''),'.','') AS NVARCHAR(255)) AS nodepath

    ,n.localname

    ,n.[text]

    ,n.nodetype

    FROM #NodeList AS n

    INNER JOIN

    RecursiveNodes AS r

    ON n.parentid = r.id

    WHERE

    n.RowNum > 0

    AND r.RowNum > 0

    AND n.parentid >= 0

    )

    SELECT

    ROW_NUMBER() OVER (ORDER BY Result.RowNum) AS RowNum

    ,Result.id

    ,Result.parentid

    ,Result.nodepath

    ,Result.nodename

    ,Result.property

    ,Result.value

    ,Result.nodecontents

    FROM

    (

    SELECT

    rn.RowNum

    ,rn.id

    ,rn.parentid

    ,rn.nodepath

    ,(CASE

    WHEN rn.nodetype = 0 THEN rn.localname

    WHEN rn.nodetype = 1 THEN rn.localname

    ELSE NULL

    END) AS nodename

    ,(CASE

    WHEN rn.nodetype = 2 THEN rn.localname

    ELSE NULL

    END) AS property

    ,(CASE

    WHEN rn.nodetype = 2 THEN (SELECT [text] FROM RecursiveNodes WHERE parentid = rn.id)

    ELSE NULL

    END) AS value

    ,(CASE

    WHEN rn.nodetype = 1 THEN (SELECT TOP(1) [text] FROM RecursiveNodes WHERE parentid = rn.id)

    ELSE NULL

    END) AS nodecontents

    FROM

    RecursiveNodes AS rn

    WHERE

    rn.localname <> '#text'

    ) AS Result

    WHERE

    Result.id >= 0

    /*

    EXEC dbo.Test_ParseXML

    '<ReportPackage type="report">

    <ReferenceId>XYZ-123</ReferenceId>

    <Reports>

    <ReportId>123</ReportId>

    <Categories>

    <Category>Real Estate</Category>

    <Category>Restaurants</Category>

    </Categories>

    <Transactions>

    <Transaction name="TxId">987654</Transaction>

    </Transactions>

    </Reports>

    <CompanyData>

    <CompanyCategory type="real estate">

    <CompanyName>ABC Realty</CompanyName>

    </CompanyCategory>

    <DemographicDetail>

    <StateID issuingAuthority="NC">123445555</StateID>

    <DateExpires>2014-12-31</DateExpires>

    </DemographicDetail>

    </CompanyData>

    <ReviewStatus>

    <ReviewLevel>4.7</ReviewLevel>

    <NumberReviews>1234</NumberReviews>

    <ReviewStatus>Recommended</ReviewStatus>

    </ReviewStatus>

    </ReportPackage>',

    '/ReportPackage'

    */

    END

    Steven - Thanks for the great contribution! That's one heck of an impressive rCTE!

    I definitely want to take a closer look at what you're doing. I've heard some folks say there are "rare cases" where the rCTE will outperform more traditional solutions, and if this is one of them I want to try and understand why, as well as figure out ways to apply the algorithm employed to other applications.

    When I find time of course, between solving my current bin-packing problem. πŸ™‚


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

Viewing 15 posts - 1 through 15 (of 42 total)

You must be logged in to reply to this topic. Login to reply