• Lynn Pettis (6/1/2010)


    Its lack of scalability is its biggest problem. A recursive CTE is still RBAR (a Modenism for Row By Agonizing Row).

    Here's a little comparison between "quirky update" and rCTE which supports most folks' observation that the rCTE is slower. However, I'd disagree with you Lynn that the rCTE fails to scale - the "quirky update" is 5-7 times faster whether the dataset is 100k rows or 1 million rows. Sure it's slower - but it does scale.

    Here's the test with 1 million rows:

    DROP TABLE #Numbers

    SELECT TOP 1000000 --000

    n = ROW_NUMBER() OVER (ORDER BY a.name),

    CalcValue = CAST(NULL AS BIGINT)

    INTO #Numbers

    FROM master.dbo.syscolumns a, master.dbo.syscolumns b

    CREATE UNIQUE CLUSTERED INDEX CIn ON #Numbers ([n] ASC)

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    -- 'Quirky' update

    DECLARE @Lastval INT = 0, @CalcValue BIGINT

    UPDATE #Numbers SET

    @CalcValue = CalcValue = (@Lastval + n),

    @Lastval = n

    -- (1,000,000 row(s) affected) / CPU time = 4218 ms, elapsed time = 5719 ms.

    -- Table #Numbers... Scan count 1, logical reads 3113, physical reads 6, read-ahead reads 3146, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    -- Recursive CTE

    ;WITH Calculator (n, CalcValue) AS (

    SELECT n.n,

    CalcValue = CAST(n.n AS BIGINT)

    FROM #Numbers n

    WHERE n.n = 1

    UNION ALL

    SELECT n.n,

    CalcValue = n.n + c.n

    FROM #Numbers n

    INNER JOIN Calculator c ON c.n + 1 = n.n -- nice

    )

    SELECT n, CalcValue

    FROM Calculator

    OPTION (MAXRECURSION 0)

    -- (1,000,000 row(s) affected) / CPU time = 32438 ms, elapsed time = 35148 ms.

    -- Table 'Worktable'. Scan count 2, logical reads 6000001, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SET STATISTICS IO Off

    SET STATISTICS TIME Off

    And with 100k rows:

    DROP TABLE #Numbers

    SELECT TOP 100000

    n = ROW_NUMBER() OVER (ORDER BY a.name),

    CalcValue = CAST(NULL AS BIGINT)

    INTO #Numbers

    FROM master.dbo.syscolumns a, master.dbo.syscolumns b

    CREATE UNIQUE CLUSTERED INDEX CIn ON #Numbers ([n] ASC)

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    -- 'Quirky' update

    DECLARE @Lastval INT = 0, @CalcValue BIGINT

    UPDATE #Numbers SET

    @CalcValue = CalcValue = (@Lastval + n),

    @Lastval = n

    -- (100000 row(s) affected) / CPU time = 454 ms, elapsed time = 526 ms.

    -- Table #Numbers... Scan count 1, logical reads 314, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    -- Recursive CTE

    ;WITH Calculator (n, CalcValue) AS (

    SELECT n.n,

    CalcValue = CAST(n.n AS BIGINT)

    FROM #Numbers n

    WHERE n.n = 1

    UNION ALL

    SELECT n.n,

    CalcValue = n.n + c.n

    FROM #Numbers n

    INNER JOIN Calculator c ON c.n + 1 = n.n -- nice

    )

    SELECT n, CalcValue

    FROM Calculator

    OPTION (MAXRECURSION 0)

    -- (100000 row(s) affected) / CPU time = 3203 ms, elapsed time = 3483 ms.

    -- Table 'Worktable'. Scan count 2, logical reads 600001, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SET STATISTICS IO Off

    SET STATISTICS TIME Off

    Whichever measure of "resource used" you choose for comparison, both methods appear to scale in a linear manner.

    If you've absolutely got to do a job row by row, such as a running totals, then a quirky update is virtually guaranteed to complete before a rCTE. That's not the whole picture though. The rCTE gives you output which you may have to write back: the quirky update does the opposite. The rCTE is often quicker to write with a little practice too. I use both methods, sometimes even writing both for a particular job then choosing one or the other on merit - which might be readability over performance.

    Cheers

    ChrisM

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden