Lynn Pettis (6/1/2010)
Its lack of scalability is its biggest problem. A recursive CTE is still RBAR (a Modenism for Row By Agonizing Row).
Here's a little comparison between the "quirky update" and the rCTE which supports most folks' observation that the rCTE is slower. However, I'd disagree with you, Lynn, that the rCTE fails to scale - the "quirky update" is 5-7 times faster whether the dataset is 100k rows or 1 million rows. Sure, the rCTE is slower - but it does scale.
Here's the test with 1 million rows:
IF OBJECT_ID('tempdb..#Numbers') IS NOT NULL DROP TABLE #Numbers
SELECT TOP 1000000
n = ROW_NUMBER() OVER (ORDER BY a.name),
CalcValue = CAST(NULL AS BIGINT)
INTO #Numbers
FROM master.dbo.syscolumns a, master.dbo.syscolumns b
CREATE UNIQUE CLUSTERED INDEX CIn ON #Numbers ([n] ASC)
SET STATISTICS IO ON
SET STATISTICS TIME ON
-- 'Quirky' update
DECLARE @Lastval INT = 0, @CalcValue BIGINT
UPDATE #Numbers SET
@CalcValue = CalcValue = (@Lastval + n),
@Lastval = n
-- (1,000,000 row(s) affected) / CPU time = 4218 ms, elapsed time = 5719 ms.
-- Table #Numbers... Scan count 1, logical reads 3113, physical reads 6, read-ahead reads 3146, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
-- Recursive CTE
;WITH Calculator (n, CalcValue) AS (
SELECT n.n,
CalcValue = CAST(n.n AS BIGINT)
FROM #Numbers n
WHERE n.n = 1
UNION ALL
SELECT n.n,
CalcValue = n.n + c.n
FROM #Numbers n
INNER JOIN Calculator c ON c.n + 1 = n.n -- join each row to its immediate predecessor
)
SELECT n, CalcValue
FROM Calculator
OPTION (MAXRECURSION 0)
-- (1,000,000 row(s) affected) / CPU time = 32438 ms, elapsed time = 35148 ms.
-- Table 'Worktable'. Scan count 2, logical reads 6000001, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SET STATISTICS IO OFF
SET STATISTICS TIME OFF
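As an aside: the quirky update relies on undocumented engine behaviour, so folks who run it in production usually pin it down with the safeguards from Jeff Moden's running-totals article. Here's a sketch of that belt-and-braces form - the TABLOCKX and MAXDOP 1 hints are the usual conventions, not something the timings above depended on:

```sql
-- 'Quirky' update with the usual safeguards: an exclusive table lock
-- and serial execution, so rows are processed in clustered index order
DECLARE @Lastval INT = 0, @CalcValue BIGINT
UPDATE tgt
SET @CalcValue = CalcValue = (@Lastval + n),
    @Lastval = n
FROM #Numbers tgt WITH (TABLOCKX)
OPTION (MAXDOP 1)
```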
And with 100k rows:
IF OBJECT_ID('tempdb..#Numbers') IS NOT NULL DROP TABLE #Numbers
SELECT TOP 100000
n = ROW_NUMBER() OVER (ORDER BY a.name),
CalcValue = CAST(NULL AS BIGINT)
INTO #Numbers
FROM master.dbo.syscolumns a, master.dbo.syscolumns b
CREATE UNIQUE CLUSTERED INDEX CIn ON #Numbers ([n] ASC)
SET STATISTICS IO ON
SET STATISTICS TIME ON
-- 'Quirky' update
DECLARE @Lastval INT = 0, @CalcValue BIGINT
UPDATE #Numbers SET
@CalcValue = CalcValue = (@Lastval + n),
@Lastval = n
-- (100000 row(s) affected) / CPU time = 454 ms, elapsed time = 526 ms.
-- Table #Numbers... Scan count 1, logical reads 314, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
-- Recursive CTE
;WITH Calculator (n, CalcValue) AS (
SELECT n.n,
CalcValue = CAST(n.n AS BIGINT)
FROM #Numbers n
WHERE n.n = 1
UNION ALL
SELECT n.n,
CalcValue = n.n + c.n
FROM #Numbers n
INNER JOIN Calculator c ON c.n + 1 = n.n -- join each row to its immediate predecessor
)
SELECT n, CalcValue
FROM Calculator
OPTION (MAXRECURSION 0)
-- (100000 row(s) affected) / CPU time = 3203 ms, elapsed time = 3483 ms.
-- Table 'Worktable'. Scan count 2, logical reads 600001, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SET STATISTICS IO OFF
SET STATISTICS TIME OFF
Whichever measure of "resource used" you choose for comparison, both methods appear to scale in a linear manner.
If you've absolutely got to do a job row by row, such as a running total, then a quirky update is virtually guaranteed to complete before a rCTE. That's not the whole picture, though. The rCTE returns its output as a result set, which you may then have to write back to the table; the quirky update does the opposite - it writes the results in place, so you have to select them back out. The rCTE is often quicker to write, too, with a little practice. I use both methods, sometimes even writing both for a particular job and then choosing one or the other on merit - which might be readability over performance.
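For completeness: the calculation both scripts perform - each row's n plus the preceding row's n - can also be written as a single set-based query on SQL Server 2012 and later using LAG, with no quirky update and no recursion worktable. A sketch (not part of the timed tests above):

```sql
-- Pairwise sum via LAG (SQL Server 2012+); the previous row's n
-- defaults to 0 for the first row, matching the anchor behaviour above
SELECT n,
       CalcValue = LAG(CAST(n AS BIGINT), 1, 0) OVER (ORDER BY n) + n
FROM #Numbers
```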
Cheers
ChrisM
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden