Paul White (2/27/2010)
It is certainly orders of magnitude faster than a recursive CTE on large sets
Hi Paul, an excellent solution as always, and a method to remember.
There are a couple of things I've noticed while working with CTE's over the last few weeks of using them in anger. The first is that, even with relatively small data sets, chaining CTE's can be a really bad idea. The optimiser appears to screw things up, row counts go through the roof and runtimes rapidly escalate. I've got an excellent example of this where breaking the chain by spooling into a temp table improved performance about 20-fold on a modest pagination routine. Funny thing is, the very first run took less than a second, before the optimiser lost it.
The other thing I've noticed is that recursive CTE's can be blisteringly fast provided that the required conditions are met.
So, reusing Jeff's sample data (thanks Jeff), clumsily ramping up to about about 3,000 accounts across about 100,000 rows, and processing a data set which already contains a contiguous row number in the correct order for execution, here goes...
First, the sample data:
--=====================================================================================================================
-- This is the original test data including the second account you added this evening
--=====================================================================================================================
SET NOCOUNT ON
IF OBJECT_ID('TempDB..#InterestRates','U') IS NOT NULL
DROP TABLE #InterestRates
;
CREATE TABLE #InterestRates
(
[InterestRate] DECIMAL(28,17) NULL,
[Month] [int] NULL,
[year] [int] NULL
)
;
INSERT INTO #InterestRates
(InterestRate,[Month],[Year])
SELECT '0.105','8','2007' UNION ALL
SELECT '0.105','9','2007' UNION ALL
SELECT '0.105','10','2007' UNION ALL
SELECT '0.105','11','2007' UNION ALL
SELECT '0.105','12','2007' UNION ALL
SELECT '0.105','1','2008' UNION ALL
SELECT '0.105','2','2008' UNION ALL
SELECT '0.105','3','2008' UNION ALL
SELECT '0.105','4','2008' UNION ALL
SELECT '0.105','5','2008' UNION ALL
SELECT '0.105','6','2008' UNION ALL
SELECT '0.105','7','2008' UNION ALL
SELECT '0.105','8','2008' UNION ALL
SELECT '0.105','9','2008' UNION ALL
SELECT '0.105','10','2008' UNION ALL
SELECT '0.105','11','2008' UNION ALL
SELECT '0.105','12','2008' UNION ALL
SELECT '0.105','1','2009' UNION ALL
SELECT '0.105','2','2009' UNION ALL
SELECT '0.105','3','2009' UNION ALL
SELECT '0.105','4','2009' UNION ALL
SELECT '0.105','5','2009' UNION ALL
SELECT '0.105','6','2009' UNION ALL
SELECT '0.0067','7','2009' UNION ALL
SELECT '0.0067','8','2009' UNION ALL
SELECT '0.0067','9','2009' UNION ALL
SELECT '0.0055','10','2009' UNION ALL
SELECT '0.0055','11','2009' UNION ALL
SELECT '0.0055','12','2009' UNION ALL
SELECT '0.0055','1','2010' UNION ALL
SELECT '0.0055','2','2010'
;
IF OBJECT_ID('TempDB..#Refunds','U') IS NOT NULL
DROP TABLE #Refunds
;
CREATE TABLE #Refunds
(
[Acct_no] varchar(20) NULL,
[Amount] DECIMAL(28,17) NULL,
[Month] [int] NULL,
[year] [int] NULL
)
;
INSERT INTO #Refunds
(Acct_no,Amount, [Month],[year])
SELECT '1271003600','333.107456586203','1','2008' UNION ALL
SELECT '1271003600','87.6816131178288','1','2009' UNION ALL
SELECT '1271003600','103.602002310821','2','2008' UNION ALL
SELECT '1271003600','81.9722910242125','2','2009' UNION ALL
SELECT '1271003600','72.3401647559977','3','2008' UNION ALL
SELECT '1271003600','86.1625895742832','3','2009' UNION ALL
SELECT '1271003600','103.361218989159','4','2008' UNION ALL
SELECT '1271003600','84.6065155612006','4','2009' UNION ALL
SELECT '1271003600','92.2834655375568','5','2008' UNION ALL
SELECT '1271003600','84.6688365382596','5','2009' UNION ALL
SELECT '1271003600','89.0687598341003','6','2008' UNION ALL
SELECT '1271003600','79.1241550533365','6','2009' UNION ALL
SELECT '1271003600','123.131877952709','7','2008' UNION ALL
SELECT '1271003600','89.3938505146093','7','2009' UNION ALL
SELECT '1271003600','87.2510378856992','8','2007' UNION ALL
SELECT '1271003600','94.0687846832911','8','2008' UNION ALL
SELECT '1271003600','340.381691069552','8','2009' UNION ALL
SELECT '1271003600','80.3632310400272','9','2007' UNION ALL
SELECT '1271003600','121.96840107118','9','2008' UNION ALL
SELECT '1271003600','87.1037670414245','10','2007' UNION ALL
SELECT '1271003600','94.9861134399298','10','2008' UNION ALL
SELECT '1271003600','73.8008470324637','11','2007' UNION ALL
SELECT '1271003600','78.537373182859','11','2008' UNION ALL
SELECT '1271003600','90.7988754111144','12','2007' UNION ALL
SELECT '1271003600','88.4954811818565','12','2008' UNION ALL
SELECT '1271005700 ','651.764','1','2008' UNION ALL
SELECT '1271005700 ','650.766','1','2009' UNION ALL
SELECT '1271005700 ','897.994','2','2008' UNION ALL
SELECT '1271005700 ','688.486','2','2009' UNION ALL
SELECT '1271005700 ','721.466','3','2008' UNION ALL
SELECT '1271005700 ','718.654','3','2009' UNION ALL
SELECT '1271005700 ','913.072','4','2008' UNION ALL
SELECT '1271005700 ','863.224','4','2009' UNION ALL
SELECT '1271005700 ','735.081','5','2008' UNION ALL
SELECT '1271005700 ','647.368','5','2009' UNION ALL
SELECT '1271005700 ','607.721','6','2008' UNION ALL
SELECT '1271005700 ','580.359','6','2009' UNION ALL
SELECT '1271005700 ','718.969','7','2008' UNION ALL
SELECT '1271005700 ','693.378','7','2009' UNION ALL
SELECT '1271005700 ','734.005','8','2007' UNION ALL
SELECT '1271005700 ','591.788','8','2008' UNION ALL
SELECT '1271005700 ','679.773','8','2009' UNION ALL
SELECT '1271005700 ','560.296','9','2007' UNION ALL
SELECT '1271005700 ','685.047','9','2008' UNION ALL
SELECT '1271005700 ','682.834','10','2007' UNION ALL
SELECT '1271005700 ','608.706','10','2008' UNION ALL
SELECT '1271005700 ','586.172','11','2007' UNION ALL
SELECT '1271005700 ','650.191','11','2008' UNION ALL
SELECT '1271005700 ','858.672','12','2007' UNION ALL
SELECT '1271005700 ','711.708','12','2008'
;
--=====================================================================================================================
-- Solution to the problem starts here
--=====================================================================================================================
--===== Conditionally drop the work table to make reruns for any troubleshooting easier.
IF OBJECT_ID('TempDB..#Work','U') IS NOT NULL
DROP TABLE #Work;
--===== This set of cascading CTE's preconditions the data for processing and later display
WITH
cteAccount AS
( --=== Find all distinct account numbers so we can apply to all dates for rates without dupes
SELECT DISTINCT Acct_No
FROM #Refunds
),
cteRate AS
( --=== Apply the account number to all date rates without dupes
SELECT acct.Acct_No,
rate.[Year],
rate.[Month],
rate.InterestRate/12 AS MonthRate
FROM #InterestRates rate
CROSS JOIN cteAccount acct
)
SELECT ExecSeq = ISNULL(CAST(0 AS INT), 0),
ISNULL(rate.Acct_No,'') AS Acct_No,
ISNULL(DATEADD(mm,rate.[Month],DATEADD(yy,rate.[YEAR]-1900,0)),0) AS MonthDate,
rate.MonthRate,
ISNULL(amt.Amount,0) AS Amount,
CAST(NULL AS DECIMAL(28,17)) AS InterestAmount,
CAST(NULL AS DECIMAL(28,17)) AS Balance
INTO #Work
FROM cteRate rate
LEFT JOIN #Refunds amt
ON rate.Acct_No = amt.Acct_No
AND rate.[Year] = amt.[Year]
AND rate.[Month] = amt.[Month]
;
-- scale up the table yuck but it's a one-off
DECLARE @Acct_no INT, @Counter INT
SET @Counter = 1
WHILE @Counter < 3220 -- 322 corresponds to 10,000-ish rows
BEGIN
SELECT @Acct_no = MAX(Acct_no) FROM #Work
INSERT INTO #Work (ExecSeq, Acct_no, MonthDate, MonthRate, Amount)
SELECT ExecSeq, Acct_no = CAST(@Acct_no+1 AS VARCHAR(20)), MonthDate, MonthRate, Amount
FROM #Work WHERE Acct_no = CAST(@Acct_no AS VARCHAR(20)) -- 25 rows
SET @Counter = @Counter + 1
END
-- put in the all important execution order
UPDATE w SET ExecSeq = t.ExecSeq
FROM #Work w
INNER JOIN (SELECT ExecSeq = ROW_NUMBER() OVER(ORDER BY Acct_no, MonthDate), Acct_no, MonthDate FROM #Work) t
ON t.Acct_no = w.Acct_no AND t.MonthDate = w.MonthDate
--===== Add the very useful clustered index
CREATE CLUSTERED INDEX [ExecSeq] ON #Work ([ExecSeq] ASC)
Here's the slightly modified solution:
;WITH Calculator AS (
SELECT ExecSeq,
Acct_no,
Amount,
MonthDate,
MonthRate,
InterestAmount = ISNULL(InterestAmount,0)
FROM #Work w
WHERE ExecSeq = 1
UNION ALL
SELECT cr.ExecSeq,
ISNULL(cr.Acct_no, lr.Acct_no),
ISNULL(cr.Amount, 0),
cr.MonthDate,
cr.MonthRate,
InterestAmount = CAST(CASE WHEN cr.Acct_no = lr.Acct_no THEN (lr.Amount + lr.InterestAmount) * (1 + (cr.MonthRate)) ELSE 0 END AS DECIMAL(28,17))
FROM Calculator lr
INNER JOIN #Work cr ON cr.ExecSeq = lr.ExecSeq+1
)
SELECT * FROM Calculator ORDER BY ExecSeq OPTION (MAXRECURSION 0)
This runs in 5 seconds, or 3 if the ORDER BY is removed (???). If a single row is selected, say the second last (ExecSeq = 99850), it takes about 3 seconds and the result is correct.
100,000 rows, 5 seconds - that's not a misprint either Uncle Jeff 😛
A running total over a million rows of a simple but properly structured and indexed table can take as little as three or four seconds using this method.
Option 1, version 2.
Nathan, regarding recursive CTE's; "the reference to self in the recursive part represents the results of the previous iteration". It might be nowhere near what SQL Server is doing under the bonnet but it's all you need to know to make recursive CTE's work.
Cheers
ChrisM
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]