Calculating interest query

  • nathan 7372 (2/26/2010)


    Currently there is no running balance column in the table. If I need to add one let me know. I also have no indexes at this point. I probably should add some but I'm not entirely sure what would be a useful index at this point.

    Thanks for the quick feedback. Not to worry... let's peel one potato at a time. First, we'll get you the answer you need and then we'll work on your table. I'm almost done.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ok... here we go. This method is commonly referred to as the "Quirky Update" method and it's appropriately named because if you don't follow all the rules to a "T", it'll get "quirky" on you and it'll come up with the wrong answers. If you do follow the rules, it's rock solid accurate and nasty fast.

    There are some rules that won't be apparent in the code below. To wit, I'd really like you to take the time (it's 22 pages long in MS Word) to read the article at the following URL. Don't let the warnings scare you if you follow the rules. Sorry to sound brutal about this but I value your data as much as I do my own so if you think you're smarter than the rules and decide not to follow the rules, then don't use the "Quirky Update" method. Instead, use a well written FORWARD ONLY, READ ONLY, STATIC cursor which (not including a CLR) is the third fastest method (the second fastest is a bit more complicated) for doing running totals of this nature.

    http://www.sqlservercentral.com/articles/T-SQL/68467/

    Now, if you do decide to follow the rules, prepare to be absolutely amazed. The following code includes the test data you currently have in your original post as well as the code to do your running balances for all accounts in a Temp table. If it seems like it didn't actually run, that's the amazing part... shear blinding speed. 😉 Your 222k rows should take about 2 seconds to run and most of that is setting up the Temp Table and the required clustered index. The run time doesn't include the display time.

    I should also mention that I shifted from FLOAT to 17 place decimal numbers. You'll find I didn't come up with the same total for the one account as you did because I'm running just a bit more accurate because less rounding is occuring (I think). I also split out the monthly interest so you can see how much it actually is. The "Balance" column is what you were referring to as the "Interest(Month)".

    Here's the code:

    --=====================================================================================================================

    -- This is the original test data including the second account you added this evening

    --=====================================================================================================================

    IF OBJECT_ID('TempDB..#InterestRates','U') IS NOT NULL

    DROP TABLE #InterestRates

    ;

    CREATE TABLE #InterestRates

    (

    [InterestRate] DECIMAL(28,17) NULL,

    [Month] [int] NULL,

    [year] [int] NULL

    )

    ;

    INSERT INTO #InterestRates

    (InterestRate,[Month],[Year])

    SELECT '0.105','8','2007' UNION ALL

    SELECT '0.105','9','2007' UNION ALL

    SELECT '0.105','10','2007' UNION ALL

    SELECT '0.105','11','2007' UNION ALL

    SELECT '0.105','12','2007' UNION ALL

    SELECT '0.105','1','2008' UNION ALL

    SELECT '0.105','2','2008' UNION ALL

    SELECT '0.105','3','2008' UNION ALL

    SELECT '0.105','4','2008' UNION ALL

    SELECT '0.105','5','2008' UNION ALL

    SELECT '0.105','6','2008' UNION ALL

    SELECT '0.105','7','2008' UNION ALL

    SELECT '0.105','8','2008' UNION ALL

    SELECT '0.105','9','2008' UNION ALL

    SELECT '0.105','10','2008' UNION ALL

    SELECT '0.105','11','2008' UNION ALL

    SELECT '0.105','12','2008' UNION ALL

    SELECT '0.105','1','2009' UNION ALL

    SELECT '0.105','2','2009' UNION ALL

    SELECT '0.105','3','2009' UNION ALL

    SELECT '0.105','4','2009' UNION ALL

    SELECT '0.105','5','2009' UNION ALL

    SELECT '0.105','6','2009' UNION ALL

    SELECT '0.0067','7','2009' UNION ALL

    SELECT '0.0067','8','2009' UNION ALL

    SELECT '0.0067','9','2009' UNION ALL

    SELECT '0.0055','10','2009' UNION ALL

    SELECT '0.0055','11','2009' UNION ALL

    SELECT '0.0055','12','2009' UNION ALL

    SELECT '0.0055','1','2010' UNION ALL

    SELECT '0.0055','2','2010'

    ;

    IF OBJECT_ID('TempDB..#Refunds','U') IS NOT NULL

    DROP TABLE #Refunds

    ;

    CREATE TABLE #Refunds

    (

    [Acct_no] varchar(20) NULL,

    [Amount] DECIMAL(28,17) NULL,

    [Month] [int] NULL,

    [year] [int] NULL

    )

    ;

    INSERT INTO #Refunds

    (Acct_no,Amount, [Month],[year])

    SELECT '1271003600','333.107456586203','1','2008' UNION ALL

    SELECT '1271003600','87.6816131178288','1','2009' UNION ALL

    SELECT '1271003600','103.602002310821','2','2008' UNION ALL

    SELECT '1271003600','81.9722910242125','2','2009' UNION ALL

    SELECT '1271003600','72.3401647559977','3','2008' UNION ALL

    SELECT '1271003600','86.1625895742832','3','2009' UNION ALL

    SELECT '1271003600','103.361218989159','4','2008' UNION ALL

    SELECT '1271003600','84.6065155612006','4','2009' UNION ALL

    SELECT '1271003600','92.2834655375568','5','2008' UNION ALL

    SELECT '1271003600','84.6688365382596','5','2009' UNION ALL

    SELECT '1271003600','89.0687598341003','6','2008' UNION ALL

    SELECT '1271003600','79.1241550533365','6','2009' UNION ALL

    SELECT '1271003600','123.131877952709','7','2008' UNION ALL

    SELECT '1271003600','89.3938505146093','7','2009' UNION ALL

    SELECT '1271003600','87.2510378856992','8','2007' UNION ALL

    SELECT '1271003600','94.0687846832911','8','2008' UNION ALL

    SELECT '1271003600','340.381691069552','8','2009' UNION ALL

    SELECT '1271003600','80.3632310400272','9','2007' UNION ALL

    SELECT '1271003600','121.96840107118','9','2008' UNION ALL

    SELECT '1271003600','87.1037670414245','10','2007' UNION ALL

    SELECT '1271003600','94.9861134399298','10','2008' UNION ALL

    SELECT '1271003600','73.8008470324637','11','2007' UNION ALL

    SELECT '1271003600','78.537373182859','11','2008' UNION ALL

    SELECT '1271003600','90.7988754111144','12','2007' UNION ALL

    SELECT '1271003600','88.4954811818565','12','2008' UNION ALL

    SELECT '1271005700 ','651.764','1','2008' UNION ALL

    SELECT '1271005700 ','650.766','1','2009' UNION ALL

    SELECT '1271005700 ','897.994','2','2008' UNION ALL

    SELECT '1271005700 ','688.486','2','2009' UNION ALL

    SELECT '1271005700 ','721.466','3','2008' UNION ALL

    SELECT '1271005700 ','718.654','3','2009' UNION ALL

    SELECT '1271005700 ','913.072','4','2008' UNION ALL

    SELECT '1271005700 ','863.224','4','2009' UNION ALL

    SELECT '1271005700 ','735.081','5','2008' UNION ALL

    SELECT '1271005700 ','647.368','5','2009' UNION ALL

    SELECT '1271005700 ','607.721','6','2008' UNION ALL

    SELECT '1271005700 ','580.359','6','2009' UNION ALL

    SELECT '1271005700 ','718.969','7','2008' UNION ALL

    SELECT '1271005700 ','693.378','7','2009' UNION ALL

    SELECT '1271005700 ','734.005','8','2007' UNION ALL

    SELECT '1271005700 ','591.788','8','2008' UNION ALL

    SELECT '1271005700 ','679.773','8','2009' UNION ALL

    SELECT '1271005700 ','560.296','9','2007' UNION ALL

    SELECT '1271005700 ','685.047','9','2008' UNION ALL

    SELECT '1271005700 ','682.834','10','2007' UNION ALL

    SELECT '1271005700 ','608.706','10','2008' UNION ALL

    SELECT '1271005700 ','586.172','11','2007' UNION ALL

    SELECT '1271005700 ','650.191','11','2008' UNION ALL

    SELECT '1271005700 ','858.672','12','2007' UNION ALL

    SELECT '1271005700 ','711.708','12','2008'

    ;

    --=====================================================================================================================

    -- Solution to the problem starts here

    --=====================================================================================================================

    --===== Conditionally drop the work table to make reruns for any troubleshooting easier.

    IF OBJECT_ID('TempDB..#Work','U') IS NOT NULL

    DROP TABLE #Work

    ;

    --===== Declare local variables to support the upcoming running total update

    DECLARE @PrevAcct VARCHAR(20),

    @PrevBal DECIMAL(28,17),

    @IntAmt DECIMAL(28,17)

    ;

    --===== This set of cascading CTE's preconditions the data for processing and later display

    WITH

    cteAccount AS

    ( --=== Find all distinct account numbers so we can apply to all dates for rates without dupes

    SELECT DISTINCT Acct_No

    FROM #Refunds

    ),

    cteRate AS

    ( --=== Apply the account number to all date rates without dupes

    SELECT acct.Acct_No,

    rate.[Year],

    rate.[Month],

    rate.InterestRate/12 AS MonthRate

    FROM #InterestRates rate

    CROSS JOIN cteAccount acct

    )

    SELECT ISNULL(rate.Acct_No,'') AS Acct_No,

    ISNULL(DATEADD(mm,rate.[Month],DATEADD(yy,rate.[YEAR]-1900,0)),0) AS MonthDate,

    rate.MonthRate,

    ISNULL(amt.Amount,0) AS Amount,

    CAST(NULL AS DECIMAL(28,17)) AS InterestAmount,

    CAST(NULL AS DECIMAL(28,17)) AS Balance

    INTO #Work

    FROM cteRate rate

    LEFT JOIN #Refunds amt

    ON rate.Acct_No = amt.Acct_No

    AND rate.[Year] = amt.[Year]

    AND rate.[Month] = amt.[Month]

    ;

    --===== Add the required clustered index

    ALTER TABLE #Work

    ADD PRIMARY KEY CLUSTERED (Acct_No, MonthDate) WITH FILLFACTOR = 100

    ;

    --===== Do the update to calculate the interest amounts and the running total balance.

    -- This will do a million rows in about 3 to 7 seconds

    UPDATE #Work

    SET @IntAmt = InterestAmount = CASE

    WHEN @PrevAcct = Acct_No

    THEN (@PrevBal + Amount) * Monthrate

    ELSE (Amount) * MonthRate

    END,

    @PrevBal = Balance = CASE

    WHEN @PrevAcct = Acct_No

    THEN @PrevBal + Amount + @IntAmt

    ELSE Amount + @IntAmt

    END,

    @PrevAcct = Acct_No

    FROM #Work WITH(TABLOCKX) --Not required on a temp table, but good habit for this type of update.

    OPTION (MAXDOP 1) --Absolutely essential whether you use a temp table or not.

    ;

    ... and here's a bit on how you might display things... we can round the amounts to 2 decimal places, if you'd like.

    SELECT Acct_No, RIGHT(CONVERT(CHAR(11),MonthDate,113),8) AS [Month], Amount, InterestAmount, Balance

    FROM #Work

    ORDER BY Acct_No, MonthDate

    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • For completeness, I'm going to add a new method to the mix (before anyone else jumps in!).

    This is a method called 'set-based iteration' by MVP Hugo Kornelis, which was first brought to my attention here in a discussion of one of Jeff's fine articles.

    While not quite as fast as the Quirky Update, it is in the same ballpark. It is faster than a recursive CTE on large sets, and even manages to out-perform the best CLR solution I have been able to put together, at least for some data sets.

    The key to performance with the set-based iteration method is to have at least several groups in the data. It operates on a divide-and-conquer basis, and the unit of division is the group. In the present case, these groups are accounts - in general, they are the things that reset the running total. Worst case, if there is only one group, this method performs worse than an optimized cursor - so be aware of that.

    In the sample data provided, there are two accounts, which is enough to make the method viable, but certainly not optimal. My guess is that the real data set contains many accounts, so that should be all good.

    I have shamelessly re-used Jeff's fine test script, up to the point marked 'Solution to the problem starts here'. The following code runs from that point, and again draws heavily on Jeff's work. I hope he won't mind.

    -- A work table to hold the reformatted data, and

    -- ultimately, the results

    CREATE TABLE #Work

    (

    Acct_No VARCHAR(20) NOT NULL,

    MonthDate DATETIME NOT NULL,

    MonthRate DECIMAL(19,12) NOT NULL,

    Amount DECIMAL(19,12) NOT NULL,

    InterestAmount DECIMAL(19,12) NOT NULL,

    RunningTotal DECIMAL(19,12) NOT NULL,

    RowRank BIGINT NOT NULL

    );

    -- Slightly modified from Jeff's example, to prepare

    -- for the set-based iteration method

    WITH Accounts

    AS (

    -- Get a list of the account numbers

    SELECT DISTINCT Acct_No

    FROM #Refunds

    ),

    Rates

    AS (

    -- Apply all the accounts to all the rates

    SELECT A.Acct_No,

    R.[Year],

    R.[Month],

    MonthRate = R.InterestRate / 12

    FROM #InterestRates R

    CROSS

    JOIN Accounts A

    ),

    BaseData

    AS (

    -- The basic data we need to work with

    SELECT Acct_No = ISNULL(R.Acct_No,''),

    MonthDate = ISNULL(DATEADD(MONTH, R.[Month], DATEADD(YEAR, R.[year] - 1900, 0)), 0),

    R.MonthRate,

    Amount = ISNULL(RF.Amount,0),

    InterestAmount = ISNULL(RF.Amount,0) * R.MonthRate,

    RunningTotal = ISNULL(RF.Amount,0)

    FROM Rates R

    LEFT

    JOIN #Refunds RF

    ON RF.Acct_No = R.Acct_No

    AND RF.[Year] = R.[Year]

    AND RF.[Month] = R.[Month]

    )

    -- Basic data plus a rank id, numbering the rows by MonthDate, and resetting to 1 for each new Account

    INSERT #Work

    (Acct_No, MonthDate, MonthRate, Amount, InterestAmount, RunningTotal, RowRank)

    SELECT BD.Acct_No, BD.MonthDate, BD.MonthRate, BD.Amount, BD.InterestAmount, BD.RunningTotal,

    RowRank = RANK() OVER (PARTITION BY BD.Acct_No ORDER BY MonthDate)

    FROM BaseData BD;

    -- An index to speed the next stage (different from that used with the Quirky Update method)

    CREATE UNIQUE CLUSTERED INDEX nc1 ON #Work (RowRank, Acct_No);

    -- Iteration variables

    DECLARE @Rank BIGINT,

    @RowCount INTEGER;

    -- Initialize

    SELECT @Rank = 1,

    @RowCount = 1;

    -- This is the iteration bit, processes a rank id per iteration

    -- The number of rows processed with each iteration is equal to the number of groups in the data

    -- More groups --> greater efficiency

    WHILE (1 = 1)

    BEGIN

    SET @Rank = @Rank + 1;

    -- Set-based update with running totals for the current rank id

    UPDATE This

    SET InterestAmount = (Previous.RunningTotal + This.Amount) * This.MonthRate,

    RunningTotal = Previous.RunningTotal + This.Amount + (Previous.RunningTotal + This.Amount) * This.MonthRate

    FROM #Work This

    JOIN #Work Previous

    ON Previous.Acct_No = This.Acct_No

    AND Previous.RowRank = @Rank - 1

    WHERE This.RowRank = @Rank;

    IF (@@ROWCOUNT = 0) BREAK;

    END;

    -- Show the results in natural order

    SELECT *

    FROM #Work

    ORDER BY

    Acct_No, RowRank;

    I continue to recommend the Quirky Update over this method, for sheer speed, but this method does have the advantage of being fully documented and supported. I present it as 'Option 2'.

    Paul

    edit: to reflect the improvements in the recursive CTE method

  • Paul White (2/27/2010)


    It is certainly orders of magnitude faster than a recursive CTE on large sets

    Hi Paul, an excellent solution as always, and a method to remember.

    There are a couple of things I've noticed while working with CTE's over the last few weeks of using them in anger. The first is that, even with relatively small data sets, chaining CTE's can be a really bad idea. The optimiser appears to screw things up, row counts go through the roof and runtimes rapidly escalate. I've got an excellent example of this where breaking the chain by spooling into a temp table improved performance about 20-fold on a modest pagination routine. Funny thing is, the very first run took less than a second, before the optimiser lost it.

    The other thing I've noticed is that recursive CTE's can be blisteringly fast provided that the required conditions are met.

    So, reusing Jeff's sample data (thanks Jeff), clumsily ramping up to about about 3,000 accounts across about 100,000 rows, and processing a data set which already contains a contiguous row number in the correct order for execution, here goes...

    First, the sample data:

    --=====================================================================================================================

    -- This is the original test data including the second account you added this evening

    --=====================================================================================================================

    SET NOCOUNT ON

    IF OBJECT_ID('TempDB..#InterestRates','U') IS NOT NULL

    DROP TABLE #InterestRates

    ;

    CREATE TABLE #InterestRates

    (

    [InterestRate] DECIMAL(28,17) NULL,

    [Month] [int] NULL,

    [year] [int] NULL

    )

    ;

    INSERT INTO #InterestRates

    (InterestRate,[Month],[Year])

    SELECT '0.105','8','2007' UNION ALL

    SELECT '0.105','9','2007' UNION ALL

    SELECT '0.105','10','2007' UNION ALL

    SELECT '0.105','11','2007' UNION ALL

    SELECT '0.105','12','2007' UNION ALL

    SELECT '0.105','1','2008' UNION ALL

    SELECT '0.105','2','2008' UNION ALL

    SELECT '0.105','3','2008' UNION ALL

    SELECT '0.105','4','2008' UNION ALL

    SELECT '0.105','5','2008' UNION ALL

    SELECT '0.105','6','2008' UNION ALL

    SELECT '0.105','7','2008' UNION ALL

    SELECT '0.105','8','2008' UNION ALL

    SELECT '0.105','9','2008' UNION ALL

    SELECT '0.105','10','2008' UNION ALL

    SELECT '0.105','11','2008' UNION ALL

    SELECT '0.105','12','2008' UNION ALL

    SELECT '0.105','1','2009' UNION ALL

    SELECT '0.105','2','2009' UNION ALL

    SELECT '0.105','3','2009' UNION ALL

    SELECT '0.105','4','2009' UNION ALL

    SELECT '0.105','5','2009' UNION ALL

    SELECT '0.105','6','2009' UNION ALL

    SELECT '0.0067','7','2009' UNION ALL

    SELECT '0.0067','8','2009' UNION ALL

    SELECT '0.0067','9','2009' UNION ALL

    SELECT '0.0055','10','2009' UNION ALL

    SELECT '0.0055','11','2009' UNION ALL

    SELECT '0.0055','12','2009' UNION ALL

    SELECT '0.0055','1','2010' UNION ALL

    SELECT '0.0055','2','2010'

    ;

    IF OBJECT_ID('TempDB..#Refunds','U') IS NOT NULL

    DROP TABLE #Refunds

    ;

    CREATE TABLE #Refunds

    (

    [Acct_no] varchar(20) NULL,

    [Amount] DECIMAL(28,17) NULL,

    [Month] [int] NULL,

    [year] [int] NULL

    )

    ;

    INSERT INTO #Refunds

    (Acct_no,Amount, [Month],[year])

    SELECT '1271003600','333.107456586203','1','2008' UNION ALL

    SELECT '1271003600','87.6816131178288','1','2009' UNION ALL

    SELECT '1271003600','103.602002310821','2','2008' UNION ALL

    SELECT '1271003600','81.9722910242125','2','2009' UNION ALL

    SELECT '1271003600','72.3401647559977','3','2008' UNION ALL

    SELECT '1271003600','86.1625895742832','3','2009' UNION ALL

    SELECT '1271003600','103.361218989159','4','2008' UNION ALL

    SELECT '1271003600','84.6065155612006','4','2009' UNION ALL

    SELECT '1271003600','92.2834655375568','5','2008' UNION ALL

    SELECT '1271003600','84.6688365382596','5','2009' UNION ALL

    SELECT '1271003600','89.0687598341003','6','2008' UNION ALL

    SELECT '1271003600','79.1241550533365','6','2009' UNION ALL

    SELECT '1271003600','123.131877952709','7','2008' UNION ALL

    SELECT '1271003600','89.3938505146093','7','2009' UNION ALL

    SELECT '1271003600','87.2510378856992','8','2007' UNION ALL

    SELECT '1271003600','94.0687846832911','8','2008' UNION ALL

    SELECT '1271003600','340.381691069552','8','2009' UNION ALL

    SELECT '1271003600','80.3632310400272','9','2007' UNION ALL

    SELECT '1271003600','121.96840107118','9','2008' UNION ALL

    SELECT '1271003600','87.1037670414245','10','2007' UNION ALL

    SELECT '1271003600','94.9861134399298','10','2008' UNION ALL

    SELECT '1271003600','73.8008470324637','11','2007' UNION ALL

    SELECT '1271003600','78.537373182859','11','2008' UNION ALL

    SELECT '1271003600','90.7988754111144','12','2007' UNION ALL

    SELECT '1271003600','88.4954811818565','12','2008' UNION ALL

    SELECT '1271005700 ','651.764','1','2008' UNION ALL

    SELECT '1271005700 ','650.766','1','2009' UNION ALL

    SELECT '1271005700 ','897.994','2','2008' UNION ALL

    SELECT '1271005700 ','688.486','2','2009' UNION ALL

    SELECT '1271005700 ','721.466','3','2008' UNION ALL

    SELECT '1271005700 ','718.654','3','2009' UNION ALL

    SELECT '1271005700 ','913.072','4','2008' UNION ALL

    SELECT '1271005700 ','863.224','4','2009' UNION ALL

    SELECT '1271005700 ','735.081','5','2008' UNION ALL

    SELECT '1271005700 ','647.368','5','2009' UNION ALL

    SELECT '1271005700 ','607.721','6','2008' UNION ALL

    SELECT '1271005700 ','580.359','6','2009' UNION ALL

    SELECT '1271005700 ','718.969','7','2008' UNION ALL

    SELECT '1271005700 ','693.378','7','2009' UNION ALL

    SELECT '1271005700 ','734.005','8','2007' UNION ALL

    SELECT '1271005700 ','591.788','8','2008' UNION ALL

    SELECT '1271005700 ','679.773','8','2009' UNION ALL

    SELECT '1271005700 ','560.296','9','2007' UNION ALL

    SELECT '1271005700 ','685.047','9','2008' UNION ALL

    SELECT '1271005700 ','682.834','10','2007' UNION ALL

    SELECT '1271005700 ','608.706','10','2008' UNION ALL

    SELECT '1271005700 ','586.172','11','2007' UNION ALL

    SELECT '1271005700 ','650.191','11','2008' UNION ALL

    SELECT '1271005700 ','858.672','12','2007' UNION ALL

    SELECT '1271005700 ','711.708','12','2008'

    ;

    --=====================================================================================================================

    -- Solution to the problem starts here

    --=====================================================================================================================

    --===== Conditionally drop the work table to make reruns for any troubleshooting easier.

    IF OBJECT_ID('TempDB..#Work','U') IS NOT NULL

    DROP TABLE #Work;

    --===== This set of cascading CTE's preconditions the data for processing and later display

    WITH

    cteAccount AS

    ( --=== Find all distinct account numbers so we can apply to all dates for rates without dupes

    SELECT DISTINCT Acct_No

    FROM #Refunds

    ),

    cteRate AS

    ( --=== Apply the account number to all date rates without dupes

    SELECT acct.Acct_No,

    rate.[Year],

    rate.[Month],

    rate.InterestRate/12 AS MonthRate

    FROM #InterestRates rate

    CROSS JOIN cteAccount acct

    )

    SELECT ExecSeq = ISNULL(CAST(0 AS INT), 0),

    ISNULL(rate.Acct_No,'') AS Acct_No,

    ISNULL(DATEADD(mm,rate.[Month],DATEADD(yy,rate.[YEAR]-1900,0)),0) AS MonthDate,

    rate.MonthRate,

    ISNULL(amt.Amount,0) AS Amount,

    CAST(NULL AS DECIMAL(28,17)) AS InterestAmount,

    CAST(NULL AS DECIMAL(28,17)) AS Balance

    INTO #Work

    FROM cteRate rate

    LEFT JOIN #Refunds amt

    ON rate.Acct_No = amt.Acct_No

    AND rate.[Year] = amt.[Year]

    AND rate.[Month] = amt.[Month]

    ;

    -- scale up the table yuck but it's a one-off

    DECLARE @Acct_no INT, @Counter INT

    SET @Counter = 1

    WHILE @Counter < 3220 -- 322 corresponds to 10,000-ish rows

    BEGIN

    SELECT @Acct_no = MAX(Acct_no) FROM #Work

    INSERT INTO #Work (ExecSeq, Acct_no, MonthDate, MonthRate, Amount)

    SELECT ExecSeq, Acct_no = CAST(@Acct_no+1 AS VARCHAR(20)), MonthDate, MonthRate, Amount

    FROM #Work WHERE Acct_no = CAST(@Acct_no AS VARCHAR(20)) -- 25 rows

    SET @Counter = @Counter + 1

    END

    -- put in the all important execution order

    UPDATE w SET ExecSeq = t.ExecSeq

    FROM #Work w

    INNER JOIN (SELECT ExecSeq = ROW_NUMBER() OVER(ORDER BY Acct_no, MonthDate), Acct_no, MonthDate FROM #Work) t

    ON t.Acct_no = w.Acct_no AND t.MonthDate = w.MonthDate

    --===== Add the very useful clustered index

    CREATE CLUSTERED INDEX [ExecSeq] ON #Work ([ExecSeq] ASC)

    Here's the slightly modified solution:

    ;WITH Calculator AS (

    SELECT ExecSeq,

    Acct_no,

    Amount,

    MonthDate,

    MonthRate,

    InterestAmount = ISNULL(InterestAmount,0)

    FROM #Work w

    WHERE ExecSeq = 1

    UNION ALL

    SELECT cr.ExecSeq,

    ISNULL(cr.Acct_no, lr.Acct_no),

    ISNULL(cr.Amount, 0),

    cr.MonthDate,

    cr.MonthRate,

    InterestAmount = CAST(CASE WHEN cr.Acct_no = lr.Acct_no THEN (lr.Amount + lr.InterestAmount) * (1 + (cr.MonthRate)) ELSE 0 END AS DECIMAL(28,17))

    FROM Calculator lr

    INNER JOIN #Work cr ON cr.ExecSeq = lr.ExecSeq+1

    )

    SELECT * FROM Calculator ORDER BY ExecSeq OPTION (MAXRECURSION 0)

    This runs in 5 seconds, or 3 if the ORDER BY is removed (???). If a single row is selected, say the second last (ExecSeq = 99850), it takes about 3 seconds and the result is correct.

    100,000 rows, 5 seconds - that's not a misprint either Uncle Jeff 😛

    A running total over a million rows of a simple but properly structured and indexed table can take as little as three or four seconds using this method.

    Option 1, version 2.

    Nathan, regarding recursive CTE's; "the reference to self in the recursive part represents the results of the previous iteration". It might be nowhere near what SQL Server is doing under the bonnet but it's all you need to know to make recursive CTE's work.

    Cheers

    ChrisM


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • Hi Chris,

    I had to add a quick index on acct_no to get the table scale-up thing to work in my lifetime (the MAX gets out of hand without it!), but other than that:

    This looks to be an interesting way to speed up this sort of recursive query, and it certainly seems to work. There are a number of similarities to the set-based iteration method, like the pre-computing of an order using a ranking function. I want to look at this a bit more deeply before saying much more, but the index spool at the 'lowest level' of the plan seems to be one of the keys to success here - it saves scanning the input table a good many times. I would expect an explicit index on the same keys as the spool to be beneficial, I'll come back to this.

    You're right that recursive CTEs can be fast, but it is relative. I sometimes use them to seek down the distinct keys of a large index rather than scanning the whole thing and relying on an aggregate, for example. Where a good set-based alternative exists I usually go with that. Maybe I'll revisit that once I've had a deeper think on this.

    I must admit I spent a few minutes expanding the data set to work with the set-based iteration method to see how that went. The recursive CTE ran for just over seven seconds on my old laptop before starting to return results. The set-based iteration method ran for two 😛

    Paul

  • Chris,

    Turns out all that is needed to get rid of the index spool is to define the clustered index as UNIQUE. Given that information, a better plan is produced (without the extra index spool), and the recursive CTE now starts to return results after four seconds on my laptop. Well done!

    Paul

  • Paul White (2/27/2010)


    Chris,

    Turns out all that is needed to get rid of the index spool is to define the clustered index as UNIQUE. Given that information, a better plan is produced (without the extra index spool), and the recursive CTE now starts to return results after four seconds on my laptop. Well done!

    Paul

    Heh nice work Paul!

    I figured it would be sufficient to bring the time down to a few seconds - it only took half an hour with Jeff's modified sample generator - but this is the icing on the cake. I'll try this on the pagination query at work. Thanks!


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • Paul White (2/27/2010)


    This is a method called 'set-based iteration' by MVP Hugo Kornelis, which was first brought to my attention here in a discussion of one of Jeff's fine articles.

    Heh... that's why I listed cursors as the "third fastest". I even helped Hugo with a tweak on his fine method.

    I have shamelessly re-used Jeff's fine test script, up to the point marked 'Solution to the problem starts here'. The following code runs from that point, and again draws heavily on Jeff's work. I hope he won't mind.

    I never mind but this one was all provided by the OP. 🙂 He did a really fine job of providing test tables and data especially for a newbie to the forum.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Paul White (2/27/2010)


    Chris,

    Turns out all that is needed to get rid of the index spool is to define the clustered index as UNIQUE. Given that information, a better plan is produced (without the extra index spool), and the recursive CTE now starts to return results after four seconds on my laptop. Well done!

    Paul

    Was that for the 100k rows, Paul?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I'm overwhelmed with all the help. When I first posted this problem I figured it might be quite some time if ever that someone came up with a solution. Thank you all so much. I will be running these queries and puzzling out how they work on monday. I will let you all know how it goes. Once again thank you so much.

  • nathan 7372 (2/27/2010)


    I'm overwhelmed with all the help. When I first posted this problem I figured it might be quite some time if ever that someone came up with a solution. Thank you all so much. I will be running these queries and puzzling out how they work on monday. I will let you all know how it goes. Once again thank you so much.

    You only have yourself to thank. 🙂 Thanks for taking the time to read the "etiquette" article I led you to... I just knew a bunch of great people would jump in on this one if they had a little data to work with. Well done on the data you posted, Nathan.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (2/27/2010)


    Paul White (2/27/2010)


    Chris,

    Turns out all that is needed to get rid of the index spool is to define the clustered index as UNIQUE. Given that information, a better plan is produced (without the extra index spool), and the recursive CTE now starts to return results after four seconds on my laptop. Well done!

    Paul

    Was that for the 100k rows, Paul?

    Sure was, yes.

  • Jeff Moden (2/27/2010)


    nathan 7372 (2/27/2010)


    I'm overwhelmed with all the help. When I first posted this problem I figured it might be quite some time if ever that someone came up with a solution. Thank you all so much. I will be running these queries and puzzling out how they work on monday. I will let you all know how it goes. Once again thank you so much.

    You only have yourself to thank. 🙂 Thanks for taking the time to read the "etiquette" article I led you to... I just knew a bunch of great people would jump in on this one if they had a little data to work with. Well done on the data you posted, Nathan.

    +1 🙂

  • I just wanted to let everyone know how things went. I have to say I learned a lot about the different ways to solve this problem. All three queries ran exceptionally fast. I had to modify Chris' query a tad since it was off by one month on the interest rates used. Since the interest rates are so similiar it only had a minor effect on the outcome.

    On an interesting note just by using the different datatypes between Jeff and Paul we get a difference of $374 out of a total around $4.5M. Just goes to show you the power of decimal places and rounding.

    Thank you all for tackling this problem. I really feel I learned a lot and the accounting department is ecstatic to finally have the numbers so we can move forward. Thank you again.

  • Thanks for the feedback. I am a bit confused, though... I thought Paul and I both used DECIMAL(28,17).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 16 through 30 (of 67 total)

You must be logged in to reply to this topic. Login to reply