RE: Calculating interest query

SSC Guru

Points: 150467

February 27, 2010 at 2:33 am

For completeness, I'm going to add a new method to the mix (before anyone else jumps in!).

This is a method called 'set-based iteration' by MVP Hugo Kornelis, which was first brought to my attention here in a discussion of one of Jeff's fine articles.

While not quite as fast as the Quirky Update, it is in the same ballpark. It is faster than a recursive CTE on large sets, and even manages to out-perform the best CLR solution I have been able to put together, at least for some data sets.

The key to performance with the set-based iteration method is to have at least several groups in the data. It operates on a divide-and-conquer basis, and the unit of division is the group. In the present case, these groups are accounts - in general, they are the things that reset the running total. Worst case, if there is only one group, this method performs worse than an optimized cursor - so be aware of that.

In the sample data provided, there are two accounts, which is enough to make the method viable, but certainly not optimal. My guess is that the real data set contains many accounts, so that should be all good.

I have shamelessly re-used Jeff's fine test script, up to the point marked 'Solution to the problem starts here'. The following code runs from that point, and again draws heavily on Jeff's work. I hope he won't mind.

-- A work table to hold the reformatted data, and

-- ultimately, the results

CREATE TABLE #Work

(

Acct_No VARCHAR(20) NOT NULL,

MonthDate DATETIME NOT NULL,

MonthRate DECIMAL(19,12) NOT NULL,

Amount DECIMAL(19,12) NOT NULL,

InterestAmount DECIMAL(19,12) NOT NULL,

RunningTotal DECIMAL(19,12) NOT NULL,

RowRank BIGINT NOT NULL

);

-- Slightly modified from Jeff's example, to prepare

-- for the set-based iteration method

WITH Accounts

AS (

-- Get a list of the account numbers

SELECT DISTINCT Acct_No

FROM #Refunds

),

Rates

AS (

-- Apply all the accounts to all the rates

SELECT A.Acct_No,

R.[Year],

R.[Month],

MonthRate = R.InterestRate / 12

FROM #InterestRates R

CROSS

JOIN Accounts A

),

BaseData

AS (

-- The basic data we need to work with

SELECT Acct_No = ISNULL(R.Acct_No,''),

MonthDate = ISNULL(DATEADD(MONTH, R.[Month], DATEADD(YEAR, R.[year] - 1900, 0)), 0),

R.MonthRate,

Amount = ISNULL(RF.Amount,0),

InterestAmount = ISNULL(RF.Amount,0) * R.MonthRate,

RunningTotal = ISNULL(RF.Amount,0)

FROM Rates R

LEFT

JOIN #Refunds RF

ON RF.Acct_No = R.Acct_No

AND RF.[Year] = R.[Year]

AND RF.[Month] = R.[Month]

)

-- Basic data plus a rank id, numbering the rows by MonthDate, and resetting to 1 for each new Account

INSERT #Work

(Acct_No, MonthDate, MonthRate, Amount, InterestAmount, RunningTotal, RowRank)

SELECT BD.Acct_No, BD.MonthDate, BD.MonthRate, BD.Amount, BD.InterestAmount, BD.RunningTotal,

RowRank = RANK() OVER (PARTITION BY BD.Acct_No ORDER BY MonthDate)

FROM BaseData BD;

-- An index to speed the next stage (different from that used with the Quirky Update method)

CREATE UNIQUE CLUSTERED INDEX nc1 ON #Work (RowRank, Acct_No);

-- Iteration variables

DECLARE @Rank BIGINT,

@RowCount INTEGER;

-- Initialize

SELECT @Rank = 1,

@RowCount = 1;

-- This is the iteration bit, processes a rank id per iteration

-- The number of rows processed with each iteration is equal to the number of groups in the data

-- More groups --> greater efficiency

WHILE (1 = 1)

BEGIN

SET @Rank = @Rank + 1;

-- Set-based update with running totals for the current rank id

UPDATE This

SET InterestAmount = (Previous.RunningTotal + This.Amount) * This.MonthRate,

RunningTotal = Previous.RunningTotal + This.Amount + (Previous.RunningTotal + This.Amount) * This.MonthRate

FROM #Work This

JOIN #Work Previous

ON Previous.Acct_No = This.Acct_No

AND Previous.RowRank = @Rank - 1

WHERE This.RowRank = @Rank;

IF (@@ROWCOUNT = 0) BREAK;

END;

-- Show the results in natural order

SELECT *

FROM #Work

ORDER BY

Acct_No, RowRank;

I continue to recommend the Quirky Update over this method, for sheer speed, but this method does have the advantage of being fully documented and supported. I present it as 'Option 2'.

Paul

edit: to reflect the improvements in the recursive CTE method

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi