RE: Solving the "Running Total" & "Ordinal Rank" Problems in SS 2k/2k5

Mr or Mrs. 500

Points: 570

February 15, 2008 at 2:36 pm

OK, some fun on a slow Friday afternoon: this can indeed be implemented so that it is very, very quick without breaking any of the rules/guarantees provided by SQL.

The key is to recognize that CTE's defined as recursive queries are in fact a set-oriented tail recursion operator for SQL. This is an extremely powerful concept, and thus CTEs can be employed to do all sorts of wonderful things in a fast set-oriented manner.

Here's my code:

The source table create and data load; this takes 2:04 (mins:secs) on my machine (pretty much lifted from the original article):

CREATE TABLE dbo.JBMTest

( RowNum INT IDENTITY (1,1) NOT NULL,

AccountID INT not NULL,

Amount MONEY not NULL,

Date DATETIME not NULL

--RunBal MONEY NULL,

--GrpBal MONEY NULL,

--RunCnt INT NULL,

--GrpCnt INT NULL

)

CREATE NONCLUSTERED INDEX IX_JBMTest_AccountID_Date --not clustered for "Merry-go-Round" test

ON dbo.JBMTest (AccountID, Date)

-- 2mins, 4secs on my machine:

--===== Build the table 100 rows at a time to "mix things up"

DECLARE @Counter INT

SET @Counter = 0

WHILE @Counter < 1000000

BEGIN --===== Add 1000 rows to the test table

INSERT INTO dbo.JBMTest

(AccountID, Amount, Date)

SELECT TOP 100

AccountID = ABS(CHECKSUM(NEWID()))%50000+1

, Amount = CAST(CHECKSUM(NEWID())%10000 /100.0 AS MONEY)

, Date = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME)

FROM Master.dbo.SysColumns t1

CROSS JOIN Master.dbo.SysColumns t2

--===== Increment the counter

SET @Counter = @Counter + 100

END

Now the tricky CTE; we first build a table variable that sucks the whole source table out and applies a clustered index, then do the CTE query that adds the running totals:

declare @sequential as table

(

AccountID INT not NULL

, seq int not null

, Date DATETIME not NULL

, Amount MONEY not NULL

, unique clustered (AccountID, seq)

);

insert into @sequential

select

AccountID

, row_number() over(partition by AccountID order by Date) as seq

, Date

, Amount

from JBMTest;

with

summed(AccountID, seq, Date, Ammount, Running) as

(

select

sq.AccountID

, sq.seq

, sq.Date

, sq.Amount

, sq.Amount as Running

from @sequential as sq

where sq.seq = 1

UNION ALL

select

sq.AccountID

, sq.seq

, sq.Date

, sq.Amount

, sq.Amount + pri.Running as Running

from @sequential as sq

join summed as pri on

sq.AccountID = pri.AccountID

and sq.seq = pri.seq + 1

where sq.seq > 1

)

select * from summed;

The above query -- INCLUDING generating the table variable -- takes 29 seconds on my machine, and is eligible for full parallelism, etc. (Note also that I did not specify any table/index hints.) If we add a final sort (order by accountID, seq) (most likely in practice, if used for reporting), the time balloons to all of 43 seconds.....

Reviewing the query plans shows that each row is visited exactly once; the recursive CTE is trully a tail-recursion operator in SQL!

Out of curiousity, if we were to grab the current balance using a sum...over(partition) construct:

select distinct(AccountID), sum(Amount) over(partition by AccountID) as cAmount

from JBMTest

that takes 2 seconds on my machine.

-frank

The End.