• OK, some fun on a slow Friday afternoon: this can indeed be implemented so that it is very, very quick without breaking any of the rules/guarantees provided by SQL.

    The key is to recognize that CTE's defined as recursive queries are in fact a set-oriented tail recursion operator for SQL. This is an extremely powerful concept, and thus CTEs can be employed to do all sorts of wonderful things in a fast set-oriented manner.

    Here's my code:

    The source table create and data load; this takes 2:04 (mins:secs) on my machine (pretty much lifted from the original article):

    CREATE TABLE dbo.JBMTest

    ( RowNum INT IDENTITY (1,1) NOT NULL,

    AccountID INT not NULL,

    Amount MONEY not NULL,

    Date DATETIME not NULL

    --RunBal MONEY NULL,

    --GrpBal MONEY NULL,

    --RunCnt INT NULL,

    --GrpCnt INT NULL

    )

    CREATE NONCLUSTERED INDEX IX_JBMTest_AccountID_Date --not clustered for "Merry-go-Round" test

    ON dbo.JBMTest (AccountID, Date)

    -- 2mins, 4secs on my machine:

    --===== Build the table 100 rows at a time to "mix things up"

    DECLARE @Counter INT

    SET @Counter = 0

    WHILE @Counter < 1000000

    BEGIN --===== Add 1000 rows to the test table

    INSERT INTO dbo.JBMTest

    (AccountID, Amount, Date)

    SELECT TOP 100

    AccountID = ABS(CHECKSUM(NEWID()))%50000+1

    , Amount = CAST(CHECKSUM(NEWID())%10000 /100.0 AS MONEY)

    , Date = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME)

    FROM Master.dbo.SysColumns t1

    CROSS JOIN Master.dbo.SysColumns t2

    --===== Increment the counter

    SET @Counter = @Counter + 100

    END

    Now the tricky CTE; we first build a table variable that sucks the whole source table out and applies a clustered index, then do the CTE query that adds the running totals:

    declare @sequential as table

    (

    AccountID INT not NULL

    , seq int not null

    , Date DATETIME not NULL

    , Amount MONEY not NULL

    , unique clustered (AccountID, seq)

    );

    insert into @sequential

    select

    AccountID

    , row_number() over(partition by AccountID order by Date) as seq

    , Date

    , Amount

    from JBMTest;

    with

    summed(AccountID, seq, Date, Ammount, Running) as

    (

    select

    sq.AccountID

    , sq.seq

    , sq.Date

    , sq.Amount

    , sq.Amount as Running

    from @sequential as sq

    where sq.seq = 1

    UNION ALL

    select

    sq.AccountID

    , sq.seq

    , sq.Date

    , sq.Amount

    , sq.Amount + pri.Running as Running

    from @sequential as sq

    join summed as pri on

    sq.AccountID = pri.AccountID

    and sq.seq = pri.seq + 1

    where sq.seq > 1

    )

    select * from summed;

    The above query -- INCLUDING generating the table variable -- takes 29 seconds on my machine, and is eligible for full parallelism, etc. (Note also that I did not specify any table/index hints.) If we add a final sort (order by accountID, seq) (most likely in practice, if used for reporting), the time balloons to all of 43 seconds.....

    Reviewing the query plans shows that each row is visited exactly once; the recursive CTE is trully a tail-recursion operator in SQL!

    Out of curiousity, if we were to grab the current balance using a sum...over(partition) construct:

    select distinct(AccountID), sum(Amount) over(partition by AccountID) as cAmount

    from JBMTest

    that takes 2 seconds on my machine.

    -frank


    The End.