Get running total in the following query

  • Hi

    I have a table which holds the running total.

    DECLARE @CumCredit TABLE(RowId INT, CustomerId INT, TransId INT, Pts INT, RunningTotal INT)

    INSERT INTO @CumCredit VALUES(1,123,121,10,10), (2,123,131,20,30), (3,123,141,15,45)

New resultset which needs to be inserted into the above table with a running total (a kind of incremental load).

    DECLARE @Cum TABLE(RowId INT, CustomerId INT, TransId INT, Pts INT)

    INSERT INTO @Cum VALUES(4,123,151,10), (5,123,161,20)

I am trying the query below, but it is not succeeding.

insert into @CumCredit
select *, (SELECT MAX(RunningTotal)+c.Pts AS Total FROM @CumCredit WHERE CustomerId = c.CustomerId) AS Total from @Cum c

select * from @CumCredit

Issue is with RowId=5: the running total at this RowId should be 75 instead of 65. Both new rows seem to use 45 (the old maximum) as their base, since the subquery only sees rows already in @CumCredit and never the other new rows in @Cum.

Where am I going wrong?

    thanks

See if this article helps: http://www.sqlservercentral.com/articles/T-SQL/68467/

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 - Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • a2zwd (7/18/2012)


Hi

I have a table which holds the running total. ...

Issue is with RowId=5: the running total at this RowId should be 75 instead of 65. ...

Where am I going wrong?

    I think this is what you are looking for:

DECLARE @CumCredit TABLE(RowId INT, CustomerId INT, TransId INT, Pts INT, PRIMARY KEY(RowId ASC))

INSERT INTO @CumCredit VALUES(1,123,121,10), (2,123,131,20), (3,123,141,15)

SELECT * FROM @CumCredit

DECLARE @min INT = (SELECT MIN([RowId]) FROM @CumCredit)

;WITH CTE ([RowId], [CustomerID], [TransID], [Pts], [Running Total])
AS
(
    SELECT [RowId],
           [CustomerID],
           [TransID],
           [Pts],
           [Pts]
    FROM @CumCredit
    WHERE [RowId] = @min
    UNION ALL
    SELECT a.[RowId],
           a.[CustomerID],
           a.[TransId],
           a.[Pts],
           CTE.[Running Total] + a.[Pts]
    FROM CTE
    JOIN @CumCredit a ON CTE.[RowId] + 1 = a.[RowId]
)
SELECT * FROM CTE
OPTION (MAXRECURSION 0)

    There are other ways to do this but using recursion is generally the fastest.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Sean Lange (7/18/2012)


See if this article helps: http://www.sqlservercentral.com/articles/T-SQL/68467/

    Jeff's article is certainly a must-read on this topic.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • I must be missing something but I don't see how that recursive CTE (as much as I like rCTEs) accounts for the RunningTotal existing on the last record that already exists in @CumCredit.

    Here's an alternative, just as something different hoping that someone can find a reason to shoot it down.

DECLARE @CumCredit TABLE(RowId INT, CustomerId INT, TransId INT, Pts INT, RunningTotal INT)
INSERT INTO @CumCredit VALUES(1,123,121,10,10), (2,123,131,20,30), (3,123,141,15,45)

select * from @CumCredit

DECLARE @Cum TABLE(RowId INT, CustomerId INT, TransId INT, Pts INT)
INSERT INTO @Cum VALUES(4,123,151,10), (5,123,161,20), (6,123, 151, 20), (7, 123, 161, 40)

select * from @Cum

INSERT INTO @CumCredit
SELECT RowID, CustomerID, Transid, Pts
    ,RunningTotal=Pts + CASE WHEN Pts = RunningTotal
                             THEN (SELECT RunningTotal=MAX(RunningTotal)
                                   FROM @CumCredit b
                                   WHERE a.CustomerID = b.CustomerID)
                             ELSE RunningTotal END
FROM (
    SELECT RowID, CustomerID, Transid=MAX(Transid)
        ,Pts=MAX(b.Pts), RunningTotal=SUM(ISNULL(c.Pts, 0)) + MAX(rt)
        ,rn=ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY RowID)
    FROM @Cum b
    OUTER APPLY (
        SELECT Pts
        FROM @Cum a
        WHERE a.RowID < b.RowID
    ) c
    CROSS APPLY (
        SELECT Pts=SUM(Pts)
        FROM @CumCredit c
        WHERE c.CustomerID = b.CustomerID) d(rt)
    GROUP BY RowID, CustomerID
) a

    Performance perhaps? Then again it is going up against an rCTE!


My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.

  • XMLSQLNinja (7/18/2012)


    ...

    There are other ways to do this but using recursion is generally the fastest.

    Recursion is the third fastest; CLR > "Quirky Update" > rCTE > cursor > triangular join. Quite a few comparisons have been made between these methods here on ssc over the years.

    If you use a rCTE for running totals, "next row" from the source table will be picked up by a table scan if there's no index on the sequence column, and the fastest index is usually a unique clustered index. So if your source table has a million rows, that's 999,999 table scans.
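To illustrate, the kind of supporting index he means might look like this (a sketch with assumed table and column names, not code from this thread):

    -- Sketch only (hypothetical dbo.Transactions/SeqNo names): a unique clustered
    -- index on the sequence column lets the rCTE's "row + 1" join seek instead of scan.
    CREATE UNIQUE CLUSTERED INDEX ucx_Transactions_SeqNo ON dbo.Transactions (SeqNo);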

    Triangular joins can perform quite well if the aggregated partitions are small relative to the whole data set (and supporting indexes exist); 100,000 customers each with 10 transactions (10 aggregating scans per customer, max ten rows) will perform far better than a running total over the whole million-row table (a million aggregating scans, max a million rows).
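To make that concrete, the triangular-join pattern he describes looks roughly like this against the thread's @CumCredit table (my sketch, not ChrisM's code):

    -- Each row's total is the SUM over all earlier rows for the same customer;
    -- the "triangle" is the b.RowId <= a.RowId predicate, which re-aggregates
    -- the partition for every outer row.
    SELECT a.RowId, a.CustomerId, a.TransId, a.Pts,
           RunningTotal = (SELECT SUM(b.Pts)
                           FROM @CumCredit b
                           WHERE b.CustomerId = a.CustomerId
                             AND b.RowId <= a.RowId)
    FROM @CumCredit a;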


Low-hanging fruit picker and defender of the moggies

For better assistance in answering your questions, please read this.

Understanding and using APPLY, (I) and (II) Paul White

Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Just got back from vacation and wanted to address this one first...

    I must be missing something but I don't see how that recursive CTE (as much as I like rCTEs) accounts for the RunningTotal existing on the last record that already exists in @CumCredit.

I fixed an error and updated my query to handle the insert correctly:

DECLARE @CumCredit TABLE(RowId INT, CustomerId INT, TransId INT, Pts INT, RunningTotal INT, PRIMARY KEY(RowId ASC))
DECLARE @Cum TABLE(RowId INT, CustomerId INT, TransId INT, Pts INT, RunningTotal INT, PRIMARY KEY(RowId ASC))

INSERT INTO @CumCredit VALUES(1,123,121,10,10), (2,123,131,20,30), (3,123,141,15,45)
INSERT INTO @Cum VALUES(4,123,151,10,NULL), (5,123,161,20,NULL), (6,123, 151, 20,NULL), (7, 123, 161, 40,NULL)

DECLARE @total INT = (SELECT RunningTotal FROM @CumCredit WHERE RowId = (SELECT MAX(RowId) FROM @CumCredit)),
        @min INT = (SELECT MIN([RowId]) FROM @Cum)

;WITH CTE ([RowId], [CustomerID], [TransID], [Pts], [Running Total])
AS
(
    SELECT [RowId],
           [CustomerID],
           [TransID],
           [Pts],
           @total + [Pts]
    FROM @Cum
    WHERE [RowId] = @min
    UNION ALL
    SELECT a.[RowId],
           a.[CustomerID],
           a.[TransId],
           a.[Pts],
           CTE.[Running Total] + a.[Pts]
    FROM CTE
    JOIN @Cum a ON CTE.[RowId] + 1 = a.[RowId]
)
INSERT INTO @CumCredit
SELECT * FROM CTE
OPTION (MAXRECURSION 0);

SELECT * FROM @CumCredit
GO

This is an elegant solution which will get you the desired results and the desired performance.

    Here's an alternative, just as something different hoping that someone can find a reason to shoot it down.

Reviewing your code (above), let's look at the pros and cons:

    Pro: It works and produces the desired results.

    Con: Performance

    I performed a test to compare the results of both queries:

    First I put together code to produce sample data:

-- CREATE BIG SAMPLE DATA
SET NOCOUNT ON;

IF OBJECT_ID('tempdb..##sampleData') IS NOT NULL
    DROP TABLE ##sampleData;

CREATE TABLE ##sampleData
(
    RowID int,
    cID int,
    TransID int,
    pts int,
    PRIMARY KEY([RowID])
);

-- Begin Routine
DECLARE @RowID INT = 4,
        @val INT,
        @rand1 INT,
        @rand2 INT

WHILE @RowID <= 100
BEGIN
    SET @val = ((RAND()*8)+1);
    SET @val = @val*5
    INSERT ##sampleData VALUES (@RowID,123,1999,@val)
    SET @RowID = @RowID + 1
END
GO

Then, for each test, I inserted the values from ##sampleData into @Cum so we could test this with 100 records, then 500, then 1K, 5K, 100K, then 1,000,000 records.
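The load step itself isn't shown in the post; presumably something along these lines (my assumption, using the @Cum definition from the query above):

    -- Assumed load step (not in the original post): copy the generated sample
    -- rows into the table variable read by both running-total queries.
    INSERT INTO @Cum (RowId, CustomerId, TransId, Pts, RunningTotal)
    SELECT RowID, cID, TransID, pts, NULL
    FROM ##sampleData;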

With 5 records both queries ran at the same speed, ~6 ms.

... {fast forward}

With 1,000 records mine (we'll call Q1) runs in 28 ms; yours (Q2) runs for 4 seconds.

With 2,500 records: Q1 = 60 ms, Q2 = 30 seconds

With 5K: Q1 = 0.10 seconds, Q2 = 80.00 seconds

... {fast forward}

With 1 million records: my rCTE: 20 seconds; Q2: did not bother

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • ChrisM@home (7/20/2012)


    XMLSQLNinja (7/18/2012)


    ...

    There are other ways to do this but using recursion is generally the fastest.

    Recursion is the third fastest; CLR > "Quirky Update" > rCTE > cursor > triangular join...

    Are these your findings or are you quoting something you read?

Let's start with the CLR... This is a T-SQL forum, which is why a T-SQL solution was provided, but I digress. When saying that CLR is fastest, are you taking into account how much time it takes to learn a new programming language? My query will insert a million rows in 20 seconds, and I wrote it with just SQL. Do you have a faster CLR that you tested when you made that statement, or are you just relying on what you read on the internet?

    "Quirky Update" -- again, do you have an example that works here? Have you tested it?

    Quite a few comparisons have been made between these methods here on ssc over the years.

    Have any of those comparisons been made by you? Or, again, are you just relying on what you read on the Internet?

    If you use a rCTE for running totals, "next row" from the source table will be picked up by a table scan if there's no index on the sequence column...

So, perhaps you should put an index on that table, eh? You will notice that I included a PK in my code. I did that for the reason you mentioned. The more thinking you do when writing your DDL, the less thinking you will need to do when writing your DML.

    Triangular joins can perform quite well if the aggregated partitions are small relative to the ...

It is statements like this that make me wish there were an article on SSC titled "How to Answer Questions". Again, a lot of theory, a lot of what you read, and no examples to show that what you read is true. I have learned much of what I know from reading too, but until I have tested it myself, I really have not learned that much.

    "In theory, theory and practice are the same thing. In practice they are not!" -- Yogi Berra

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • XMLSQLNinja (7/23/2012)


Just got back from vacation and wanted to address this one first...

I fixed an error and updated my query to handle the insert correctly: ...

With 1 million records: my rCTE: 20 seconds; Q2: did not bother

    Ouch! I knew mine wouldn't be real fast but didn't expect it to be that slow either!

    Nice work Alan.


My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.

Okay, now modify the rCTE to run against this dataset. It will contain 1,000,000 rows of data and potentially 10,000 unique cIDs, not 1 (cID = 123) as in the other test data set.

CREATE TABLE #sampleData
(
    RowID int,
    cID int,
    TransID int,
    pts int,
    PRIMARY KEY NONCLUSTERED([RowID])
);

declare @TransID int = 1;

with
e1(n) as (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)),
e2(n) as (select 1 from e1 a cross join e1 b),
e4(n) as (select 1 from e2 a cross join e2 b),
e6(n) as (select row_number() over (order by (select null)) from e4 a cross join e2 b)
insert into #sampleData (RowID, cID, TransID, pts)
select
    n,
    ceiling(rand(checksum(newid())) * 10000),
    @TransID + (5 * n),
    ceiling(rand(checksum(newid())) * 100)
from e6;

create clustered index IX_CustTran on #sampleData (cID, TransID);
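If it helps, one way to bolt on the per-customer sequence a rCTE would need for this dataset (my sketch, not part of the challenge):

    -- Sketch (my addition): add and populate a row sequence so a rCTE can walk
    -- rn = 1, 2, 3, ... within each cID, anchoring at rn = 1 per customer.
    ALTER TABLE #sampleData ADD rn int NULL;

    UPDATE t SET rn = d.rn
    FROM #sampleData t
    JOIN (
        SELECT RowID,
               rn = ROW_NUMBER() OVER (PARTITION BY cID ORDER BY TransID)
        FROM #sampleData
    ) d ON d.RowID = t.RowID;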

Actually, Itzik Ben-Gan's new "Microsoft SQL Server 2012 High-Performance T-SQL Using Windowing Functions" shows that the Quirky Update and the new windowing functions in 2012 beat the CLR, and that the rCTE comes in at a very poor 4th, being more than 7 times slower than even the slowest of the rest. It's on page 170 and the ISBN is 978-0-7356-5836-3.
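For reference, the 2012 windowed running total he's referring to looks roughly like this (a sketch reusing this thread's column names, not code from the book):

    -- Sketch only: SUM() OVER with a ROWS frame (SQL Server 2012+) computes
    -- the running total in a single pass, with no self-join and no recursion.
    SELECT RowId, CustomerId, TransId, Pts,
           RunningTotal = SUM(Pts) OVER (PARTITION BY CustomerId
                                         ORDER BY RowId
                                         ROWS UNBOUNDED PRECEDING)
    FROM @CumCredit;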

YMMV. It Depends. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • XMLSQLNinja (7/23/2012)


    ChrisM@home (7/20/2012)


    XMLSQLNinja (7/18/2012)


    ...

    There are other ways to do this but using recursion is generally the fastest.

    Recursion is the third fastest; CLR > "Quirky Update" > rCTE > cursor > triangular join...

    Are these your findings or are you quoting something you read?


    I was going to suggest that you post your test code because I've not found rCTE's to be very fast at very much.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (7/23/2012)

    I've not found rCTE's to be very fast at very much.

I know you're a busy guy Jeff, so you must've missed my rCTE solution to the Sweden redistricting problem. Made pretty short work of it, along with Les Cardwell's original, similar problem, or so I understand.

They do have their niche. 😀


My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.

  • Jeff Moden (7/23/2012)


    XMLSQLNinja (7/23/2012)


    ChrisM@home (7/20/2012)


    XMLSQLNinja (7/18/2012)


    ...

    There are other ways to do this but using recursion is generally the fastest.

    Recursion is the third fastest; CLR > "Quirky Update" > rCTE > cursor > triangular join...

    Are these your findings or are you quoting something you read?


    I was going to suggest that you post your test code because I've not found rCTE's to be very fast at very much.

I posted some test code on the first page of this thread. The comment at 9:55:46 PM.

    I will post something a little more detailed tomorrow.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • XMLSQLNinja (7/23/2012)


    ChrisM@home (7/20/2012)


    XMLSQLNinja (7/18/2012)


    ...

    There are other ways to do this but using recursion is generally the fastest.

    Recursion is the third fastest; CLR > "Quirky Update" > rCTE > cursor > triangular join...

    Are these your findings or are you quoting something you read? ...

    Alan

    Gosh, that's quite confrontational.

    I do pick up information from the internet. Mostly from here, but also blogs: Paul Randall, Paul White, Gail Shaw, Hugo Kornelis to mention a very few.

Here's Jeff Moden's Quirky Update article again (http://www.sqlservercentral.com/articles/T-SQL/68467/) - if you remember, it was referenced earlier in this thread. The article is a direct comparison of the "quirky update" method vs cursor and triangular join solutions. It unfortunately doesn't include a rCTE comparison, so here it is.

    I've taken the liberty of copying and adjusting the sample data scripts from the article to make it a little more forgiving towards the TJ method and I've also changed the cursor code to run significantly faster. To level out the playing field I've taken CLR solutions out of the equation altogether.

    The results for QU, cursor and TJ are in the same order as Jeff's findings, and the rCTE comes in exactly as my previous - disputed - post: between QU and cursor.

    QU: 1.2 seconds

    rCTE about 20 seconds (same ballpark as your figures in an earlier post)

    Cursor about 38 seconds

    Triangular Join about 50 seconds.

    Can you beat the QU with a rCTE?

    Here's the table script:

/*************************************************************************************
Truncate the table if it exists, otherwise create it.
Generate a million rows with @Accounts number of accounts
and @Transactions number of transactions per account.

The following statement will create a million row sample data set
with 2000 different accounts, each with about 500 transactions.

EXEC [dbo].[QUT_RepopulateTransactionDetail] @Accounts = 2000
*************************************************************************************/
CREATE PROCEDURE [dbo].[QUT_RepopulateTransactionDetail]
(@Accounts INT)
AS
SET NOCOUNT ON

IF @Accounts IS NULL SET @Accounts = 2000

DECLARE @Transactions INT
SET @Transactions = 1000000/@Accounts

IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[TransactionDetail]') AND type in (N'U'))
    DROP TABLE [dbo].[TransactionDetail]

CREATE TABLE TransactionDetail(
    TransactionDetailID int IDENTITY(1,1) NOT NULL,
    [Date] datetime NULL,
    AccountID int NOT NULL,
    Amount money NULL,
    AccountRunningTotal money NULL,
    rn INT NULL) ON [PRIMARY]

INSERT INTO dbo.TransactionDetail
    (AccountID, [Date], Amount)
SELECT
    a.AccountID,
    t.[Date],
    t.Amount
FROM (
    SELECT TOP(@Accounts)
        AccountID = ABS(CHECKSUM(NEWID()))%@Accounts+1
    FROM Master.dbo.SysColumns sc1 CROSS JOIN Master.dbo.SysColumns sc2) a
CROSS JOIN (
    SELECT TOP(@Transactions)
        [Date] = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),
        Amount = CAST(CHECKSUM(NEWID())%@Transactions /100.0 AS MONEY)
    FROM Master.dbo.SysColumns sc1 CROSS JOIN Master.dbo.SysColumns sc2) t
ORDER BY a.AccountID, t.[Date]

UPDATE t SET rn = d.rn
FROM TransactionDetail t
INNER JOIN (
    SELECT
        TransactionDetailID,
        rn = ROW_NUMBER() OVER(PARTITION BY AccountID ORDER BY [Date])
    FROM TransactionDetail
) d ON d.TransactionDetailID = t.TransactionDetailID

-- A clustered index is ESSENTIAL for the QU: it defines the running order. Handy for the TJ too.
CREATE UNIQUE CLUSTERED INDEX [ucx_TransactionDetail_AccountID_rn] ON TransactionDetail (AccountID, rn)

-- always nice to have a PK - and essential for the cursor method.
ALTER TABLE [dbo].[TransactionDetail]
ADD CONSTRAINT [PK_TransactionDetail_TransactionDetailID] PRIMARY KEY NONCLUSTERED
([TransactionDetailID] ASC)

-- essential for the rCTE.
CREATE UNIQUE INDEX [uix_TransactionDetail_rn_AccountID] ON TransactionDetail (rn, AccountID) INCLUDE (Amount)

-- ALWAYS eyeball the data
SELECT TOP 2000 * FROM [TransactionDetail] WHERE rn = 1

RETURN 0

    Here's the script to compare the different methods:

/*---------------------------------------------------------------
These results are from a million-row sample set consisting of
2000 accounts each with 500 transactions
---------------------------------------------------------------*/
SET NOCOUNT ON

-- METHOD 1 triangular join ============================================================
-- about 50 seconds ====================================================================
SELECT t.AccountID,
    t.rn,
    t.Amount, x.SUM_Amount
FROM dbo.TransactionDetail t
CROSS APPLY (
    SELECT SUM_Amount = SUM(Amount)
    FROM dbo.TransactionDetail s
    WHERE s.AccountID = t.AccountID
        AND s.rn <= t.rn
) x

-- METHOD 2 Cursor =====================================================================
-- about 40 seconds ====================================================================
--===== Declare the cursor storage variables
DECLARE @Amount MONEY
DECLARE @CurAccountID INT
DECLARE @TransactionDetailID INT

--===== Declare the working variables
DECLARE @PrevAccountID INT
DECLARE @AccountRunningTotal MONEY

--===== Create the cursor with rows sorted in the correct
--      order to do the running total by account
DECLARE curRunningTotal CURSOR LOCAL STATIC READ_ONLY
FOR
SELECT AccountID, TransactionDetailID, Amount
FROM dbo.TransactionDetail
ORDER BY AccountID, rn

OPEN curRunningTotal

FETCH NEXT FROM curRunningTotal
INTO @CurAccountID, @TransactionDetailID, @Amount

WHILE @@FETCH_STATUS = 0
BEGIN
    --===== Calculate the running total for this row
    --      and remember this AccountID for the next row
    SELECT
        @AccountRunningTotal = @Amount + CASE
            WHEN @CurAccountID = @PrevAccountID THEN @AccountRunningTotal
            ELSE 0 END,
        @PrevAccountID = @CurAccountID

    --===== Update the running total for this row using the PK
    --      this is much faster than "WHERE CURRENT OF"
    UPDATE dbo.TransactionDetail
    SET AccountRunningTotal = @AccountRunningTotal
    WHERE TransactionDetailID = @TransactionDetailID

    FETCH NEXT FROM curRunningTotal
    INTO @CurAccountID, @TransactionDetailID, @Amount
END

CLOSE curRunningTotal
DEALLOCATE curRunningTotal

-- verify results are correct
EXEC QUT_Verify
-- (see JBM's article for the code)

-- METHOD 3 rCTE ==============================================================
-- about 20 seconds ===========================================================
SET STATISTICS IO, TIME ON

;WITH Calculator AS (
    SELECT
        --[Level] = 1,
        --TransactionDetailID, [Date],
        AccountID, Amount, rn,
        RunningTotal = Amount
    FROM TransactionDetail
    WHERE rn = 1
    UNION ALL
    SELECT
        --[Level] = lr.[Level] + 1,
        --tr.TransactionDetailID, tr.[Date],
        x.AccountID, x.Amount, x.rn,
        x.RunningTotal -- = lr.RunningTotal + x.Amount
    FROM Calculator lr
    CROSS APPLY (SELECT t.rn, t.AccountID, t.Amount, RunningTotal = lr.RunningTotal + t.Amount FROM TransactionDetail t WHERE lr.AccountID = t.AccountID) x
    WHERE x.rn = lr.rn+1
)
SELECT *
FROM Calculator
WHERE AccountID IN (1,5,37,567,1999) -- ensures whole result set is calculated
ORDER BY AccountID, rn
OPTION(MAXRECURSION 0)

SET STATISTICS IO, TIME OFF

-- METHOD 4 Quirky Update =====================================================
-- about 1.2 seconds ==========================================================
-- ensure running totals column is initialised to NULL
-- (see JBM's article for the code)
EXEC QUT_ResetTestTable

--===== Declare the working variables (uncomment to run this section alone)
--DECLARE @PrevAccountID INT
--DECLARE @AccountRunningTotal MONEY
SELECT @PrevAccountID = NULL, @AccountRunningTotal = NULL

--===== Run the update
SET STATISTICS IO, TIME ON

UPDATE dbo.TransactionDetail SET
    @AccountRunningTotal = AccountRunningTotal = CASE
        WHEN AccountID = @PrevAccountID THEN @AccountRunningTotal + Amount
        ELSE Amount END,
    @PrevAccountID = AccountID
FROM dbo.TransactionDetail WITH (TABLOCKX)
OPTION (MAXDOP 1)

SET STATISTICS IO, TIME OFF
-- 00:00:01

-- verify results are correct
EXEC QUT_Verify
-- (see JBM's article for the code)

"Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms." - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
