Cumulative sum of previous rows

  • Comments posted to this topic are about the item Cumulative sum of previous rows

  • Nice article, Ben - does exactly what it says on the tin, although many folks lurking around here would be more familiar with the name Running Total.

    The old version of your update is what we call a "Triangular Join". If the partitions are small and hence you're only summing up a few rows it can be fast enough for production code. There are other ways of performing a running totals update including cursor, recursive CTE and the so-called "Quirky Update", and these have been exhaustively researched and compared by Jeff Moden et al in this[/url] and other articles.

    Cheers

    ChrisM

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Hey Ben,

    Speaking of SQL 2012 and using "PARTITION OVER" have you tried specifying the rows relative to the current one and avoiding the UPDATE statement completely?

    This will return the same results as at the end of your article.

    eg)

    SELECT D.demandID

    ,D.itemID

    ,D.orderNbr

    ,D.orderQty

    ,I.inventoryQty-SUM(D.orderQty) OVER (PARTITION BY D.itemID ORDER BY D.orderNbr RANGE UNBOUNDED PRECEDING) as netInventoryQty

    FROM Demand D

    INNER JOIN Inventory I

    ON I.itemID=D.itemID

    ORDER BY D.demandID

    "PARTITION BY ItemID " will group them by item number and "ORDER BY orderNbr" will also put them in the correct order for the aggregation.

    "RANGE UNBOUNDED PRECEDING" grabs all the preceding rows, grouped by itemID ordered by orderNbr.

    I just found out about this new relative reference clause for OVER myself, very neat.

    Simon 🙂

  • Thanks Chris for your post.

    Ben

  • I will have to take a closer look at your code. Thanks for posting it.

    Ben

  • I agree with SimonC, the RANGE clause is the better solution to the Running Total problem for read operations.

  • Here's a version using a CTE for an easy-to-read implimenation:

    ;with d as (

    select netInventoryQty, demandID, d.itemID, i.inventoryQty,

    SUM(orderQty) OVER ( PARTITION BY d.itemID ORDER BY demandid ) AS 'cumQty'

    from demand d

    join inventory i on d.itemID = i.itemID

    )

    update d set netInventoryQty = d.inventoryQty - d.cumqty

  • Good article.. Thanks!..

    Here is how I would have solve the problem.

    UPDATE a

    SET a.netInventoryQty=c.inventoryQty-b.cQty

    --SELECT *

    FROM demand a

    INNER JOIN (

    select a.demandID, sum(b.orderQty) as cQty

    FROM demand a

    INNER JOIN demand b ON b.demandID<=a.demandID AND b.itemID=a.itemID

    GROUP BY a.demandID

    ) b on b.demandID=a.demandID

    INNER JOIN Inventory c ON c.itemID=a.itemID

    I'll have to check out if your method is faster.

    Thanks again

  • Not sure if you completely read my article, but the solution you posted is what I replaced to speed up the update.

    Thanks for your post.

    Ben

  • I did read your article and my solution is similar but there are differences - mainly that I'm joining to sub query that is grouping - where you originally are using nested query for summing - correct??.. Before I read your article I wouldn't have even thought to have used OVER ( PARTITION BY... -- As it is pretty new to me. 🙂

    Thanks

  • Good article.

    I'm not sure what you meant by this though

    I found something new in SQL Server 2012. You can now add an order by to the over( partition by and it will do a cumulative sum of the previous rows.

    The ORDER BY clause has been part of the OVER statement since 2005. Or do you mean that previously it wouldn't result in a cumulative sum?

    --------------------------------------
    When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
    --------------------------------------
    It’s unpleasantly like being drunk.
    What’s so unpleasant about being drunk?
    You ask a glass of water. -- Douglas Adams

  • rwatkins 25267 (5/8/2014)


    I agree with SimonC, the RANGE clause is the better solution to the Running Total problem for read operations.

    I disagree. For running aggregations (including totals), you need to use ROW, not RANGE. RANGE works by the value of the column, and where the value is the same, it shows the same total for both rows. Here's a quick example:

    DECLARE @test-2 TABLE (

    RowID INT IDENTITY,

    FName varchar(20),

    Salary smallint);

    INSERT INTO @test-2 (FName, Salary)

    VALUES ('George', 800),

    ('Sam', 950),

    ('Diane', 1100),

    ('Nicholas', 1250),

    ('Samuel', 1250),

    ('Patricia', 1300),

    ('Brian', 1500),

    ('Thomas', 1600),

    ('Fran', 2450),

    ('Debbie', 2850),

    ('Mark', 2975),

    ('James', 3000),

    ('Cynthia', 3000),

    ('Christopher', 5000);

    SELECT RowID,

    FName,

    Salary,

    SumByRows = SUM(Salary) OVER (ORDER BY Salary

    ROWS UNBOUNDED PRECEDING),

    SumByRange = SUM(Salary) OVER (ORDER BY Salary

    RANGE UNBOUNDED PRECEDING)

    FROM @test-2;

    Which gives these results:

    RowID FName Salary SumByRows SumByRange

    ----------- -------------------- ------ ----------- -----------

    1 George 800 800 800

    2 Sam 950 1750 1750

    3 Diane 1100 2850 2850

    4 Nicholas 1250 4100 5350

    5 Samuel 1250 5350 5350

    6 Patricia 1300 6650 6650

    7 Brian 1500 8150 8150

    8 Thomas 1600 9750 9750

    9 Fran 2450 12200 12200

    10 Debbie 2850 15050 15050

    11 Mark 2975 18025 18025

    12 James 3000 21025 24025

    13 Cynthia 3000 24025 24025

    14 Christopher 5000 29025 29025

    Note the results for Nicholas/Samuel (RowID 4/5) and James/Cynthia (RowID 12/13). The use of RANGE does not give the proper running total for the first record returned for each pair.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Stefan Krzywicki (5/8/2014)


    Good article.

    I'm not sure what you meant by this though

    I found something new in SQL Server 2012. You can now add an order by to the over( partition by and it will do a cumulative sum of the previous rows.

    The ORDER BY clause has been part of the OVER statement since 2005. Or do you mean that previously it wouldn't result in a cumulative sum?

    Prior to 2012, when using the OVER clause with an aggregate function, you could only specify the PARTITION BY clause, so every row within the partition would have the same value. With 2012, you can now specify the ORDER BY and ROW/RANGE clause. If you use the ORDER BY, you are using the default ROW/RANGE clause, which is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (from the first row in the partition to the current row). You can see more about the changes in my article[/url], or my blog post comparing running totals[/url].

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Good point, Wayne. I had forgotten that there is a definite difference between ROWS and RANGE, despite the similar syntax.

  • Stefan Krzywicki (5/8/2014)


    Good article.

    I'm not sure what you meant by this though

    I found something new in SQL Server 2012. You can now add an order by to the over( partition by and it will do a cumulative sum of the previous rows.

    The ORDER BY clause has been part of the OVER statement since 2005. Or do you mean that previously it wouldn't result in a cumulative sum?

    Stefan,

    If you create the tables in my article in a sql server 2008 R2 or lower version of sql server and try to run the select:

    SELECT demandID

    , itemID

    , orderQty

    , SUM(orderQty) OVER ( PARTITION BY itemID ORDER BY demandid ) AS 'cumQty'

    FROM demand

    You will get an error "Incorrect syntax near 'order'

    So adding the order by with the partition by is something new you can only do in sql server 2012.

    Thanks,

    Ben

Viewing 15 posts - 1 through 15 (of 26 total)

You must be logged in to reply to this topic. Login to reply