Cumulative sum of previous rows

  • WayneS (5/8/2014)


    Stefan Krzywicki (5/8/2014)


    Good article.

    I'm not sure what you meant by this though

    I found something new in SQL Server 2012. You can now add an order by to the over( partition by and it will do a cumulative sum of the previous rows.

    The ORDER BY clause has been part of the OVER statement since 2005. Or do you mean that previously it wouldn't result in a cumulative sum?

    Prior to 2012, when using the OVER clause with an aggregate function, you could only specify the PARTITION BY clause, so every row within the partition would have the same value. With 2012, you can now specify the ORDER BY and ROW/RANGE clause. If you use the ORDER BY, you are using the default ROW/RANGE clause, which is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (from the first row in the partition to the current row). You can see more about the changes in my article[/url], or my blog post comparing running totals[/url].

    Thanks for the clarification.

    --------------------------------------
    When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
    --------------------------------------
    It’s unpleasantly like being drunk.
    What’s so unpleasant about being drunk?
    You ask a glass of water. -- Douglas Adams

  • bkubicek (5/8/2014)


    Stefan Krzywicki (5/8/2014)


    Good article.

    I'm not sure what you meant by this though

    I found something new in SQL Server 2012. You can now add an order by to the over( partition by and it will do a cumulative sum of the previous rows.

    The ORDER BY clause has been part of the OVER statement since 2005. Or do you mean that previously it wouldn't result in a cumulative sum?

    Stefan,

    If you create the tables in my article in a sql server 2008 R2 or lower version of sql server and try to run the select:

    SELECT demandID

    , itemID

    , orderQty

    , SUM(orderQty) OVER ( PARTITION BY itemID ORDER BY demandid ) AS 'cumQty'

    FROM demand

    You will get an error "Incorrect syntax near 'order'

    So adding the order by with the partition by is something new you can only do in sql server 2012.

    Thanks,

    Ben

    Cool, thanks. Just needed some clarification.

    --------------------------------------
    When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
    --------------------------------------
    It’s unpleasantly like being drunk.
    What’s so unpleasant about being drunk?
    You ask a glass of water. -- Douglas Adams

  • +1 Good topic, good article, and good discussion. 🙂

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Thanks WayneS, read your article, I was looking for a good description of the difference between ROW and RANGE. Yes, most definitely ROW should be used here.

    Interesting, so in the example if the ORDER BY value happened to be unique, it would have appeared to return the correct values, a problem waiting to happen if the column is not truly unique.

  • simonc 39536 (5/9/2014)


    Thanks WayneS, read your article, I was looking for a good description of the difference between ROW and RANGE. Yes, most definitely ROW should be used here.

    Interesting, so in the example if the ORDER BY value happened to be unique, it would have appeared to return the correct values, a problem waiting to happen if the column is not truly unique.

    In my example each demand would clearly be unique, so no problem there.

    Thanks,

    Ben

  • WayneS is still absolutely correct about the usage of ROWS though.

    In this instance you should use ROWS because it is the more correct solution for what you are doing,

    it just so happens that the ORDER BY column is unique in this instance and both return the same results.

    Best practice would say you should use ROWS.

    However, for the heck of it, since you would be using "ORDER BY demandid", which is the primary key and therefore unique, both should return the same results, yes.

    Query from WayneS altered to include ORDER BY RowID:-

    DECLARE @test-2 TABLE (

    RowID INT IDENTITY,

    FName varchar(20),

    Salary smallint);

    INSERT INTO @test-2 (FName, Salary)

    VALUES ('George', 800),

    ('Sam', 950),

    ('Diane', 1100),

    ('Nicholas', 1250),

    ('Samuel', 1250),

    ('Patricia', 1300),

    ('Brian', 1500),

    ('Thomas', 1600),

    ('Fran', 2450),

    ('Debbie', 2850),

    ('Mark', 2975),

    ('James', 3000),

    ('Cynthia', 3000),

    ('Christopher', 5000);

    SELECT RowID,

    FName,

    Salary,

    SumByRowsBySalary = SUM(Salary) OVER (ORDER BY Salary

    ROWS UNBOUNDED PRECEDING),

    SumByRangeBySalary = SUM(Salary) OVER (ORDER BY Salary

    RANGE UNBOUNDED PRECEDING),

    SumByRowsByRowID = SUM(Salary) OVER (ORDER BY RowID

    ROWS UNBOUNDED PRECEDING),

    SumByRangeByRowID = SUM(Salary) OVER (ORDER BY RowID

    RANGE UNBOUNDED PRECEDING)

    FROM @test-2;

  • simonc 39536 (5/9/2014)


    ...

    In this instance you should use ROWS because it is the more correct solution for what you are doing,

    it just so happens that the ORDER BY column is unique in this instance and both return the same results.

    Best practice would say you should use ROWS....

    There's no evidence for this. Ben's data is partitioned by a column and doesn't require any kind of subgrouping within partitions. From BOL: "The ROWS clause limits the rows within a partition".

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • simonc 39536 (5/9/2014)


    WayneS is still absolutely correct about the usage of ROWS though.

    In this instance you should use ROWS because it is the more correct solution for what you are doing,

    it just so happens that the ORDER BY column is unique in this instance and both return the same results.

    Best practice would say you should use ROWS.

    However, for the heck of it, since you would be using "ORDER BY demandid", which is the primary key and therefore unique, both should return the same results, yes.

    Query from WayneS altered to include ORDER BY RowID:-

    I understand you are trying to be helpful, but it didn't just so happen that the demandid was unique. The table and the code was designed that way.

    Ben

  • Hey Ben,

    Sorry, not disrespecting your data structure, just trying to highlight that the results for ROWS/RANGE would always be the same

    in the situation where the column in the ORDER BY happens to contain unique data, regardless of it being designed for that purpose or not.

    That's why I included the example based on code from WayneS.

    Anyway, why I'm thinking ROWS is because the column in the ORDER BY is a primary key and therefore always unique,

    you don't need the system looking for rows with the same value to be considered as the current row,

    as is the behaviour of RANGE, because there aren't any, the column's unique, you just want it to add up the

    preceding rows based on the single current row.

    Hey, if RANGE returns the data vastly quicker than ROWS for this data structure, you should probably use it,

    but if they return the data at the same rate and there is a choice I'd be picking ROWS.

    Sometimes you gotta do what you gotta do for performance.

    Simon 🙂

  • Ben,

    A super article and thanks ever so much. I would also recommend that folks have a look at similar techniques done by Jacob Sebastian. This series of recursion articles(cte) is second to none. I implemented the technique at a client site a few years back. The clients were thrilled.

    For those of you in the financial world, Jacob's techniques are first class for composite funds.

    http://beyondrelational.com/modules/2/blogs/28/posts/10486/recursive-cte-and-ordering-of-the-hierarchical-result.aspx

  • First, thank you for an introduction to a command I've never heard of. I'll need to read up on it to see if it has an application for us.

    But the thing that throws me off, and I know this is basically off topic, is why would you have the quantity of an item in a separate table from the item itself? It will always be a 1 to 1 relation, so I would think it would be more efficient to have quantity as another field in the item table. Yes, I realize you always want to use a greatly simplified table setup to demonstrate a command, but I was curious what your real world data structure is like. Do you actually have a table just to keep track of quantities, or are there additional fields that would warrant the extra table and just aren't relevant to the demonstration? Is there significant performance that can be gained by this structure?

    Thanks,

    Ken

  • Great. Thanks for this one.

Viewing 12 posts - 16 through 26 (of 26 total)

You must be logged in to reply to this topic. Login to reply