Cumulative sum of previous rows

Question

Cumulative sum of previous rows

Viewing 12 posts - 16 through 26 (of 26 total)

You must be logged in to reply to this topic. Login to reply

Sioban Krzywicki One Orange Chip Points: 27770 More actions · Answer 1

WayneS (5/8/2014)
Stefan Krzywicki (5/8/2014)
Good article.
I'm not sure what you meant by this though
I found something new in SQL Server 2012. You can now add an order by to the over( partition by and it will do a cumulative sum of the previous rows.
The ORDER BY clause has been part of the OVER statement since 2005. Or do you mean that previously it wouldn't result in a cumulative sum?
Prior to 2012, when using the OVER clause with an aggregate function, you could only specify the PARTITION BY clause, so every row within the partition would have the same value. With 2012, you can now specify the ORDER BY and ROW/RANGE clause. If you use the ORDER BY, you are using the default ROW/RANGE clause, which is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (from the first row in the partition to the current row). You can see more about the changes in my article[/url], or my blog post comparing running totals[/url].

Thanks for the clarification.

--------------------------------------
When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
--------------------------------------
It’s unpleasantly like being drunk.
What’s so unpleasant about being drunk?
You ask a glass of water. -- Douglas Adams

Sioban Krzywicki One Orange Chip Points: 27770 More actions · Answer 2

bkubicek (5/8/2014)
Stefan Krzywicki (5/8/2014)
Good article.
I'm not sure what you meant by this though
I found something new in SQL Server 2012. You can now add an order by to the over( partition by and it will do a cumulative sum of the previous rows.
The ORDER BY clause has been part of the OVER statement since 2005. Or do you mean that previously it wouldn't result in a cumulative sum?
Stefan,
If you create the tables in my article in a sql server 2008 R2 or lower version of sql server and try to run the select:
SELECT demandID
, itemID
, orderQty
, SUM(orderQty) OVER ( PARTITION BY itemID ORDER BY demandid ) AS 'cumQty'
FROM demand
You will get an error "Incorrect syntax near 'order'
So adding the order by with the partition by is something new you can only do in sql server 2012.
Thanks,
Ben

Cool, thanks. Just needed some clarification.

--------------------------------------
When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
--------------------------------------
It’s unpleasantly like being drunk.
What’s so unpleasant about being drunk?
You ask a glass of water. -- Douglas Adams

The Dixie Flatline SSC Guru Points: 53354 More actions · Answer 3

+1 Good topic, good article, and good discussion. 🙂

__________________________________________________

Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

SQL Simmo SSC Enthusiast Points: 132 More actions · Answer 4

Thanks WayneS, read your article, I was looking for a good description of the difference between ROW and RANGE. Yes, most definitely ROW should be used here.

Interesting, so in the example if the ORDER BY value happened to be unique, it would have appeared to return the correct values, a problem waiting to happen if the column is not truly unique.

bkubicek SSChampion Points: 10948 More actions · Answer 5

simonc 39536 (5/9/2014)
Thanks WayneS, read your article, I was looking for a good description of the difference between ROW and RANGE. Yes, most definitely ROW should be used here.
Interesting, so in the example if the ORDER BY value happened to be unique, it would have appeared to return the correct values, a problem waiting to happen if the column is not truly unique.

In my example each demand would clearly be unique, so no problem there.

Thanks,

Ben

SQL Simmo SSC Enthusiast Points: 132 More actions · Answer 6

WayneS is still absolutely correct about the usage of ROWS though.

In this instance you should use ROWS because it is the more correct solution for what you are doing,

it just so happens that the ORDER BY column is unique in this instance and both return the same results.

Best practice would say you should use ROWS.

However, for the heck of it, since you would be using "ORDER BY demandid", which is the primary key and therefore unique, both should return the same results, yes.

Query from WayneS altered to include ORDER BY RowID:-

DECLARE @test-2 TABLE (

RowID INT IDENTITY,

FName varchar(20),

Salary smallint);

INSERT INTO @test-2 (FName, Salary)

VALUES ('George', 800),

('Sam', 950),

('Diane', 1100),

('Nicholas', 1250),

('Samuel', 1250),

('Patricia', 1300),

('Brian', 1500),

('Thomas', 1600),

('Fran', 2450),

('Debbie', 2850),

('Mark', 2975),

('James', 3000),

('Cynthia', 3000),

('Christopher', 5000);

SELECT RowID,

FName,

Salary,

SumByRowsBySalary = SUM(Salary) OVER (ORDER BY Salary

ROWS UNBOUNDED PRECEDING),

SumByRangeBySalary = SUM(Salary) OVER (ORDER BY Salary

RANGE UNBOUNDED PRECEDING),

SumByRowsByRowID = SUM(Salary) OVER (ORDER BY RowID

ROWS UNBOUNDED PRECEDING),

SumByRangeByRowID = SUM(Salary) OVER (ORDER BY RowID

RANGE UNBOUNDED PRECEDING)

FROM @test-2;

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 7

simonc 39536 (5/9/2014)
...
In this instance you should use ROWS because it is the more correct solution for what you are doing,
it just so happens that the ORDER BY column is unique in this instance and both return the same results.
Best practice would say you should use ROWS....

There's no evidence for this. Ben's data is partitioned by a column and doesn't require any kind of subgrouping within partitions. From BOL: "The ROWS clause limits the rows within a partition".

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

bkubicek SSChampion Points: 10948 More actions · Answer 8

simonc 39536 (5/9/2014)
WayneS is still absolutely correct about the usage of ROWS though.
In this instance you should use ROWS because it is the more correct solution for what you are doing,
it just so happens that the ORDER BY column is unique in this instance and both return the same results.
Best practice would say you should use ROWS.
However, for the heck of it, since you would be using "ORDER BY demandid", which is the primary key and therefore unique, both should return the same results, yes.
Query from WayneS altered to include ORDER BY RowID:-

I understand you are trying to be helpful, but it didn't just so happen that the demandid was unique. The table and the code was designed that way.

Ben

SQL Simmo SSC Enthusiast Points: 132 More actions · Answer 9

Hey Ben,

Sorry, not disrespecting your data structure, just trying to highlight that the results for ROWS/RANGE would always be the same

in the situation where the column in the ORDER BY happens to contain unique data, regardless of it being designed for that purpose or not.

That's why I included the example based on code from WayneS.

Anyway, why I'm thinking ROWS is because the column in the ORDER BY is a primary key and therefore always unique,

you don't need the system looking for rows with the same value to be considered as the current row,

as is the behaviour of RANGE, because there aren't any, the column's unique, you just want it to add up the

preceding rows based on the single current row.

Hey, if RANGE returns the data vastly quicker than ROWS for this data structure, you should probably use it,

but if they return the data at the same rate and there is a choice I'd be picking ROWS.

Sometimes you gotta do what you gotta do for performance.

Simon 🙂

Stefaans SSC-Addicted Points: 488 More actions · Answer 10

Ben,

A super article and thanks ever so much. I would also recommend that folks have a look at similar techniques done by Jacob Sebastian. This series of recursion articles(cte) is second to none. I implemented the technique at a client site a few years back. The clients were thrilled.

For those of you in the financial world, Jacob's techniques are first class for composite funds.

http://beyondrelational.com/modules/2/blogs/28/posts/10486/recursive-cte-and-ordering-of-the-hierarchical-result.aspx

Ken Hiatt SSC-Addicted Points: 469 More actions · Answer 11

First, thank you for an introduction to a command I've never heard of. I'll need to read up on it to see if it has an application for us.

But the thing that throws me off, and I know this is basically off topic, is why would you have the quantity of an item in a separate table from the item itself? It will always be a 1 to 1 relation, so I would think it would be more efficient to have quantity as another field in the item table. Yes, I realize you always want to use a greatly simplified table setup to demonstrate a command, but I was curious what your real world data structure is like. Do you actually have a table just to keep track of quantities, or are there additional fields that would warrant the extra table and just aren't relevant to the demonstration? Is there significant performance that can be gained by this structure?

Thanks,

Ken

akljfhnlaflkj SSC Guru Points: 76202 More actions · Answer 12

akljfhnlaflkj

SSC Guru

Points: 76202

April 21, 2015 at 4:03 pm

#1792585

Great. Thanks for this one.