Cumulative sum of previous rows

Question

Cumulative sum of previous rows

bkubicek

SSChampion

Points: 10948
More actions
May 7, 2014 at 10:51 pm

#285392

Comments posted to this topic are about the item Cumulative sum of previous rows

Viewing 15 posts - 1 through 15 (of 27 total)

You must be logged in to reply to this topic. Login to reply

ChrisM@Work SSC Guru Points: 186126 More actions · Answer 1

Nice article, Ben - does exactly what it says on the tin, although many folks lurking around here would be more familiar with the name Running Total.

The old version of your update is what we call a "Triangular Join". If the partitions are small and hence you're only summing up a few rows it can be fast enough for production code. There are other ways of performing a running totals update including cursor, recursive CTE and the so-called "Quirky Update", and these have been exhaustively researched and compared by Jeff Moden et al in this[/url] and other articles.

Cheers

ChrisM

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

SQL Simmo SSC Enthusiast Points: 132 More actions · Answer 2

Hey Ben,

Speaking of SQL 2012 and using "PARTITION OVER" have you tried specifying the rows relative to the current one and avoiding the UPDATE statement completely?

This will return the same results as at the end of your article.

eg)

SELECT D.demandID

,D.itemID

,D.orderNbr

,D.orderQty

,I.inventoryQty-SUM(D.orderQty) OVER (PARTITION BY D.itemID ORDER BY D.orderNbr RANGE UNBOUNDED PRECEDING) as netInventoryQty

FROM Demand D

INNER JOIN Inventory I

ON I.itemID=D.itemID

ORDER BY D.demandID

"PARTITION BY ItemID " will group them by item number and "ORDER BY orderNbr" will also put them in the correct order for the aggregation.

"RANGE UNBOUNDED PRECEDING" grabs all the preceding rows, grouped by itemID ordered by orderNbr.

I just found out about this new relative reference clause for OVER myself, very neat.

Simon 🙂

bkubicek SSChampion Points: 10948 More actions · Answer 3

bkubicek

SSChampion

Points: 10948

May 8, 2014 at 5:11 am

#1711802

Thanks Chris for your post.

Ben

bkubicek SSChampion Points: 10948 More actions · Answer 4

I will have to take a closer look at your code. Thanks for posting it.

Ben

rwatkins 25267 SSC Enthusiast Points: 106 More actions · Answer 5

I agree with SimonC, the RANGE clause is the better solution to the Running Total problem for read operations.

gbritton1 SSCertifiable Points: 6522 More actions · Answer 6

Here's a version using a CTE for an easy-to-read implimenation:

;with d as (

select netInventoryQty, demandID, d.itemID, i.inventoryQty,

SUM(orderQty) OVER ( PARTITION BY d.itemID ORDER BY demandid ) AS 'cumQty'

from demand d

join inventory i on d.itemID = i.itemID

)

update d set netInventoryQty = d.inventoryQty - d.cumqty

JohnK777 SSC Journeyman Points: 82 More actions · Answer 7

Good article.. Thanks!..

Here is how I would have solve the problem.

UPDATE a

SET a.netInventoryQty=c.inventoryQty-b.cQty

--SELECT *

FROM demand a

INNER JOIN (

select a.demandID, sum(b.orderQty) as cQty

FROM demand a

INNER JOIN demand b ON b.demandID<=a.demandID AND b.itemID=a.itemID

GROUP BY a.demandID

) b on b.demandID=a.demandID

INNER JOIN Inventory c ON c.itemID=a.itemID

I'll have to check out if your method is faster.

Thanks again

bkubicek SSChampion Points: 10948 More actions · Answer 8

Not sure if you completely read my article, but the solution you posted is what I replaced to speed up the update.

Thanks for your post.

Ben

JohnK777 SSC Journeyman Points: 82 More actions · Answer 9

I did read your article and my solution is similar but there are differences - mainly that I'm joining to sub query that is grouping - where you originally are using nested query for summing - correct??.. Before I read your article I wouldn't have even thought to have used OVER ( PARTITION BY... -- As it is pretty new to me. 🙂

Thanks

Sioban Krzywicki One Orange Chip Points: 27770 More actions · Answer 10

Good article.

I'm not sure what you meant by this though

I found something new in SQL Server 2012. You can now add an order by to the over( partition by and it will do a cumulative sum of the previous rows.

The ORDER BY clause has been part of the OVER statement since 2005. Or do you mean that previously it wouldn't result in a cumulative sum?

--------------------------------------
When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
--------------------------------------
It’s unpleasantly like being drunk.
What’s so unpleasant about being drunk?
You ask a glass of water. -- Douglas Adams

WayneS SSC Guru Points: 95458 More actions · Answer 11

rwatkins 25267 (5/8/2014)
I agree with SimonC, the RANGE clause is the better solution to the Running Total problem for read operations.

I disagree. For running aggregations (including totals), you need to use ROW, not RANGE. RANGE works by the value of the column, and where the value is the same, it shows the same total for both rows. Here's a quick example:

DECLARE @Test TABLE (

RowID INT IDENTITY,

FName varchar(20),

Salary smallint);

INSERT INTO @Test (FName, Salary)

VALUES ('George', 800),

('Sam', 950),

('Diane', 1100),

('Nicholas', 1250),

('Samuel', 1250),

('Patricia', 1300),

('Brian', 1500),

('Thomas', 1600),

('Fran', 2450),

('Debbie', 2850),

('Mark', 2975),

('James', 3000),

('Cynthia', 3000),

('Christopher', 5000);

SELECT RowID,

FName,

Salary,

SumByRows = SUM(Salary) OVER (ORDER BY Salary

ROWS UNBOUNDED PRECEDING),

SumByRange = SUM(Salary) OVER (ORDER BY Salary

RANGE UNBOUNDED PRECEDING)

FROM @Test;

Which gives these results:

RowID FName Salary SumByRows SumByRange

----------- -------------------- ------ ----------- -----------

1 George 800 800 800

2 Sam 950 1750 1750

3 Diane 1100 2850 2850

4 Nicholas 1250 4100 5350

5 Samuel 1250 5350 5350

6 Patricia 1300 6650 6650

7 Brian 1500 8150 8150

8 Thomas 1600 9750 9750

9 Fran 2450 12200 12200

10 Debbie 2850 15050 15050

11 Mark 2975 18025 18025

12 James 3000 21025 24025

13 Cynthia 3000 24025 24025

14 Christopher 5000 29025 29025

Note the results for Nicholas/Samuel (RowID 4/5) and James/Cynthia (RowID 12/13). The use of RANGE does not give the proper running total for the first record returned for each pair.

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2

WayneS SSC Guru Points: 95458 More actions · Answer 12

Stefan Krzywicki (5/8/2014)
Good article.
I'm not sure what you meant by this though
I found something new in SQL Server 2012. You can now add an order by to the over( partition by and it will do a cumulative sum of the previous rows.
The ORDER BY clause has been part of the OVER statement since 2005. Or do you mean that previously it wouldn't result in a cumulative sum?

Prior to 2012, when using the OVER clause with an aggregate function, you could only specify the PARTITION BY clause, so every row within the partition would have the same value. With 2012, you can now specify the ORDER BY and ROW/RANGE clause. If you use the ORDER BY, you are using the default ROW/RANGE clause, which is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (from the first row in the partition to the current row). You can see more about the changes in my article[/url], or my blog post comparing running totals[/url].

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2

rwatkins 25267 SSC Enthusiast Points: 106 More actions · Answer 13

Good point, Wayne. I had forgotten that there is a definite difference between ROWS and RANGE, despite the similar syntax.

bkubicek SSChampion Points: 10948 More actions · Answer 14

Stefan Krzywicki (5/8/2014)
Good article.
I'm not sure what you meant by this though
I found something new in SQL Server 2012. You can now add an order by to the over( partition by and it will do a cumulative sum of the previous rows.
The ORDER BY clause has been part of the OVER statement since 2005. Or do you mean that previously it wouldn't result in a cumulative sum?

Stefan,

If you create the tables in my article in a sql server 2008 R2 or lower version of sql server and try to run the select:

SELECT demandID

, itemID

, orderQty

, SUM(orderQty) OVER ( PARTITION BY itemID ORDER BY demandid ) AS 'cumQty'

FROM demand

You will get an error "Incorrect syntax near 'order'

So adding the order by with the partition by is something new you can only do in sql server 2012.

Thanks,

Ben