July 6, 2011 at 4:02 am
Hi ,
Have a table that looks like this:
id count
1 100
2 50
3 10
I want to add a new column called cumulative_sum, so the table would look like this:
id count cumulative_sum
1 100 100
2 50 150
3 10 160
Is there any SQL statement that can do this easily? What's the best way to accomplish this?
Need help? Help us help you.
July 6, 2011 at 4:49 am
--drop table #data
create table #data
(
id int identity(1,1),
num numeric(18,0),
cnum numeric(18,0)
)
insert into #data (num, cnum)
select 100,0
union
select 50,0
union
select 10,0
union
select 65,0
declare @cnt int
set @cnt = 1
while (@cnt <= (select COUNT(*) from #data))
begin
update #data
set cnum = (select SUM(num) from #data
where id <= @cnt)
where [id] = @cnt
set @cnt = @cnt +1
end
select * from #data
HTH :-):-)
taybre
For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
July 6, 2011 at 6:57 am
tommey152 (7/6/2011)
Hi ,.....
Is there any SQL statement that can do this easily? What's the best way to accomplish this?
Easiest is the recursive CTE:
DROP TABLE #SourceData
CREATE TABLE #SourceData (id INT IDENTITY(1,1), [count] INT, cumulative_sum INT)
INSERT INTO #SourceData (count) VALUES (100),(50),(10)
;WITH Accumulator AS (
SELECT id,
[count],
cumulative_sum = [count]
FROM #SourceData
WHERE id = 1
UNION ALL
SELECT s.id,
s.[count],
a.cumulative_sum+s.[count]
FROM #SourceData s
INNER JOIN Accumulator a
ON a.id + 1 = s.id
) SELECT id,
[count],
cumulative_sum
FROM Accumulator OPTION (MAXRECURSION 0)
This method can calculate values for a million row in around ten seconds providing there's a unique clustered index on the column which holds the sequential values
Best is probably the "Running Totals Update"[/url] - it's fastest anyway.
Worst are those methods using a triangular join - where the value for say row 10 is calculated by summing the values for rows 1 to 10, then the value for row 11 is calculated by summing the values for rows 1 to 11...
Cheers
ChrisM
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
July 6, 2011 at 7:40 pm
Chris,
Do you know of a way to use the rCTE to do a running total without the ID's being perfectly sequential?
--Jeff Moden
Change is inevitable... Change for the better is not.
July 6, 2011 at 9:09 pm
Jeff Moden (7/6/2011)
Chris,Do you know of a way to use the rCTE to do a running total without the ID's being perfectly sequential?
There's a way using Cross Apply by taking the Min of the greater IDs as your next connection, but it's data volume and index dependent on if you're better off slapping a Row_Number() on it at that point.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
July 7, 2011 at 2:04 am
Craig Farrell (7/6/2011)
Jeff Moden (7/6/2011)
Chris,Do you know of a way to use the rCTE to do a running total without the ID's being perfectly sequential?
There's a way using Cross Apply by taking the Min of the greater IDs as your next connection, but it's data volume and index dependent on if you're better off slapping a Row_Number() on it at that point.
You'd think so, but it's not straightforward:
DROP TABLE #SourceData
CREATE TABLE #SourceData (id INT IDENTITY(1,1), [count] INT, cumulative_sum INT)
INSERT INTO #SourceData (count) VALUES (100),(50),(10)
;WITH Accumulator AS (
SELECT id,
[count],
cumulative_sum = [count]
FROM #SourceData
WHERE id = 1
UNION ALL
SELECT s.id,
s.[count],
a.cumulative_sum+s.[count]
FROM Accumulator a
CROSS APPLY (SELECT NextID = MIN(id) FROM #SourceData WHERE id > a.id) n
INNER JOIN #SourceData s ON s.id = n.NextID
) SELECT id,
[count],
cumulative_sum
FROM Accumulator OPTION (MAXRECURSION 0)
Msg 467, Level 16, State 1, Line 5
GROUP BY, HAVING, or aggregate functions are not allowed in the recursive part of a recursive common table expression 'Accumulator'.
Jeff, thanks for asking this. I've been working on a rCTE running totals article for a few months (when not wrestling with SSIS) but it's too bland to offer as it stands. Cracking this particular nut could make it more interesting and worthwhile.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
July 7, 2011 at 4:10 am
ChrisM@Work (7/7/2011)
You'd think so, but it's not straightforward...
MIN and MAX may not be allowed in the recursive part, but MIN and MAX are no more than the first row (TOP 1) ordered in some particular way. Now, TOP isn't allowed in the recursive part either, but nobody told the query optimizer that, so we can use ROW_NUMBER = 1 to achieve a TOP operator in the plan.
DROP TABLE #SourceData;
GO
CREATE TABLE #SourceData
(
idINTEGER IDENTITY(1,1) PRIMARY KEY,
[count]INTEGER NOT NULL,
cumulative_sumINTEGER NULL,
);
GO
INSERT #SourceData
(count)
VALUES
(100),
(50),
(10);
WITH Accumulator AS
(
SELECT
sd.id,
sd.[count],
cumulative_sum = sd.[count]
FROM #SourceData AS sd
WHERE
sd.id = 1
UNION ALL
SELECT
iTVF.id,
iTVF.[count],
a.cumulative_sum + iTVF.[count]
FROM Accumulator AS a
CROSS APPLY
(
SELECT
r.id,
r.[count]
FROM
(
SELECT
sd.id,
sd.[count],
rn = ROW_NUMBER() OVER (
ORDER BY sd.id)
FROM #SourceData AS sd
WHERE
sd.id > a.id
) AS r
WHERE
r.rn = 1
) AS iTVF
)
SELECT
a.id,
a.[count],
a.cumulative_sum
FROM Accumulator AS a
OPTION (MAXRECURSION 0);
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
July 7, 2011 at 4:47 am
Nice one, Paul. The triangular join put me off writing and testing it, but in practice the performance is stunningly good - half a million rows return in 12s on this box.
INSERT INTO #SourceData ([count]) SELECT a.[count] FROM #SourceData a, #SourceData b, #SourceData c, #SourceData d
INSERT INTO #SourceData ([count]) SELECT a.[count] FROM #SourceData a, #SourceData b, #SourceData c
go
;WITH Accumulator AS
(
SELECT
sd.id,
sd.[count],
cumulative_sum = sd.[count]
FROM #SourceData AS sd
WHERE
sd.id = 1
UNION ALL
SELECT
s.id,
s.[count],
a.cumulative_sum+s.[count]
FROM Accumulator AS a
CROSS APPLY
(
SELECT
NextID = r.id
FROM
(
SELECT
sd.id,
rn = ROW_NUMBER() OVER (ORDER BY sd.id)
FROM #SourceData AS sd
WHERE
sd.id > a.id
) AS r
WHERE
r.rn = 1
) AS n
INNER JOIN #SourceData s ON
s.id = n.NextID
)
SELECT
a.id,
a.[count],
a.cumulative_sum
FROM Accumulator AS a
OPTION (MAXRECURSION 0);
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
July 7, 2011 at 4:58 am
Hi Chris,
There's no need for a join at all - a point I realised after editing the TOP trick into your code. I have updated the code and query plan to show what I'm getting at here. Eight seconds on my machine. The only real issue I have is that recursive query plans cannot use parallelism directly. Shame.
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
July 7, 2011 at 5:36 am
The new version runs faster but not by much.
So the optimizer converts what looks like a triangular join (the entire CROSS APPLY part) into a TOP query?
Interestingly, you can put SUM OVER () into the iTVF without an error message, and so long as you don't reference the product in the outer query, the optimizer will still recognise the iTVF as a TOP query.
SELECT
sd.id,
sd.[count],
SUMcount = SUM([count]) OVER (PARTITION BY 1),
rn = ROW_NUMBER() OVER (ORDER BY sd.id)
FROM #SourceData AS sd
WHERE
sd.id > a.id
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
July 7, 2011 at 5:50 am
SQLkiwi (7/7/2011)
Hi Chris,There's no need for a join at all - a point I realised after editing the TOP trick into your code. I have updated the code and query plan to show what I'm getting at here. Eight seconds on my machine. The only real issue I have is that recursive query plans cannot use parallelism directly. Shame.
Paul
This looks similar to your DISTINCT trick...
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
July 7, 2011 at 6:12 am
ChrisM@Work (7/7/2011)
So the optimizer converts what looks like a triangular join (the entire CROSS APPLY part) into a TOP query?
There's a bit more going on than that, but essentially yes.
Interestingly, you can put SUM OVER () into the iTVF without an error message, and so long as you don't reference the product in the outer query, the optimizer will still recognise the iTVF as a TOP query.
You can do that, but why would you want to? By the way, a windowing function in the recursive part of a CTE has a weird semantic (quite counter-intuitive, in fact) but works the way it does because they are intended for use in hierarchy queries, where apparently the implemented semantic is the desired one.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
July 7, 2011 at 6:39 am
SQLkiwi (7/7/2011)
... By the way, a windowing function in the recursive part of a CTE has a weird semantic (quite counter-intuitive, in fact) but works the way it does because they are intended for use in hierarchy queries, where apparently the implemented semantic is the desired one.
I heard that too but the full text of the article hasn't reached this end of the solar system yet!
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
July 7, 2011 at 10:50 am
ChrisM@Work (7/7/2011)
SQLkiwi (7/7/2011)
Hi Chris,There's no need for a join at all - a point I realised after editing the TOP trick into your code. I have updated the code and query plan to show what I'm getting at here. Eight seconds on my machine. The only real issue I have is that recursive query plans cannot use parallelism directly. Shame.
Paul
This looks similar to your DISTINCT trick...
Hey Chris, the DISTINCT trick you are mentioning is the one Paul coded as an altenative to DISTINCT operation ? I have been searching for that piece of code but in vain. Can you point me to that piece?
Paul, if u still on this thread, can you point me to your altenative to DISTINCT function?
July 7, 2011 at 10:56 am
ColdCoffee (7/7/2011)
Paul, if u still on this thread, can you point me to your altenative to DISTINCT function?
Sure.
http://www.sqlservercentral.com/Forums/FindPost1013407.aspx
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply