Cumulative sum for multiple sequences

Question

Post reply

Cumulative sum for multiple sequences

jkeys-972541

SSC-Addicted

Points: 410
More actions
October 14, 2010 at 5:09 pm

#227967

Hi All,
May be a better way of describing what I want to do in the header, but I didn't know it.
I have a number of sequences down a hole. I am trying to work out a cumulative sum at a given depth. If this was simply the sum of all of the shallower values this would be easy enough, however, if there is a break in the depths, then I want the cumulative sum to re-start at the next interval. An example of the data is:
HOLEIDDFROMDTOVALUE
TEST104.43110.435.98
TEST110.43111.430.8
TEST111.43117.436.02
TEST117.43119.752.32
TEST146.42152.426.01
TEST152.42156.424
Notice there is a break in the depths after 119.75. Essentially what I am after is the sum of the value. eg:
HOLEIDDFROMDTOVALUECumulative Sum
TEST104.43110.435.985.98
TEST110.43111.430.86.78
TEST111.43117.436.0212.8
TEST117.43119.752.3215.12
TEST146.42152.426.016.01
TEST152.42156.42410.01
Note that the sum re-starts at 146.42.
I have tried using the various row_number() combinations (similar to what was discussed here:
http://www.sqldev.org/transactsql/merge-like-values-in-consecutive-records-90394.shtml
I have had a crack at this, and can show you where I got to. The problem is that whenever I put in function parameters of "Hole" and "DTo" (there are multiple holes in the same table), I only get the correct answer for the first sequence of each hole.
I wouldn't have thought this would be a difficult task, but I am running out of ideas. Any suggestions appreciated.
JK

Viewing 13 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply

LutzM SSC Guru Points: 107049 More actions · Answer 1

Here's what I came up with:

-- sample data setup

DECLARE @tbl TABLE

(

HOLEID VARCHAR(30),

DFROM DEC(8,2),

DTO DEC(8,2)

)

INSERT INTO @tbl

SELECT 'TEST', 104.45 , 110.43 UNION ALL

SELECT 'TEST', 110.43 ,111.43 UNION ALL

SELECT 'TEST', 111.43, 117.43 UNION ALL

SELECT 'TEST', 117.43, 119.75 UNION ALL

SELECT 'TEST', 146.41, 152.42 UNION ALL

SELECT 'TEST', 152.42, 156.42

;WITH cte AS -- get the upper limit DTO values per "group"

(

SELECT

t1.dto,

ROW_NUMBER() OVER(ORDER BY t1.dto ) AS ROW,

t2.holeid AS holeid2

FROM @tbl t1

LEFT OUTER JOIN @tbl t2 ON t1.dto=t2.dfrom

WHERE t2.holeid IS NULL

)

,cte2 AS -- get the upper limit of the "previous group"

(

SELECT

ISNULL(c2.dto,0) AS low,

c1.dto AS high

FROM cte c1

LEFT OUTER JOIN cte c2

ON c1.row=c2.row + 1

)

, cte3 AS -- get the corresponding lower limit of the "group"

(

SELECT

MIN(DFROM) AS low,

cte2.high

FROM @tbl t

INNER JOIN cte2

ON DFROM > low

GROUP BY low,high

)

SELECT -- final output

*,

dto-dfrom,

dto-low

FROM cte3

INNER JOIN @tbl t

ON t.DFROM >=low AND t.DTO <= high

Unfortunately, your sample data and your expected output didn't match so I had to change it...

Another option would be to use the quirky update method[/url]. If you have to deal with a large data volume you might want to compare the performance of the two options and decide which one to use (the quirky update will perfom much better on a larger data volume). But you need to follow each and every rule described in the article.

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

jkeys-972541 SSC-Addicted Points: 410 More actions · Answer 2

Thanks Lutz,

This works well. Appreciate your time.

JK

jkeys-972541 SSC-Addicted Points: 410 More actions · Answer 3

Hi Lutz,

The solution you provided is what I am after, but I can't get output.

The data that I have provided is drilling information: the DFROM and DTO are the drilled depths and the VALUE is the recovered core (material coming out of the ground). What I am trying to do is provide two calculated fields (hopefully a function that could do either). So, what you provided is the cumulative drilled depth, how do I calculate the cumulative value that pertains to each.

JK

WayneS SSC Guru Points: 95461 More actions · Answer 4

I couldn't figure out a way to modify Lutz's method to work correctly, so here is the Quirky Update method that he mentioned.

You MUST follow ALL of the rules listed in the article that is in the comments, or it just won't work correctly. If you're not going to follow all of the rules, you WILL mess up your data.

-- sample data setup

DECLARE @tbl TABLE

(

HOLEID VARCHAR(30),

DFROM DEC(8,2),

DTO DEC(8,2),

VALUE DEC(8,2),

TOTAL DEC(8,2),

-- clustered index required for the update statement!!!!

PRIMARY KEY CLUSTERED(HOLEID, DFROM)

)

INSERT INTO @tbl (HOLEID, DFROM, DTO, VALUE)

SELECT 'TEST', 104.45 , 110.43, 5.98 UNION ALL

SELECT 'TEST', 110.43 ,111.43, 0.80 UNION ALL

SELECT 'TEST', 111.43, 117.43, 6.02 UNION ALL

SELECT 'TEST', 117.43, 119.75, 2.32 UNION ALL

SELECT 'TEST', 146.41, 152.42, 6.01 UNION ALL

SELECT 'TEST', 152.42, 156.42, 4.00;

-- declare and initialize variables for the update statement

declare @HoleID varchar(30), -- for anchor column

@LastDTO dec(8,2),

@Total dec(8,2),

@Sequence int;

SET @Sequence = 0;

-- This form of the UPDATE statement has some rules for proper usage.

-- See Jeff Moden's article at http://www.sqlservercentral.com/articles/T-SQL/68467/

-- for a complete discussion of how this works, and all of the rules for utilizing it.

WITH SafeTable AS

(

-- generate table with a sequence column in clustered index order

-- in order to verify that update is happening in the correct order

SELECT *,

Sequence = row_number() OVER (ORDER BY HOLEID, DFROM)

FROM @tbl

)

UPDATE t

-- verify in proper sequence order; if not, throw an error so nothing is updated

SET @Sequence = CASE WHEN Sequence = @Sequence + 1 THEN Sequence

ELSE 1/0 END, -- not in proper sequence order, so throw an error

-- total is the previous total plus this value

-- if same HoleID and the DFrom is the previous DTO

@Total = TOTAL = CASE WHEN @HoleID = HoleID AND DFROM = @LastDTO

THEN Value + @Total

-- otherwise, start total over again

ELSE Value END,

@HoleID = HoleID, -- anchor column (first column in clustered index)

@LastDTO = DTO -- need to hang on to the DTO for use in the next column

FROM SafeTable t WITH (TABLOCKX) -- lock the table

OPTION (MAXDOP 1); -- prevent parallelism from messing things up

-- NOTE: the WITH (TABLOCKX) is really not necessary for a table variable,

-- but it is required for regular tables, so it's a real good idea

-- to always use it so that you don't forget to.

-- show the results

SELECT *

FROM @tbl;

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2

jkeys-972541 SSC-Addicted Points: 410 More actions · Answer 5

Thanks Wayne,

Works a treat - Although you have explained it, I don't really understand what is going on with your script - I'm sure I could apply this elsewhere. Something for the weekend.

JK

WayneS SSC Guru Points: 95461 More actions · Answer 6

jkeys-972541 (10/15/2010)
Thanks Wayne,
Works a treat - Although you have explained it, I don't really understand what is going on with your script - I'm sure I could apply this elsewhere. Something for the weekend.
JK

If you don't understand it, then DON'T USE IT until you do. Read the article referenced in the comments. Ask questions here. But, DON'T USE IT until you do understand it. (Same applies for any code you plan to use, regardless of the source.)

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2

LutzM SSC Guru Points: 107049 More actions · Answer 7

Ok, here's the recursive CTE with a triangular join:

Side note: I changed my version slightly to cover different values for holeid.

If you compare the performance you'll see that the quirky update will be faster.

But speed isn't everything. It's much more important to understand how the code works instead of simply run stuff off the web. Assuming there are less than 10 values per group the triangular join might not be that bad for this case ...

;WITH cte AS -- get the upper limit DTO values per "group"

(

SELECT

t1.dto,

ROW_NUMBER() OVER(PARTITION BY t1.holeid ORDER BY t1.dto ) AS ROW,

t1.holeid AS holeid

FROM @tbl t1

LEFT OUTER JOIN @tbl t2 ON t1.dto = t2.dfrom AND t1.HOLEID = t2.HOLEID

WHERE t2.holeid IS NULL

)

,cte2 AS -- get the upper limit of the "previous group"

(

SELECT

ISNULL(c2.dto,0) AS low,

c1.dto AS high,

c1.holeid

FROM cte c1

LEFT OUTER JOIN cte c2

ON c1.row=c2.row + 1 AND c1.holeid=c2.holeid

), cte3 AS -- link the upper and lower limits to the original data

(

SELECT t.*,cte2.low

FROM cte2

INNER JOIN @tbl t

ON dfrom>=low AND dto <=high AND cte2.holeid=t.holeid

)

SELECT -- final query: use a triangular join per holeid and "data group" to get the running total.

c2.holeid,

c2.dfrom,

c2.dto,

c2.value,

SUM(c1.value) AS total

FROM cte3 c1

INNER JOIN cte3 c2

ON c1.low = c2.low AND c1.dfrom<=c2.dfrom AND c1.holeid=c2.holeid

GROUP BY c2.holeid,c2.dfrom,c2.dto,c2.value

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

jkeys-972541 SSC-Addicted Points: 410 More actions · Answer 8

Hi,

I have set up a derivative of the above function on several databases and worked fine. On the last database I got the error:

Msg 4138, Level 16, State 1, Procedure Cum_Sum_Drilling, Line 82
Conflicting locking hints are specified for table "@tbl". This may be caused by a conflicting hint specified for a view.

Because @tbl is a table variable I commented out the

WITH (TABLOCKX)

section and it worked fine (the comments and the related article on the quirky update method said this was not necessary for a table variable).

Why would it have worked on all databases except this last one? (The databases were all of the sam structure.) What am I missing?

JK

Jeff Moden SSC Guru Points: 1004682 More actions · Answer 9

jkeys-972541 (10/19/2010)
Hi,
I have set up a derivative of the above function on several databases and worked fine. On the last database I got the error:
Msg 4138, Level 16, State 1, Procedure Cum_Sum_Drilling, Line 82
Conflicting locking hints are specified for table "@tbl". This may be caused by a conflicting hint specified for a view.
Because @tbl is a table variable I commented out the
WITH (TABLOCKX)
section and it worked fine (the comments and the related article on the quirky update method said this was not necessary for a table variable).
Why would it have worked on all databases except this last one? (The databases were all of the sam structure.) What am I missing?
JK

Does the last database have a different default isolation level?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

jkeys-972541 SSC-Addicted Points: 410 More actions · Answer 10

No, the isolation level is the same in each: read committed

JK

Jeff Moden SSC Guru Points: 1004682 More actions · Answer 11

I guess I'd have to be there to look at the whole thing. I'm out of "remote" ideas on the problem. My apologies.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

jkeys-972541 SSC-Addicted Points: 410 More actions · Answer 12

Thanks Jeff,

It is working without TABLOCKX anyway so I am not going to lose any sleep over it.

JK