Cumulative sum for multiple sequences

  • Hi All,

    May be a better way of describing what I want to do in the header, but I didn't know it.

    I have a number of sequences down a hole. I am trying to work out a cumulative sum at a given depth. If this was simply the sum of all of the shallower values this would be easy enough, however, if there is a break in the depths, then I want the cumulative sum to re-start at the next interval. An example of the data is:

    HOLEIDDFROMDTOVALUE

    TEST104.43110.435.98

    TEST110.43111.430.8

    TEST111.43117.436.02

    TEST117.43119.752.32

    TEST146.42152.426.01

    TEST152.42156.424

    Notice there is a break in the depths after 119.75. Essentially what I am after is the sum of the value. eg:

    HOLEIDDFROMDTOVALUECumulative Sum

    TEST104.43110.435.985.98

    TEST110.43111.430.86.78

    TEST111.43117.436.0212.8

    TEST117.43119.752.3215.12

    TEST146.42152.426.016.01

    TEST152.42156.42410.01

    Note that the sum re-starts at 146.42.

    I have tried using the various row_number() combinations (similar to what was discussed here:

    http://www.sqldev.org/transactsql/merge-like-values-in-consecutive-records-90394.shtml

    I have had a crack at this, and can show you where I got to. The problem is that whenever I put in function parameters of "Hole" and "DTo" (there are multiple holes in the same table), I only get the correct answer for the first sequence of each hole.

    I wouldn't have thought this would be a difficult task, but I am running out of ideas. Any suggestions appreciated.

    JK

  • Here's what I came up with:

    -- sample data setup

    DECLARE @tbl TABLE

    (

    HOLEID VARCHAR(30),

    DFROM DEC(8,2),

    DTO DEC(8,2)

    )

    INSERT INTO @tbl

    SELECT 'TEST', 104.45 , 110.43 UNION ALL

    SELECT 'TEST', 110.43 ,111.43 UNION ALL

    SELECT 'TEST', 111.43, 117.43 UNION ALL

    SELECT 'TEST', 117.43, 119.75 UNION ALL

    SELECT 'TEST', 146.41, 152.42 UNION ALL

    SELECT 'TEST', 152.42, 156.42

    ;WITH cte AS -- get the upper limit DTO values per "group"

    (

    SELECT

    t1.dto,

    ROW_NUMBER() OVER(ORDER BY t1.dto ) AS ROW,

    t2.holeid AS holeid2

    FROM @tbl t1

    LEFT OUTER JOIN @tbl t2 ON t1.dto=t2.dfrom

    WHERE t2.holeid IS NULL

    )

    ,cte2 AS -- get the upper limit of the "previous group"

    (

    SELECT

    ISNULL(c2.dto,0) AS low,

    c1.dto AS high

    FROM cte c1

    LEFT OUTER JOIN cte c2

    ON c1.row=c2.row + 1

    )

    , cte3 AS -- get the corresponding lower limit of the "group"

    (

    SELECT

    MIN(DFROM) AS low,

    cte2.high

    FROM @tbl t

    INNER JOIN cte2

    ON DFROM > low

    GROUP BY low,high

    )

    SELECT -- final output

    *,

    dto-dfrom,

    dto-low

    FROM cte3

    INNER JOIN @tbl t

    ON t.DFROM >=low AND t.DTO <= high

    Unfortunately, your sample data and your expected output didn't match so I had to change it...

    Another option would be to use the quirky update method[/url]. If you have to deal with a large data volume you might want to compare the performance of the two options and decide which one to use (the quirky update will perfom much better on a larger data volume). But you need to follow each and every rule described in the article.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Thanks Lutz,

    This works well. Appreciate your time.

    JK

  • Hi Lutz,

    The solution you provided is what I am after, but I can't get output.

    The data that I have provided is drilling information: the DFROM and DTO are the drilled depths and the VALUE is the recovered core (material coming out of the ground). What I am trying to do is provide two calculated fields (hopefully a function that could do either). So, what you provided is the cumulative drilled depth, how do I calculate the cumulative value that pertains to each.

    JK

  • I couldn't figure out a way to modify Lutz's method to work correctly, so here is the Quirky Update method that he mentioned.

    You MUST follow ALL of the rules listed in the article that is in the comments, or it just won't work correctly. If you're not going to follow all of the rules, you WILL mess up your data.

    -- sample data setup

    DECLARE @tbl TABLE

    (

    HOLEID VARCHAR(30),

    DFROM DEC(8,2),

    DTO DEC(8,2),

    VALUE DEC(8,2),

    TOTAL DEC(8,2),

    -- clustered index required for the update statement!!!!

    PRIMARY KEY CLUSTERED(HOLEID, DFROM)

    )

    INSERT INTO @tbl (HOLEID, DFROM, DTO, VALUE)

    SELECT 'TEST', 104.45 , 110.43, 5.98 UNION ALL

    SELECT 'TEST', 110.43 ,111.43, 0.80 UNION ALL

    SELECT 'TEST', 111.43, 117.43, 6.02 UNION ALL

    SELECT 'TEST', 117.43, 119.75, 2.32 UNION ALL

    SELECT 'TEST', 146.41, 152.42, 6.01 UNION ALL

    SELECT 'TEST', 152.42, 156.42, 4.00;

    -- declare and initialize variables for the update statement

    declare @HoleID varchar(30), -- for anchor column

    @LastDTO dec(8,2),

    @Total dec(8,2),

    @Sequence int;

    SET @Sequence = 0;

    -- This form of the UPDATE statement has some rules for proper usage.

    -- See Jeff Moden's article at http://www.sqlservercentral.com/articles/T-SQL/68467/

    -- for a complete discussion of how this works, and all of the rules for utilizing it.

    WITH SafeTable AS

    (

    -- generate table with a sequence column in clustered index order

    -- in order to verify that update is happening in the correct order

    SELECT *,

    Sequence = row_number() OVER (ORDER BY HOLEID, DFROM)

    FROM @tbl

    )

    UPDATE t

    -- verify in proper sequence order; if not, throw an error so nothing is updated

    SET @Sequence = CASE WHEN Sequence = @Sequence + 1 THEN Sequence

    ELSE 1/0 END, -- not in proper sequence order, so throw an error

    -- total is the previous total plus this value

    -- if same HoleID and the DFrom is the previous DTO

    @Total = TOTAL = CASE WHEN @HoleID = HoleID AND DFROM = @LastDTO

    THEN Value + @Total

    -- otherwise, start total over again

    ELSE Value END,

    @HoleID = HoleID, -- anchor column (first column in clustered index)

    @LastDTO = DTO -- need to hang on to the DTO for use in the next column

    FROM SafeTable t WITH (TABLOCKX) -- lock the table

    OPTION (MAXDOP 1); -- prevent parallelism from messing things up

    -- NOTE: the WITH (TABLOCKX) is really not necessary for a table variable,

    -- but it is required for regular tables, so it's a real good idea

    -- to always use it so that you don't forget to.

    -- show the results

    SELECT *

    FROM @tbl;

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Thanks Wayne,

    Works a treat - Although you have explained it, I don't really understand what is going on with your script - I'm sure I could apply this elsewhere. Something for the weekend.

    JK

  • jkeys-972541 (10/15/2010)


    Thanks Wayne,

    Works a treat - Although you have explained it, I don't really understand what is going on with your script - I'm sure I could apply this elsewhere. Something for the weekend.

    JK

    If you don't understand it, then DON'T USE IT until you do. Read the article referenced in the comments. Ask questions here. But, DON'T USE IT until you do understand it. (Same applies for any code you plan to use, regardless of the source.)

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Ok, here's the recursive CTE with a triangular join:

    Side note: I changed my version slightly to cover different values for holeid.

    If you compare the performance you'll see that the quirky update will be faster.

    But speed isn't everything. It's much more important to understand how the code works instead of simply run stuff off the web. Assuming there are less than 10 values per group the triangular join might not be that bad for this case ...

    ;WITH cte AS -- get the upper limit DTO values per "group"

    (

    SELECT

    t1.dto,

    ROW_NUMBER() OVER(PARTITION BY t1.holeid ORDER BY t1.dto ) AS ROW,

    t1.holeid AS holeid

    FROM @tbl t1

    LEFT OUTER JOIN @tbl t2 ON t1.dto = t2.dfrom AND t1.HOLEID = t2.HOLEID

    WHERE t2.holeid IS NULL

    )

    ,cte2 AS -- get the upper limit of the "previous group"

    (

    SELECT

    ISNULL(c2.dto,0) AS low,

    c1.dto AS high,

    c1.holeid

    FROM cte c1

    LEFT OUTER JOIN cte c2

    ON c1.row=c2.row + 1 AND c1.holeid=c2.holeid

    ), cte3 AS -- link the upper and lower limits to the original data

    (

    SELECT t.*,cte2.low

    FROM cte2

    INNER JOIN @tbl t

    ON dfrom>=low AND dto <=high AND cte2.holeid=t.holeid

    )

    SELECT -- final query: use a triangular join per holeid and "data group" to get the running total.

    c2.holeid,

    c2.dfrom,

    c2.dto,

    c2.value,

    SUM(c1.value) AS total

    FROM cte3 c1

    INNER JOIN cte3 c2

    ON c1.low = c2.low AND c1.dfrom<=c2.dfrom AND c1.holeid=c2.holeid

    GROUP BY c2.holeid,c2.dfrom,c2.dto,c2.value



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Hi,

    I have set up a derivative of the above function on several databases and worked fine. On the last database I got the error:

    Msg 4138, Level 16, State 1, Procedure Cum_Sum_Drilling, Line 82

    Conflicting locking hints are specified for table "@tbl". This may be caused by a conflicting hint specified for a view.

    Because @tbl is a table variable I commented out the

    WITH (TABLOCKX)

    section and it worked fine (the comments and the related article on the quirky update method said this was not necessary for a table variable).

    Why would it have worked on all databases except this last one? (The databases were all of the sam structure.) What am I missing?

    JK

  • jkeys-972541 (10/19/2010)


    Hi,

    I have set up a derivative of the above function on several databases and worked fine. On the last database I got the error:

    Msg 4138, Level 16, State 1, Procedure Cum_Sum_Drilling, Line 82

    Conflicting locking hints are specified for table "@tbl". This may be caused by a conflicting hint specified for a view.

    Because @tbl is a table variable I commented out the

    WITH (TABLOCKX)

    section and it worked fine (the comments and the related article on the quirky update method said this was not necessary for a table variable).

    Why would it have worked on all databases except this last one? (The databases were all of the sam structure.) What am I missing?

    JK

    Does the last database have a different default isolation level?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • No, the isolation level is the same in each: read committed

    JK

  • I guess I'd have to be there to look at the whole thing. I'm out of "remote" ideas on the problem. My apologies.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks Jeff,

    It is working without TABLOCKX anyway so I am not going to lose any sleep over it.

    JK

Viewing 13 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply