Query to Assign a Sequence Number

  • texpic

    SSCertifiable

    Points: 5882

    I've been away from the computer most of the weekend (a first for me). Lots of interesting solutions. The simplicity of the code stuns me. Thanks!

    simplicity may be the wrong word, elegance

  • Jeff Moden

    SSC Guru

    Points: 996645

    Thanks for the feedback, Texpic.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "If "pre-optimization" is the root of all evil, then what does the resulting no optimization lead to?"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • JonFox

    SSCrazy

    Points: 2157

    Thanks for the encouragement Jeff! Frankly, this is why I've been trying to spend time on the forums lately...it seems almost every time I try to help someone, I learn something cool myself. 😀 The "cascading list sequencing" trick is something I hadn't encountered before, and it's great. I'm definitely adding that and ColdCoffee's "MAX() OVER" trick to my toolbox.

    I've been playing with both solutions this morning trying to figure out which has the better performance. It actually looks like ColdCoffee's MAX() OVER version is more expensive than using the extra ROW_NUMBER(), at least from an I/O perspective; the execution plan shows a couple of extra Lazy Spools being aggregated and joined back to the final result set (I assume the result of the partitioned MAX() ) and this results in a significant amount of Worktable logical reads. I wonder if there's a way to get rid of that overhead, or if partitioning the MAX() function is inherently more expensive?

    I have no idea at this point which is more effective from a CPU time perspective, though...I can't determine a clear winner with the small amount of test data.

    Anyway, very cool stuff all the way around!

  • Jeff Moden

    SSC Guru

    Points: 996645

    JonFox (11/21/2011)


    Thanks for the encouragement Jeff! Frankly, [font="Arial Black"]this is why I've been trying to spend time on the forums lately...it seems almost every time I try to help someone, I learn something cool myself[/font]. 😀 The "cascading list sequencing" trick is something I hadn't encountered before, and it's great. I'm definitely adding that and ColdCoffee's "MAX() OVER" trick to my toolbox.

    I've been playing with both solutions this morning trying to figure out which has the better performance. It actually looks like ColdCoffee's MAX() OVER version is more expensive than using the extra ROW_NUMBER(), at least from an I/O perspective; the execution plan shows a couple of extra Lazy Spools being aggregated and joined back to the final result set (I assume the result of the partitioned MAX() ) and this results in a significant amount of Worktable logical reads. I wonder if there's a way to get rid of that overhead, or if partitioning the MAX() function is inherently more expensive?

    I have no idea at this point which is more effective from a CPU time perspective, though...I can't determine a clear winner with the small amount of test data.

    Anyway, very cool stuff all the way around!

    The BOLD highlight in the above was an incredible and totally unexpected discovery for me when I first started posting. You just can't find a source of more real-world problems to solve and practice on and then learn from others than on these and similar forums.

    And you definitely are made of the right stuff... look at the analysis you did above and realize that it's going to get a lot of people who read this interested in different ways of performance analysis and you'll never know just how much you've actually helped. Perhaps years down the road, you'll run into someone who says "Jon Fox, eh? Man, because of you and an old post about numbering repeating groups of yours and how you did an analysis, I was able to...". Well done, Jon.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "If "pre-optimization" is the root of all evil, then what does the resulting no optimization lead to?"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • homebrew01

    SSC Guru

    Points: 55163

    Jeff Moden (11/19/2011)


    First, nicely done there Cold Coffee. I came up with an almost identical solution and saw yours when I was getting ready to post it. Your solution uses MAX() OVER (brilliant idea, BTW) which I believe is going to be more effecient than the method I'm using to calculate a pseudo-group number because I use an extra ROW_NUMBER() OVER to do it. That's provided that the RecordSeq column always reflects the correct order by Account and Date. If it doesn't, then there will be a problem. Replace the 8/6 date for the second row of "Jim" with 9/6 and see what I mean.

    Sorry to revive an old thread, but wanted to say "Thanks"

    I was doing something similar, with crazy loop processing running for hours and hours, and your code ran in about 15 seconds !

Viewing 5 posts - 16 through 20 (of 20 total)

You must be logged in to reply to this topic. Login to reply