Group Islands of Contiguous Dates (SQL Spackle)

  • tommyh (12/6/2010)


    Nice article as a whole... a bit weak/confusing intro though. But thx to Craigs little comment it became much more clear what/why was being done.

    /T

    The folks that know me know where the detail of my articles usually is... in the comments in the code. I should probably get into the habit of stating that, though. :pinch: Thanks for the constructive feedback, Tommy. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • autoexcrement (12/6/2010)


    Okay, I think this is starting to make sense now. Contiguous is defined here as "at least one date entry per day during a range". Okay. And now the rest of the article makes sense too. Cool.

    Wow, now that I understand what's going on, this article is freaking awesome. Jeff rocks. (Craig, too.)

    Now there's a huge compliment. Nah... I don't mean the "Jeff rocks" thing. I mean the huge lightbulb that just turned on. Thanks for the awesome feedback.

    Craig... it was your code that clarified it all here. Thanks again for the wonderful "cover" while I was busy.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • nigel. (12/7/2010)


    Nice Jeff, very nice indeed.

    Now, all I need is to find somewhere I can use this 🙂

    Thanks for the feed. With a bit of extrapolation, you can also do a similar thing to find "islands" of just about any numeric sequence. Think the inverse of "find missing identies".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • JJ B (12/7/2010)


    This was new to me and you did an excellent job of explaining how it works. I was with you every step of the way. 5 stars.

    Thanks for the great feedback, JJ. Good to "see" you again. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • mtassin (12/7/2010)


    As always when I read an article by Jeff two things happen.

    1. I find a very useful tip that I'm sure I'll need within 1 month of reading the article

    2. I am amazed at how Jeff thinks. That's such an elegant solution to that sort of problem.

    You're too kind, Mark. :blush: Thanks for the awesome compliment. Like I said at the end of the article, it appears that someone else thought of the same thing and published it in a book so I'm certainly not alone in that way of thinking.

    What most people don't understand is that I learned a huge amount of what I know and the way I think from this very site and I'm just trying to return the favor.

    Glad to see that "no RBAR" flag is still flying high. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (12/7/2010)


    Craig... it was your code that clarified it all here. Thanks again for the wonderful "cover" while I was busy.

    It was my pleasure. No code though, just a different phrasing. After the whole End Of Week calculation that I think 10 people had to chime in on just to get a different phrasing that finally made sense to me and make it all click, the value of a different voice with a slightly different viewpoint to explain the same thing has become embedded in me again. 🙂

    P.S. What's this stuff about you needing to work and not being on SSC 24/7?! BAH! Free internet forums pay the bills!

    Don't they?


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • yet another GREAT spackle article. (Polyfilla for the people on the other side of the pond) 😛

    thanks Jeff.

  • For the original question, how about this:

    [font="Courier New"]

    select a.SomeDate,

    (select max(b.SomeDate)

    from #MyHead b

    -- where difference of dates equals

    -- count of dates in between

    where datediff(d,a.SomeDate,b.SomeDate) =

    (select count(distinct c.SomeDate)

    from #MyHead c

    where c.SomeDate > a.SomeDate

    and c.SomeDate <= b.SomeDate)) EndDate

    from #MyHead a

    [/font]

  • RStarr, while your code seems functional (I didn't extensively test it), take a look at the million row test.

    If you look near the bottom of the article, Jeff gives a way to build out a very large sample dataset for testing. Try your code out vs. his. You'll (most likely) notice a large discrepency. This will be primarily because of the necessary data manipulation you'll be performing. I think that will actually run the aggregation structure *per row*.

    This is as much about optimization as anything else. A while/cursor loop doing a single pass on the data might outperform that.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Craig Farrell (12/7/2010)


    RStarr, while your code seems functional (I didn't extensively test it), take a look at the million row test.

    If you look near the bottom of the article, Jeff gives a way to build out a very large sample dataset for testing. Try your code out vs. his. You'll (most likely) notice a large discrepency. This will be primarily because of the necessary data manipulation you'll be performing. I think that will actually run the aggregation structure *per row*.

    This is as much about optimization as anything else. A while/cursor loop doing a single pass on the data might outperform that.

    Just a follow up... The code from the article is nearly instantaneous even on the first run on 100,000 rows without an index. I stopped the other code after 6-1/2 minutes. Looking at the actual execution plan on a 100 row table, it had to "touch" more than 164,000 internal rows because, as you said, the aggregations are fully executed on a "per row" basis.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Geoff A (12/7/2010)


    yet another GREAT spackle article. (Polyfilla for the people on the other side of the pond) 😛

    thanks Jeff.

    Thanks for the feedback and for stopping by, Geoff. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Great post Jeff!

    I had a thought... this could be extended to account for Times in addition to dates... just move the calculation to the lowest significant field (from day to second). I had to code an algorithm in VB to find the "islands" and the "water" for a scheduling application, this may well drop the need for that...

    Again, Great Job!

  • venoym (12/8/2010)


    Great post Jeff!

    I had a thought... this could be extended to account for Times in addition to dates... just move the calculation to the lowest significant field (from day to second). I had to code an algorithm in VB to find the "islands" and the "water" for a scheduling application, this may well drop the need for that...

    Again, Great Job!

    You might, but my recommendation would be to add a calculated column that is a datediff(ss, <datefield>, Min(date)), and use this technique off the resultant BIGINT.

    Don't do this from date 0 or something, you want to make it tight as possible. There's gonna be a LOT of seconds over the course of a year or three. 🙂


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Thanks Craig, Jeff for pointing out the inefficiency. I didn't realize that using the aggregation like that would be so slow.

    On the other hand the original question was: "How can I determine overlapping dates?" That is not a totally trivial problem. My purpose was to suggest a solution that was relatively simple, but yet got the job done.

    I ran a test on 500 dates over the span of four years. It returned in 6 seconds. In the environment I come from that would be sufficient for any practical problems we might come up with.

    But I enjoyed reading through the other solutions as well.

  • rstarr-916208 (12/8/2010)


    Thanks Craig, Jeff for pointing out the inefficiency. I didn't realize that using the aggregation like that would be so slow.

    On the other hand the original question was: "How can I determine overlapping dates?" That is not a totally trivial problem. My purpose was to suggest a solution that was relatively simple, but yet got the job done.

    I ran a test on 500 dates over the span of four years. It returned in 6 seconds. In the environment I come from that would be sufficient for any practical problems we might come up with.

    But I enjoyed reading through the other solutions as well.

    Not sure where the original question of overlapping dates is from, I must have missed it even in the re-read. However, easiest way to find that is to do a DENSE_RANK on your datefield, then do an aggregate COUNT(*) by the rank. Anything with HAVING COUNT(*) > 1 will be a duplicate.

    EDIT: You know, I'm apparently not on the ball today, there's no reason to even use the dense_rank to find your duplicates. Just group by your date/other qualifiers and look for a having count(*) > 1.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

Viewing 15 posts - 16 through 30 (of 62 total)

You must be logged in to reply to this topic. Login to reply