Group Islands of Contiguous Dates (SQL Spackle)

  • Comments posted to this topic are about the item Group Islands of Contiguous Dates (SQL Spackle)

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sorry, I'm dumb. But I completely don't understand how the "test data" dates at the start of the article make any sense without the accompanying comments. Would you mind clarifying please? If so, thanks!

    Edit: It's probably not even clear why I'm confused. But I mean, how does anyone know if any of the dates is supposed to be a "start and end date" or just a "start date" or just an "end date" without the comments? I assume I'm missing something obvious here...


    "If I had been drinking out of that toilet, I might have been killed." -Ace Ventura

  • Relax auto, you're not dumb. If you haven't had a need for it the test data doesn't make sense out of the gate. This comes up a lot, though, for people trying to deal with, say, employee work logging or information from a mechanical switch reporting status, and you want to find 'misses'.

    Basically, you're looking for start/end dates for each contiguous grouping of the dates in the test data.

    I have a feeling Jeff you're going to be fielding a few questions about the subtracting from the date, but a solid spackle walkthrough. Thanks as always!


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • But how can you tell which dates are "start", "end" or "start and end" without any additional information (like the comments)? I don't see any rhyme or reason here, aside from the dates being in chronological order.


    "If I had been drinking out of that toilet, I might have been killed." -Ace Ventura

  • Nice article as a whole... a bit weak/confusing intro though. But thx to Craigs little comment it became much more clear what/why was being done.

    /T

  • Okay, I think this is starting to make sense now. Contiguous is defined here as "at least one date entry per day during a range". Okay. And now the rest of the article makes sense too. Cool.

    Wow, now that I understand what's going on, this article is freaking awesome. Jeff rocks. (Craig, too.)


    "If I had been drinking out of that toilet, I might have been killed." -Ace Ventura

  • Alright, let's make the data a little more real world scenario.

    You have a contractor who comes by and does work as needed. You're looking for a per problem average of how long it takes for them to fix whatever issue they were brought in for. You've got a log of said person's hours:

    1/1/2000 8

    1/2/2000 4

    1/4/2000 8

    1/5/2000 8

    1/6/2000 8

    1/9/2000 3

    1/12/2000 8

    1/13/2000 8

    1/14/2000 6

    So, you need to find a way to 'group' this data, to know which sections are continuous. Using the code and teh data above, there are four distinct groups:

    1/1 - 1/2, 1/4 - 1/6, 1/9, and 1/12 - 1/14

    Now that you have them grouped, you can group on the result of the code from the article, and SUM( hours). Now you have a per group sum, which you can then average afterwards.

    This is also useful if you have non-contiguous ranges and you need to find gaps. For example, instead of grouping hours above, let's say my consultant was lax in remembering to file his paperwork. I can tell him he's missing information on 1/3, 1/7-1/8, and 1/10 - 1/12.

    There's a number of applications to this, the above simply being a couple of examples.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Awesome, got it. Thanks! I've taken up a good deal of your time today, Craig!


    "If I had been drinking out of that toilet, I might have been killed." -Ace Ventura

  • Nice Jeff, very nice indeed.

    Now, all I need is to find somewhere I can use this 🙂

  • Nice article.

    When Jeff said, "even though the DateGroup looks like a date, it really doesn't mean anything," he was fibbing a bit, probably to keep unnecessary detail out of the article.

    To see what the DateGroup means, we can watch what happens to it as we move through the test results from the first set of test data. It starts out as '2009-12-31', and it would keep that value on every row if there were no gaps in the dates. Each time we pass a gap, the number of days in the gap are added to the DateGroup. So the DateGroup means:

    The day prior to the first day in our input data, adjusted by the number of days skipped along the way to our current row.

    Or, put another way:

    The day prior to what the first day in our input data would be, if we shuffled all the dates forward to fill in the gaps up to the current row.

    OK, so Jeff was probably right to gloss over this, but hang on a minute. Now that we know what DateGroup is, we can use it to count the skipped days. It's just the number of days between our earliest date and DateGroup. Here I changed Jeff's query to group by DaysSkipped instead of DateGroup.

    WITH

    cteGroupedDates AS

    (SELECT UniqueDate = SomeDate,

    DaysSkipped = DATEDIFF(dd, MinDate, DATEADD(dd, - ROW_NUMBER() OVER (ORDER BY SomeDate), SomeDate) + 1)

    FROM #MyHead

    CROSS JOIN (SELECT MIN(SomeDate) AS MinDate FROM #MyHead) m

    GROUP BY MinDate, SomeDate

    )

    SELECT StartDate = MIN(UniqueDate),

    EndDate = MAX(UniqueDate),

    Days = DATEDIFF(dd,MIN(UniqueDate),MAX(UniqueDate))+1,

    DaysSkipped

    FROM cteGroupedDates

    GROUP BY DaysSkipped

    ORDER BY StartDate

    ;

    This gives:

    StartDate EndDate Days DaysSkipped

    ----------------------- ----------------------- ----------- -----------

    2010-01-01 00:00:00.000 2010-01-01 00:00:00.000 1 0

    2010-01-03 00:00:00.000 2010-01-03 00:00:00.000 1 1

    2010-01-05 00:00:00.000 2010-01-06 00:00:00.000 2 2

    2010-01-10 00:00:00.000 2010-01-12 00:00:00.000 3 5

    DaysSkipped gives us a running total of the number of days in the gaps, which might be useful in some circumstances.

  • This was new to me and you did an excellent job of explaining how it works. I was with you every step of the way. 5 stars.

  • As always when I read an article by Jeff two things happen.

    1. I find a very useful tip that I'm sure I'll need within 1 month of reading the article

    2. I am amazed at how Jeff thinks. That's such an elegant solution to that sort of problem.



    --Mark Tassin
    MCITP - SQL Server DBA
    Proud member of the Anti-RBAR alliance.
    For help with Performance click this link[/url]
    For tips on how to post your problems[/url]

  • dadam (12/7/2010)


    When Jeff said, "even though the DateGroup looks like a date, it really doesn't mean anything," he was fibbing a bit, probably to keep unnecessary detail out of the article.

    Agh... Thanks for the correction... I should have said,

    "...even though the DateGroup looks like a date, it [font="Arial Black"]simply being a date [/font]doesn't really mean anything,"

    You are correct about the significance of the date and, for important a concept as contiguous dates all have the same date in the column, I did gloss over it a bit to keep the "spackle" short and mostly left that realization up to the reader by reading the results and the comments from that one query.

    And the addition of DaysSkipped is a clever bit of code. Thanks for posting it. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Craig Farrell (12/7/2010)


    Alright, let's make the data a little more real world scenario.

    You have a contractor who comes by and does work as needed. You're looking for a per problem average of how long it takes for them to fix whatever issue they were brought in for. You've got a log of said person's hours:...

    Very cool explanation on that post. Thanks a bunch for covering for me, Craig. I was busy at work at just couldn't get the time to respond.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • autoexcrement (12/6/2010)


    But how can you tell which dates are "start", "end" or "start and end" without any additional information (like the comments)? I don't see any rhyme or reason here, aside from the dates being in chronological order.

    I know Craig covered this, but you actually got it right there. The dates are simply in chronolgical order and the StartDate and EndDates identify the "islands" of dates where there are no missing dates.

    I think a lot of folks were looking for a much more complicated answer to this simple problem.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 62 total)

You must be logged in to reply to this topic. Login to reply