Stuck on "continuous date spans"

  • I am stuck on this one for a set based solution, and I think I have it almost working, something is escaping me.

    I have a table with a group id, and a series of start and end dates, each span of dates being it's own record. The dates are not always consecutive, and there can be gaps of any length between one end date and another start date. There can not be overlaping date spans within a Group Id.

    For the requirements, I need to arrive at one record for each group, with the latest end date the group has, and the earliest continously covered start date. A difference between one start date and the next end date of one day is considered continuous, while any other difference is considered a break.

    So quick example, for below

    Group_IDStart_DateEnd_Date

    AAA2005010120050930

    AAA2005100199991231

    BBB2005010120061231

    BBB2007010120070630

    BBB2008010120091231

    I need the results to be

    Group_IDStart_DateEnd_Date

    AAA2005010199991231

    BBB2008010120091231

    Attached are table defs, sample data, desired result data, and what I have reached so far. I can get a compare between first and second records for a given group, yet I have not managed to imagine how to then take that result forward some unknown number of times.

    Any help would be greatly appreciated. :w00t:

  • Look not for continuous spans, looks for breaks longer than 1 day.

    The end of the latest of such breaks will be the beginning of the latest continuous span.

    _____________
    Code for TallyGenerator

  • Hi,

    Take a look at my blog entry here.

    If you are on 7 or 2000 , you will have to use an identity column but the theory is the same.



    Clear Sky SQL
    My Blog[/url]

  • Thanks to both of you. I should get sometime to work on this again today, when I get this I am going to have to remember to write it down somewhere, strikes me as one of those things which will show up again someday.

  • Dave Ballantyne (12/30/2009)


    Hi,

    Take a look at my blog entry here.

    If you are on 7 or 2000 , you will have to use an identity column but the theory is the same.

    Nice Dave. Thanks for that.

    ---------------------------------------------------------
    How best to post your question[/url]
    How to post performance problems[/url]
    Tally Table:What it is and how it replaces a loop[/url]

    "stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."

  • David Lester (12/29/2009)


    I am stuck on this one for a set based solution, and I think I have it almost working, something is escaping me.

    I have a table with a group id, and a series of start and end dates, each span of dates being it's own record. The dates are not always consecutive, and there can be gaps of any length between one end date and another start date. There can not be overlaping date spans within a Group Id.

    For the requirements, I need to arrive at one record for each group, with the latest end date the group has, and the earliest continously covered start date. A difference between one start date and the next end date of one day is considered continuous, while any other difference is considered a break.

    So quick example, for below

    Group_IDStart_DateEnd_Date

    AAA2005010120050930

    AAA2005100199991231

    BBB2005010120061231

    BBB2007010120070630

    BBB2008010120091231

    I need the results to be

    Group_IDStart_DateEnd_Date

    AAA2005010199991231

    BBB2008010120091231

    Attached are table defs, sample data, desired result data, and what I have reached so far. I can get a compare between first and second records for a given group, yet I have not managed to imagine how to then take that result forward some unknown number of times.

    Any help would be greatly appreciated. :w00t:

    I agree... "any help" would be greatly appreciated. For better, quicker answers, please read and heed the information located at the article in the first link in my signature below. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff, I see I forgot to mention in my post the attached file has the table defs, and data create statements etc.

    Today I will be working on this, and I believe I have the right direction after reading the above blog entry. Now just to make sure I get it working the right way.

  • Try this, not particularly efficient though.

    WITH Start_Dates AS(

    SELECT s1.Group_ID,s1.Start_Date

    FROM Source_Data s1

    WHERE NOT EXISTS(SELECT * FROM Source_Data s2

    WHERE s1.Group_ID=s2.Group_ID

    AND s1.Start_Date > s2.Start_Date AND CAST(s1.Start_Date AS DATETIME) <= CAST(s2.End_Date AS DATETIME)+1)

    ),

    End_Dates AS (

    SELECT t1.Group_ID,t1.End_Date

    FROM Source_Data t1

    WHERE NOT EXISTS(SELECT * FROM Source_Data t2

    WHERE t1.Group_ID=t2.Group_ID

    AND CAST(t1.End_Date AS DATETIME) >= CAST(t2.Start_Date AS DATETIME)-1 AND t1.End_Date < t2.End_Date)

    )

    SELECT s1.Group_ID,s1.Start_Date,MIN(t1.End_Date) AS End_Date

    FROM Start_Dates s1

    INNER JOIN End_Dates t1 ON t1.Group_ID=s1.Group_ID

    AND s1.Start_Date<t1.End_Date

    GROUP BY s1.Group_ID,s1.Start_Date

    ORDER BY s1.Group_ID,s1.Start_Date;

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Mark-101232 (1/4/2010)


    Try this, not particularly efficient though.

    WITH Start_Dates AS(

    SELECT s1.Group_ID,s1.Start_Date

    FROM Source_Data s1

    WHERE NOT EXISTS(SELECT * FROM Source_Data s2

    WHERE s1.Group_ID=s2.Group_ID

    AND s1.Start_Date > s2.Start_Date AND CAST(s1.Start_Date AS DATETIME) <= CAST(s2.End_Date AS DATETIME)+1)

    ),

    End_Dates AS (

    SELECT t1.Group_ID,t1.End_Date

    FROM Source_Data t1

    WHERE NOT EXISTS(SELECT * FROM Source_Data t2

    WHERE t1.Group_ID=t2.Group_ID

    AND CAST(t1.End_Date AS DATETIME) >= CAST(t2.Start_Date AS DATETIME)-1 AND t1.End_Date < t2.End_Date)

    )

    SELECT s1.Group_ID,s1.Start_Date,MIN(t1.End_Date) AS End_Date

    FROM Start_Dates s1

    INNER JOIN End_Dates t1 ON t1.Group_ID=s1.Group_ID

    AND s1.Start_Date<t1.End_Date

    GROUP BY s1.Group_ID,s1.Start_Date

    ORDER BY s1.Group_ID,s1.Start_Date;

    Oops, should be for SQL Server 2000

    SELECT s1.Group_ID,s1.Start_Date,MIN(t1.End_Date) AS End_Date

    FROM (

    SELECT s1.Group_ID,s1.Start_Date

    FROM Source_Data s1

    WHERE NOT EXISTS(SELECT * FROM Source_Data s2

    WHERE s1.Group_ID=s2.Group_ID

    AND s1.Start_Date > s2.Start_Date AND CAST(s1.Start_Date AS DATETIME) <= CAST(s2.End_Date AS DATETIME)+1)

    ) s1

    INNER JOIN (

    SELECT t1.Group_ID,t1.End_Date

    FROM Source_Data t1

    WHERE NOT EXISTS(SELECT * FROM Source_Data t2

    WHERE t1.Group_ID=t2.Group_ID

    AND CAST(t1.End_Date AS DATETIME) >= CAST(t2.Start_Date AS DATETIME)-1 AND t1.End_Date < t2.End_Date)

    ) t1 ON t1.Group_ID=s1.Group_ID

    AND s1.Start_Date<t1.End_Date

    GROUP BY s1.Group_ID,s1.Start_Date

    ORDER BY s1.Group_ID,s1.Start_Date

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Thank you Mark, this works well. It is what I was initially attempting to get to, I was not getting the sub-queries correctly.

    I will have to file this one away as a reminder.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply