Finding gaps within date ranges

  • Hi, I have a group of date ranges and wanted to identify all of the date gaps within the ranges, outputting the dates as another date range dataset.

    Example dataset SQL below:

    CREATE TABLE #test (daterow int identity, obj_id int, datestart DATETIME, dateend DATETIME)

    INSERT INTO #test

    SELECT 1, '20130428', '20130523'

    UNION

    SELECT 1, '20130526', '20130823'

    UNION

    SELECT 1, '20130728', '20130728'

    UNION

    SELECT 1, '20130826', '20130830'

    UNION

    SELECT 2, '20130501', '20130515'

    UNION

    SELECT 2, '20130525', '20130830'

    select * from #test

    I would expect a dataset to be returned consisting of:

    1, 24/05/2013, 25/05/2013

    1, 24/08/2013, 25/08/2013

    2, 16/05/2013, 24/05/2013

    Does anyone have a good way of doing this? I have found a lot of examples of problems where I have just a single date column, and then I find the gaps in between that, but I'm having difficulty finding examples where it works with start and end date columns...

    Thanks!

  • I'm not sure where your expected data for "2" is coming from ... but you should be able to expand this if necessary to groups

    This is quick and dirty but it shows the logic. A recursive CTE was not used to create the date library due to the possibility of large date spans

    /* Create a Table of all Possible Dates fro mthe First Start to the last end */

    DECLARE @dateToAdd DATETIME

    DECLARE @dateLast DATETIME

    DECLARE @tDates TABLE (dateDate DATETIME)

    SELECT @dateToAdd = MIN(datestart), @dateLast = MAX(dateEnd) FROM #test

    WHILE @dateToAdd <= @dateLast

    BEGIN

    INSERT INTO @tDates SELECT @dateToAdd

    SET @dateToadd = @dateToadd + 1

    END

    /* Find All Dates Not in a range */

    WITH CTEMissingDates AS

    (SELECT

    dateDate

    FROM

    @tDates t1

    WHERE

    NOT EXISTS (SELECT daterow FROM #test WHERE t1.dateDate >= datestart AND t1.dateDate <= dateend)

    ),

    /* The start date of the new range will be any without a date before it */

    CTEMissingRange1 AS

    (

    SELECT

    d1.dateDate

    FROM

    CTEMissingDates d1

    LEFT JOIN CTEMissingDates d2 ON DATEADD(dd,-1,d1.dateDate) = d2.dateDate

    WHERE

    d2.dateDate IS NULL)

    /* considering all of the mussing dates after the range start, the end of the range is the day before the first next found date */

    SELECT

    r1.dateDate dtStart,

    DATEADD(d,-1,MIN(d.dateDate)) dtEnd

    FROM

    CTEMissingRange1 r1

    INNER JOIN @tDates d ON r1.dateDate < d.dateDate

    WHERE

    d.dateDate NOT IN (SELECT dateDate FROM CTEMissingDates)

    GROUP BY

    r1.dateDate

    Hope this helps,

    Russ Baker

  • Sorry, the 2 was to represent that I could potentially have multiple items, and each one would have their own gaps represented, but the updated dataset never left my management studio.

    I'll take a look at the example SQL, thanks.

  • I suggest that you read this article by Jeff Moden

    http://www.sqlservercentral.com/articles/T-SQL/71550/

    For an excellent article including sample data and the code that will do what you seem to asking how to do.

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • Hi,

    I read that article before, but from my understanding, that's more like the inverse of what I was doing, as I want to find the gaps in between the date ranges, while the article was referring to highlighting the "islands" of dates. Is there an easy way to convert one problem into the other?

  • Start with my code for finding "CTEMissingDates" ... but add a group ID for your groups (1,2, etc)

    At that point you can group them into ranges using the technique from that article.

  • The approach I would use is first to pack overlapping intervals and then find the gaps.

    You can read about both methods in the last book from Itzik Ben-Gan about "High Performance T-SQL using Window Functions".

    http://shop.oreilly.com/product/0790145323088.do

    SET NOCOUNT ON;

    USE tempdb;

    GO

    CREATE TABLE #test (

    daterow int IDENTITY,

    obj_id int,

    datestart datetime,

    dateend datetime

    );

    INSERT INTO #test

    SELECT

    1,

    '20130428',

    '20130523'

    UNION

    SELECT

    1,

    '20130526',

    '20130823'

    UNION

    SELECT

    1,

    '20130728',

    '20130728'

    UNION

    SELECT

    1,

    '20130826',

    '20130830'

    UNION

    SELECT

    2,

    '20130501',

    '20130515'

    UNION

    SELECT

    2,

    '20130525',

    '20130830';

    -- Itzik's solution for packing intervals previous to SS2012

    WITH C1 AS (

    SELECT

    daterow,

    obj_id,

    datestart AS dt,

    1 AS [type],

    NULL AS e,

    ROW_NUMBER() OVER(PARTITION BY obj_id ORDER BY datestart, daterow) AS s

    FROM

    #test

    UNION ALL

    SELECT

    daterow,

    obj_id,

    dateend AS dt,

    -1 AS [type],

    ROW_NUMBER() OVER(PARTITION BY obj_id ORDER BY dateend, daterow) AS e,

    NULL AS s

    FROM

    #test

    )

    , C2 AS (

    SELECT

    *,

    ROW_NUMBER() OVER(PARTITION BY obj_id ORDER BY dt, [type] DESC, daterow) AS se

    FROM

    C1

    )

    , C3 AS (

    SELECT

    *,

    ((ROW_NUMBER() OVER(PARTITION BY obj_id ORDER BY dt) - 1) / 2) + 1 AS grpnum

    FROM

    C2

    WHERE

    COALESCE(s - (se - s) - 1, (se - e) - e) = 0

    )

    SELECT

    obj_id,

    MIN(dt) AS datestart,

    MAX(dt) AS dateend

    INTO

    #T

    FROM

    C3

    GROUP BY

    obj_id,

    grpnum;

    -- Finding gaps

    SELECT

    A.obj_id,

    DATEADD([day], 1, A.dateend) AS dtstart,

    DATEADD([day], -1, B.datestart) AS dtend

    FROM

    #T AS A

    CROSS APPLY

    (

    SELECT TOP (1)

    T.datestart

    FROM

    #T AS T

    WHERE

    T.obj_id = A.obj_id AND T.datestart > A.dateend

    ORDER BY

    T.datestart

    ) AS B

    WHERE

    DATEDIFF([day], A.dateend, B.datestart) > 1

    ORDER BY

    A.obj_id,

    dtstart;

    GO

    DROP TABLE #test, #T;

    GO

    Result:

    /*

    obj_iddtstartdtend

    12013-05-24 00:00:00.0002013-05-25 00:00:00.000

    12013-08-24 00:00:00.0002013-08-25 00:00:00.000

    22013-05-16 00:00:00.0002013-05-24 00:00:00.000

    */

  • Sorry for the formatting but I have no idea how to post T-SQL code.

  • Like hunchback, I'll start with Mr. Ben-Gan's interval packing approach but I believe the last part diverges from what he proposes:

    WITH C1 AS (

    SELECT obj_id, ts, Type

    ,e=CASE Type WHEN 1 THEN NULL ELSE ROW_NUMBER() OVER (PARTITION BY obj_id, Type ORDER BY dateend) END

    ,s=CASE Type WHEN -1 THEN NULL ELSE ROW_NUMBER() OVER (PARTITION BY obj_id, Type ORDER BY datestart) END

    FROM #test

    CROSS APPLY (

    VALUES (1, datestart), (-1, dateend)) a(Type, ts)

    ),

    C2 AS (

    SELECT C1.*

    ,se=ROW_NUMBER() OVER (PARTITION BY obj_id ORDER BY ts, Type DESC)

    FROM C1),

    C3 AS (

    SELECT obj_id, ts

    ,grpnm=FLOOR((ROW_NUMBER() OVER (PARTITION BY obj_id ORDER BY ts)-1) / 2 + 1)

    FROM C2

    WHERE COALESCE(s-(se-s)-1, (se-e)-e) = 0),

    -- C1, C2, C3, C4 combined remove the overlapping date periods

    C4 AS (

    SELECT obj_id, datestart=MIN(ts), dateend=MAX(ts)

    FROM C3

    GROUP BY obj_id, grpnm)

    SELECT obj_id, datestart=MIN(newdate), dateend=MAX(newdate)

    FROM (

    SELECT obj_id, newdate

    ,rn=ROW_NUMBER() OVER (PARTITION BY obj_id ORDER BY newdate) / 2

    FROM C4 a

    CROSS APPLY (

    VALUES (datestart-1),(dateend+1)) b(newdate)

    ) a

    GROUP BY obj_id, rn

    HAVING COUNT(*) = 2

    ORDER BY obj_id, datestart;

    1. The first process (combination of CTEs C1-C4) is a way of removing the overlapping date intervals, which I first saw here: Interval Packing by Itzik Ben-Gan. You end up with "islands" of date ranges where the overlaps have been removed.

    2. The final step is what I call the CROSS APPLY VALUES approach to Gaps from Islands, and that is described here: The SQL of Gaps and Islands in Sequences[/url]. While this article applies the approach to sequence numbers, it is equally applicable to DATETIME data. This differs a bit from what hunchback proposes (eliminates the temp table) but also works based on one pass through the C4 table.

    Should be pretty fast performing too.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • bitbucket-25253 (9/13/2013)


    I suggest that you read this article by Jeff Moden

    http://www.sqlservercentral.com/articles/T-SQL/71550/

    For an excellent article including sample data and the code that will do what you seem to asking how to do.

    Ron - That was my initial thought also, however Jeff's technique (also described in SQL MVP Deep Dives) doesn't work when there are date duplicates and is predicated on there being a single date column. The overlapping aspect of the intervals nullifies this approach, so it could not be used to generate the islands from which I calculated the gaps.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • hunchback (9/14/2013)


    Sorry for the formatting but I have no idea how to post T-SQL code.

    When you reply to the post, look on the left at the IFCode Shortcuts. Use the one that says code="sql"

    My edits above were to acknowledge the fact the we both used Mr. Ben-Gan's approach to packing intervals, which I didn't notice initially in the code you posted.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Thanks, Dwain!

    --

    AMB

  • To: dwain.c

    For this data:

    INSERT INTO #test

    SELECT

    1,

    '20130428',

    '20130523'

    UNION

    SELECT

    1,

    '20130526',

    '20130823'

    UNION

    SELECT

    1,

    '20130728',

    '20130728'

    UNION

    SELECT

    1,

    '20130826',

    '20130830'

    UNION

    SELECT

    2,

    '20130501',

    '20130515'

    UNION

    SELECT

    2,

    '20130516',

    '20130520'

    UNION

    SELECT

    2,

    '20130525',

    '20130830';

    i got wrong result:

    12013-05-24 00:00:00.0002013-05-25 00:00:00.000

    12013-08-24 00:00:00.0002013-08-25 00:00:00.000

    22013-05-15 00:00:00.0002013-05-16 00:00:00.000

    22013-05-21 00:00:00.0002013-05-24 00:00:00.000

    there should not be gap between 2013-05-15 and 2013-05-16

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply