Please help - Date Overlapping

  • Hi All,

    I need some help in finding the overlaps in the date range using sql. below is the sample data.

    ID StartDate EndDate

    id1 20110101 20130930

    id1 20131001 20141231

    id1 20140101

    id2 20110101 20130930

    id2 20131001

    id3 20110101 20130930

    id3 20131001 20131231

    id3 20140101

    id4 20110101 20130930

    id4 20131001 20131231

    id5 20131001 20141231

    id5 20140101

    The result would returns the following overlapping rows:

    id1 20131001 20141231

    id1 20140101

    id5 20131001 20141231

    id5 20140101

    Thanking in advance.

  • Looks like this works, but be sure to test especially for performance. Please not how I provided the test data in an easily consumable format so others can just grab it and test:

    DECLARE @dates TABLE

    (

    id CHAR(3),

    StartDate DATE,

    EndDate DATE

    );

    INSERT INTO @dates

    (id, StartDate, EndDate)

    VALUES

    ('id1', '20110101', '20130930'),

    ('id1', '20131001', '20141231'),

    ('id1', '20140101', NULL),

    ('id2', '20110101', '20130930'),

    ('id2', '20131001', NULL),

    ('id3', '20110101', '20130930'),

    ('id3', '20131001', '20131231'),

    ('id3', '20140101', NULL),

    ('id4', '20110101', '20130930'),

    ('id4', '20131001', '20131231'),

    ('id5', '20131001', '20141231'),

    ('id5', '20140101', NULL);

    WITH dates

    AS (

    SELECT

    *,

    ROW_NUMBER() OVER (PARTITION BY id ORDER BY D.StartDate) AS rowNo

    FROM

    @dates AS D

    )

    SELECT

    *

    FROM

    dates AS D

    JOIN dates AS D2

    ON D.id = D2.id AND

    /* don't match self */

    D.rowNo <> D2.rowNo AND

    (

    /* later one over laps earlier one */

    D2.StartDate BETWEEN D.StartDate AND D.EndDate OR

    /* earlier one overlaps later one */

    (

    D.StartDate BETWEEN D2.StartDate AND ISNULL(D2.EndDate, '20990101') OR

    D.EndDate BETWEEN D2.StartDate AND ISNULL(D2.EndDate, '20990101')

    )

    );

  • I like to Thank You Mr. Corbett very much for taking the time to read and answer my question. Your help is most appreciated. Your query is too advance for a newbie like me to fully understanding it. I am going to read and study more to fully understand your query.

    Thanks again,

    Tom.

  • Quick solution for educational and entertainment purposes (read "spanner in the works") using slight modification of Jack Corbett's data preparation.

    😎

    USE tempdb;

    GO

    SET NOCOUNT ON;

    IF OBJECT_ID(N'dbo.TBL_DATES') IS NOT NULL DROP TABLE dbo.TBL_DATES;

    CREATE TABLE dbo.TBL_DATES

    (

    DATE_ID INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED

    ,id CHAR(3) NOT NULL

    ,StartDate DATE NOT NULL

    ,EndDate DATE

    );

    INSERT INTO dbo.TBL_DATES

    (id, StartDate, EndDate)

    VALUES

    ('id1', '20110101', '20130930'),

    ('id1', '20131001', '20141231'),

    ('id1', '20140101', NULL),

    ('id2', '20110101', '20130930'),

    ('id2', '20131001', NULL),

    ('id3', '20110101', '20130930'),

    ('id3', '20131001', '20131231'),

    ('id3', '20140101', NULL),

    ('id4', '20110101', '20130930'),

    ('id4', '20131001', '20131231'),

    ('id5', '20131001', '20141231'),

    ('id5', '20140101', NULL);

    ;WITH BASE_DATA AS

    (

    SELECT

    ROW_NUMBER() OVER

    (

    PARTITION BY D.id

    ORDER BY ISNULL(D.EndDate,D.StartDate)

    ) AS END_RID

    ,ROW_NUMBER() OVER

    (

    PARTITION BY D.id

    ORDER BY D.StartDate

    ) AS START_RID

    ,D.id

    ,D.StartDate

    ,D.EndDate

    FROM dbo.TBL_DATES D

    )

    ,GROUPED_DATA AS

    (

    SELECT

    BD.id

    ,BD.START_RID + BD.END_RID AS GR_ID

    ,ROW_NUMBER() OVER

    (

    PARTITION BY BD.id

    ORDER BY BD.START_RID

    ) AS OL_RID

    ,BD.StartDate

    ,BD.EndDate

    FROM BASE_DATA BD

    WHERE BD.START_RID <> BD.END_RID

    )

    SELECT

    GD.id

    ,GD.GR_ID

    ,GD.OL_RID

    ,GD.StartDate

    ,GD.EndDate

    FROM GROUPED_DATA GD;

    Results

    id GR_ID OL_RID StartDate EndDate

    ---- ------ ------- ---------- ----------

    id1 5 1 2013-10-01 2014-12-31

    id1 5 2 2014-01-01 NULL

    id5 3 1 2013-10-01 2014-12-31

    id5 3 2 2014-01-01 NULL

  • Eirikur Eiriksson (11/12/2014)


    Quick solution for educational and entertainment purposes (read "spanner in the works") using slight modification of Jack Corbett's data preparation.

    😎

    USE tempdb;

    GO

    SET NOCOUNT ON;

    IF OBJECT_ID(N'dbo.TBL_DATES') IS NOT NULL DROP TABLE dbo.TBL_DATES;

    CREATE TABLE dbo.TBL_DATES

    (

    DATE_ID INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED

    ,id CHAR(3) NOT NULL

    ,StartDate DATE NOT NULL

    ,EndDate DATE

    );

    INSERT INTO dbo.TBL_DATES

    (id, StartDate, EndDate)

    VALUES

    ('id1', '20110101', '20130930'),

    ('id1', '20131001', '20141231'),

    ('id1', '20140101', NULL),

    ('id2', '20110101', '20130930'),

    ('id2', '20131001', NULL),

    ('id3', '20110101', '20130930'),

    ('id3', '20131001', '20131231'),

    ('id3', '20140101', NULL),

    ('id4', '20110101', '20130930'),

    ('id4', '20131001', '20131231'),

    ('id5', '20131001', '20141231'),

    ('id5', '20140101', NULL);

    ;WITH BASE_DATA AS

    (

    SELECT

    ROW_NUMBER() OVER

    (

    PARTITION BY D.id

    ORDER BY ISNULL(D.EndDate,D.StartDate)

    ) AS END_RID

    ,ROW_NUMBER() OVER

    (

    PARTITION BY D.id

    ORDER BY D.StartDate

    ) AS START_RID

    ,D.id

    ,D.StartDate

    ,D.EndDate

    FROM dbo.TBL_DATES D

    )

    ,GROUPED_DATA AS

    (

    SELECT

    BD.id

    ,BD.START_RID + BD.END_RID AS GR_ID

    ,ROW_NUMBER() OVER

    (

    PARTITION BY BD.id

    ORDER BY BD.START_RID

    ) AS OL_RID

    ,BD.StartDate

    ,BD.EndDate

    FROM BASE_DATA BD

    WHERE BD.START_RID <> BD.END_RID

    )

    SELECT

    GD.id

    ,GD.GR_ID

    ,GD.OL_RID

    ,GD.StartDate

    ,GD.EndDate

    FROM GROUPED_DATA GD;

    Results

    id GR_ID OL_RID StartDate EndDate

    ---- ------ ------- ---------- ----------

    id1 5 1 2013-10-01 2014-12-31

    id1 5 2 2014-01-01 NULL

    id5 3 1 2013-10-01 2014-12-31

    id5 3 2 2014-01-01 NULL

    Defintely a better solution than mine. Thanks Erikur.

  • Hi,

    Both of you seem to forget that this is the 2K5 group (SQL Server 2005), which means that fields of type DATE isn't supported (we have to use DATETIME fields). Also the use of VALUES as a means to insert data isn't supported. Not a big issue, but "newbies" may stumble at that point already.

    I don't agree that Eirikur's solution is the better one. It only works with the example data supplied by the OP. As soon as one starts to tweek the data a little his solution falls apart.

    I tried both your solutions with a slightly altered input dataset:

    USE tempdb;

    GO

    SET NOCOUNT ON;

    IF OBJECT_ID(N'dbo.TBL_DATES') IS NOT NULL DROP TABLE dbo.TBL_DATES;

    CREATE TABLE dbo.TBL_DATES

    (

    DATE_ID INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED

    ,id CHAR(3) NOT NULL

    ,StartDate DATETIME NOT NULL

    ,EndDate DATETIME

    );

    INSERT INTO dbo.TBL_DATES

    (id, StartDate, EndDate)

    SELECT

    'id1', '20110101', NULL UNION ALL SELECT -- changed enddate to Null, previously '20130930'

    'id1', '20131001', '20141231' UNION ALL SELECT

    'id1', '20140101', NULL UNION ALL SELECT

    'id2', '20110101', '20130930' UNION ALL SELECT

    'id2', '20131001', NULL UNION ALL SELECT

    'id3', '20110101', '20130930' UNION ALL SELECT

    'id3', '20110301', '20131130' UNION ALL SELECT -- Added row

    'id3', '20131001', '20131231' UNION ALL SELECT

    'id3', '20140101', NULL UNION ALL SELECT

    'id4', '20110101', '20130930' UNION ALL SELECT

    'id4', '20131001', '20131231' UNION ALL SELECT

    'id5', '20131001', '20141231' UNION ALL SELECT

    'id5', '20140101', NULL;

    I then ran Eirikur's SQL code, which gave me this result dataset:

    [font="Courier New"]

    id GR_ID OL_RID StartDate EndDate

    ---- -------------------- -------------------- ----------------------- -----------------------

    id1 5 1 2013-10-01 00:00:00.000 2014-12-31 00:00:00.000

    id1 5 2 2014-01-01 00:00:00.000 NULL

    id5 3 1 2013-10-01 00:00:00.000 2014-12-31 00:00:00.000

    id5 3 2 2014-01-01 00:00:00.000 NULL

    [/font]

    This is obviously wrong! The changes in the input data are not reflected in the result set. None of the newly introduced overlaps have been caught.

    I then ran Jack's code and got this resultset:

    [font="Courier New"]

    DATE_ID id StartDate EndDate rowNo DATE_ID id StartDate EndDate rowNo

    ----------- ---- ----------------------- ----------------------- -------------------- ----------- ---- ----------------------- ----------------------- --------------------

    2 id1 2013-10-01 00:00:00.000 2014-12-31 00:00:00.000 2 1 id1 2011-01-01 00:00:00.000 NULL 1

    3 id1 2014-01-01 00:00:00.000 NULL 3 1 id1 2011-01-01 00:00:00.000 NULL 1

    3 id1 2014-01-01 00:00:00.000 NULL 3 2 id1 2013-10-01 00:00:00.000 2014-12-31 00:00:00.000 2

    2 id1 2013-10-01 00:00:00.000 2014-12-31 00:00:00.000 2 3 id1 2014-01-01 00:00:00.000 NULL 3

    7 id3 2011-03-01 00:00:00.000 2013-11-30 00:00:00.000 2 6 id3 2011-01-01 00:00:00.000 2013-09-30 00:00:00.000 1

    6 id3 2011-01-01 00:00:00.000 2013-09-30 00:00:00.000 1 7 id3 2011-03-01 00:00:00.000 2013-11-30 00:00:00.000 2

    8 id3 2013-10-01 00:00:00.000 2013-12-31 00:00:00.000 3 7 id3 2011-03-01 00:00:00.000 2013-11-30 00:00:00.000 2

    7 id3 2011-03-01 00:00:00.000 2013-11-30 00:00:00.000 2 8 id3 2013-10-01 00:00:00.000 2013-12-31 00:00:00.000 3

    13 id5 2014-01-01 00:00:00.000 NULL 2 12 id5 2013-10-01 00:00:00.000 2014-12-31 00:00:00.000 1

    12 id5 2013-10-01 00:00:00.000 2014-12-31 00:00:00.000 1 13 id5 2014-01-01 00:00:00.000 NULL 2

    [/font]

    Now this is better! All the overlapping date combinations have been caught.

    But the format of the output data isn't all that logical if one only wants to know what rows/dates overlap. To show only input rows that are involved in overlapping date intervals, one could employ a union as a last step:

    WITH dates

    AS (

    SELECT

    *,

    ROW_NUMBER() OVER (PARTITION BY id ORDER BY D.StartDate) AS rowNo

    FROM

    dbo.TBL_DATES AS D

    ),

    overlaps

    AS (

    SELECT

    d.DATE_ID, d.id, d.StartDate, d.EndDate, d.rowNo,

    d2.DATE_ID DATE_ID2, d2.id id2, d2.StartDate StartDate2, d2.EndDate EndDate2, d2.rowNo rowNo2

    FROM

    dates AS D

    JOIN dates AS D2

    ON D.id = D2.id AND

    /* don't match self */

    D.rowNo <> D2.rowNo AND

    (

    /* later one over laps earlier one */

    D2.StartDate BETWEEN D.StartDate AND D.EndDate OR

    /* earlier one overlaps later one */

    (

    D.StartDate BETWEEN D2.StartDate AND ISNULL(D2.EndDate, '20990101') OR

    D.EndDate BETWEEN D2.StartDate AND ISNULL(D2.EndDate, '20990101')

    )

    )

    )

    SELECT DISTINCT id, StartDate, EndDate

    from overlaps

    UNION

    SELECT DISTINCT id2, StartDate2, EndDate2

    FROM overlaps

    ORDER BY id, StartDate, EndDate;

    ... which gives us:

    [font="Courier New"]

    id StartDate EndDate

    ---- ----------------------- -----------------------

    id1 2011-01-01 00:00:00.000 NULL

    id1 2013-10-01 00:00:00.000 2014-12-31 00:00:00.000

    id1 2014-01-01 00:00:00.000 NULL

    id3 2011-01-01 00:00:00.000 2013-09-30 00:00:00.000

    id3 2011-03-01 00:00:00.000 2013-11-30 00:00:00.000

    id3 2013-10-01 00:00:00.000 2013-12-31 00:00:00.000

    id5 2013-10-01 00:00:00.000 2014-12-31 00:00:00.000

    id5 2014-01-01 00:00:00.000 NULL

    [/font]

    That about covers it, i think... 🙂

  • Personally I wouldn't go with a home-grown solution for this "packing intervals" problem.

    Instead I'd seek out the literature and find that SQL MVP Itzik Ben-Gan has proposed a very fast and elegant solution: Packing Intervals[/url]

    For some reason however, that original link is not currently directing to his article (maybe it's gone into archives), but I used his technique (with attribution of course) in this article: Calculating Gaps Between Overlapping Time Intervals in SQL[/url] (see the section on "Packing the Overlapping Login Intervals" sample query #5).

    A quick review of that suggests it should server you well in SQL 2005.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply