How can I identify gaps in coverage dates

  • I need to look through 200,000 IDs with multiple dates and identify just the ID’s that have a gap. My first step was going to be to pull all the dates together to see gaps, but all I did was get the Min and Max and ignored the gaps.  How can I identify something that isn’t there And then somehow show ID and missing date.

    I need to look through 200,000 IDs with multiple dates and identify just the ID’s that have a gap. My first step was going to be to pull all the dates together to see gaps, but all I did was get the Min and Max and ignored the gaps.  How can I identify something that isn’t there And then somehow show ID and missing date.


    QUERY
    CREATE TABLE #1A
    (id VARCHAR(100),
    effdate date,
    termdate date)

    INSERT INTO #1a
    (ID, EffDate,Termdate)
    VALUES
    ('0123456789','2016-01-01','2016-01-31'),
    ('0123456789','2016-02-01','2016-04-30'),
    ('0123456789','2016-05-01','2016-12-31'),
    ('0123456789','2017-01-01','2017-04-30'),
    ('0123456789','2017-06-01','2017-10-31'),--Missing 2017-05-01-2017-05-31
    ('1111111111','2016-01-01','2016-01-31'),
    ('1111111111','2016-02-01','2017-05-31'),
    ('1111111111','2017-06-01','2017-12-31'),-- no gaps
    ('2222222222','2016-01-01','2016-01-31'),
    ('2222222222','2016-03-01','2017-05-31'),--missing 2016-02-01-2016-02-28
    ('2222222222','2017-06-01','2017-12-31'),
    ('3333333333','2016-01-01','2016-01-31'),
    ('3333333333','2016-02-01','2016-04-30'),
    ('3333333333','2016-05-01','2016-12-31'),
    ('3333333333','2017-01-01','2017-04-30'),
    ('3333333333','2017-06-01','2017-10-31'),-- missing 2017-05-01
    ('4444444444','2016-01-01','2016-01-31'),
    ('4444444444','2016-02-01','2017-05-31'),
    ('4444444444','2017-06-01','2017-12-31'),
    ('4444444444','2018-01-01','2018-12-31'), -- no gaps
    ('5555555555','2016-01-01','2016-01-31'),
    ('5555555555','2016-01-01','2016-02-28'),
    ('5555555555','2016-01-01','2016-03-31'),--Mutilple overlapping
    ('5555555555','2016-02-01','2017-05-31'),
    ('5555555555','2017-06-01','2017-12-31'),
    ('5555555555','2018-01-01','2018-11-30'),
    ('5555555555','2018-12-01','2018-12-31'),
    ('5555555555','2019-01-01','2019-01-31'),--missing 2019-02-01
    ('5555555555','2019-03-01','2019-04-30'),
    ('5555555555','2019-05-01','2020-01-31')

    Select *
    into #1b
    from #1a
    as combine
    where effdate < Termdate
    order by Effdate;

    with combine as
    (
    select id, effdate, Termdate, -- find the gap and flag it
    case when lag(Termdate)
    over (partition by ID
    order by Termdate) = dateadd(day,-1, effdate) --put dates without gaps together
    then 0
    else 1
    end as flag
    from #1A
    )
    , groups2 as
    (
    Select id, effdate, Termdate,
    -- cumulative sum over 0/1 to assign the same group number for row without gaps
    sum(flag)
    over (partition by ID
    order by effdate) as grp
    from combine
    )
    select id, min(Effdate) as MinEfdate, max(termdate) as MaxTermdate
    Into #Combine2
    from groups2
    group by ID
    order by ID, MinEfdate;

    Select * from #Combine2
    order by ID, MinEfdate;
    I was expecting:
    idMinEfdateMaxTermdate
    01234567892016-01-012017-04-30
    01234567892017-06-012017-10-31

    But got:
    idMinEfdateMaxTermdate
    01234567892016-01-012017-10-31
    11111111112016-01-012017-12-31
    22222222222016-01-012017-12-31
    33333333332016-01-012017-10-31
    44444444442016-01-012018-12-31
    55555555552016-01-012020-01-31
  • Difficult to read when everything is in a code block, however, if you need every date group, look at a calendar table, or similar. Otherwise you can use a Tally Table to generate the dates on the fly.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • I wouldn't use a tally table for this problem.  It's a variation on the interval packing problem.  I believe that this works for the data given, but it might not work for all combinations with overlapping intervals.  You can always use the standard interval packing to ensure that there are no overlapping intervals.

     

    WITH ranges AS
    (
    SELECT *, LAG(termdate, 1, DATEADD(DAY, -1, effdate)) OVER(PARTITION BY ID ORDER BY effdate, termdate) AS prevtermdate
    FROM #1A
    )
    SELECT id, prevtermdate as gapbegin, effdate AS gapend
    FROM ranges
    WHERE effdate > DATEADD(DAY, 1, prevtermdate)

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply