Help in Date spans for continuoes and regular dates spans

  • Hello all,

    I would like to take input scripts and generate output data as given below. output is expected as if member is continuously enrolled without any gap in enrollment get min and max and if there is enrollment gap leave as is. for any given member there are no multiple spans just leave as is. for id 333 there are no multiple spans just leave as is and for id 222 bring one final line as given in the output data. for id 111 merge one span but not other span because one span is continues where as other spans is not continous , please refer output table for reference in the below scripts.

    Any help is appreciated.

    create table #test1

    (id int,startdate datetime ,enddate datetime)

    insert into #test1

    select 111,'2016-12-01 00:00:00','2020-03-31 00:00:00'

    insert into #test1

    select 111,'2020-05-01 00:00:00','2020-05-31 00:00:00'

    insert into #test1

    select 111,'2020-06-01 00:00:00','2078-12-31 00:00:00'

    insert into #test1

    select 222,'2020-01-01 00:00:00','2020-02-29 00:00:00'

    insert into #test1

    select 222,'2020-03-01 00:00:00','2078-12-31 00:00:00'

    insert into #test1

    select 333,'2015-01-01 00:00:00', '2078-12-31 00:00:00'

    --input table

    select * from #test1

    create table #test2

    (id int,startdate datetime ,enddate datetime)

    insert into #test2

    select 111,'2016-12-01 00:00:00','2020-03-31 00:00:00'

    insert into #test2

    select 111,'2020-05-01 00:00:00','2078-12-31 00:00:00'

    insert into #test2

    select 222,'2020-01-01 00:00:00','2078-12-31 00:00:00'

    insert into #test2

    select 333,'2015-01-01 00:00:00', '2078-12-31 00:00:00'

    --output table

    select * from #test2?

     

  • The first thing i could think of seems to work but its not pretty i guess and im sure someone has a better solution, so lets see:

    My Approach:

    • get the next startdate partitioned by ID
    • get difference between Enddate of current Row and Startdate of next Row, set differece = 1 if there is no next row so it counts as next day (continuesly)
    • Group the data by ID and Datedifference
    select id,startdate=min(startdate),enddate=max(enddate)
    from (select *,Diff=isnull(DATEDIFF(DD,enddate,Test),1)
    from (
    select *,Test=LEAD(startdate) OVER(PARTITION BY ID ORDER BY ID,startdate)
    from #test1
    ) Previous_Date
    ) Date_Difference

    group by id,Diff
    order by 1,3
  • https://www.sqlservercentral.com/forums/topic/consolidate-overlapping-date-periods

     

    SELECT ID, StartDate = MIN(ts), EndDate = MAX(ts)
    FROM (
    SELECT ID, ts, Snapper = (ROW_NUMBER() OVER(ORDER BY ID, ts)+1)/2
    FROM (
    SELECT *, se = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ts2, Type DESC)
    FROM (
    SELECT ID, ts = StartDate, ts2 = StartDate, Type = 1
    ,e = NULL
    ,s = (2*ROW_NUMBER() OVER (PARTITION BY ID ORDER BY StartDate))-1
    FROM #test1
    UNION ALL
    SELECT ID, ts = EndDate, ts2 = DATEADD(DAY,1,EndDate), Type = -1
    ,e = (2*ROW_NUMBER() OVER (PARTITION BY ID ORDER BY EndDate))
    ,s = NULL
    FROM #test1
    ) d
    ) e
    WHERE se IN (e,s)
    ) f
    GROUP BY ID, Snapper
    ORDER BY id, StartDate

     

     

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Thank you for your reply and help, this is working but I cannot understand why we are multiplying with 2* to the row number for e and s columns and then again /2 row number for snapper columns, it is so confusing.

    Also I have 3 miliion rows of actual data, do you think this approach is good to follow and will work fine with any issues. Thanks again

  • dhanekulakalyan wrote:

    Thank you for your reply and help, this is working but I cannot understand why we are multiplying with 2* to the row number for e and s columns and then again /2 row number for snapper columns, it is so confusing.

    Also I have 3 miliion rows of actual data, do you think this approach is good to follow and will work fine with any issues. Thanks again

    Try testing with say 100,000 rows first to get an idea of performance - select them out into a temp table ordered by ID so that you get all members of each ID partition except perhaps the last. A clustered index on ID will help a little with the query. I can't say how long it will take. The query is explained in the link I posted earlier, but the 2* for a and s columns is self explanatory if you execute the innermost query - I wrote it like this rather than as chained CTE's so you could do this. Any further questions, just holler.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • To resolve for arbitrary relationships in the data requires hierarchical recursion, no?  The OP's question and Phil's too

    drop table if exists dbo.test_1;
    go
    create table dbo.test_1(id int,startdate datetime ,enddate datetime)
    go

    insert dbo.test_1(id, startdate, enddate) values
    (111,'2016-12-01 00:00:00','2020-03-31 00:00:00'),
    (111,'2020-05-01 00:00:00','2020-05-31 00:00:00'),
    (111,'2020-06-01 00:00:00','2078-12-31 00:00:00'),
    (111,'2079-06-01 00:00:00','2080-12-31 00:00:00'),
    (111,'2081-01-01 00:00:00','2083-11-21 00:00:00'),
    (111,'2083-11-22 00:00:00','2086-12-31 00:00:00'),
    (111,'2089-06-01 00:00:00','2090-12-31 00:00:00'),
    (222,'2020-01-01 00:00:00','2020-02-29 00:00:00'),
    (222,'2020-03-01 00:00:00','2078-12-31 00:00:00'),
    (333,'2015-01-01 00:00:00', '2078-12-31 00:00:00');

    drop function if exists dbo.test_downlines;
    go
    create function dbo.test_downlines(
    @idint,
    @startdate datetime)
    returns table as
    return
    with
    base_cte(parent_id, match_dt, id, startdate, enddate) as (
    select
    lag(id) over (partition by id order by id, startdate),
    iif((lag(id) over (partition by id order by id, startdate) is null), null, dateadd(dd, -1, startdate)),
    t1.*
    from
    dbo.test_1 t1),
    recur_cte(parent_id, match_dt, id, startdate, enddate, h_level) as (
    select
    *,
    cast(0 as int)
    from
    base_cte t1
    where
    id=@id
    and startdate=@startdate
    union all
    select
    t1.*,
    rc.h_level+1
    from
    base_cte t1
    join
    recur_cte rc on t1.parent_id=rc.id
    and t1.match_dt=rc.enddate)
    select
    max(rc.enddate) enddate
    from
    recur_cte rc;
    go

    ;with
    base_cte(id, startdate, enddate) as (
    select t1.*
    from
    dbo.test_1 t1
    where not exists
    (select 1 from dbo.test_1 in_t1 where t1.id=in_t1.id and t1.startdate=dateadd(dd, 1, in_t1.enddate)))
    select
    bc.*, td.enddate range_end_dt
    from
    base_cte bc
    cross apply
    dbo.test_downlines(bc.id, bc.startdate) td;

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • This seems to be a typical gaps & islands issue...here is another method:

    Declare @test1 Table (id int, startdate datetime, enddate datetime);

    Insert Into @test1 (id, startdate, enddate)
    Values (111, '2016-12-01 00:00:00', '2020-03-31 00:00:00')
    , (111,'2020-05-01 00:00:00','2020-05-31 00:00:00')
    , (111,'2020-06-01 00:00:00','2078-12-31 00:00:00')
    , (222,'2020-01-01 00:00:00','2020-02-29 00:00:00')
    , (222,'2020-03-01 00:00:00','2020-05-31 00:00:00')
    , (222,'2020-07-01 00:00:00','2078-12-31 00:00:00')
    , (333,'2015-01-01 00:00:00', '2015-05-31 00:00:00')
    , (333,'2015-06-01 00:00:00', '2015-12-31 00:00:00')
    , (333,'2016-01-01 00:00:00', '2019-10-31 00:00:00')
    , (333,'2020-01-01 00:00:00', '2078-12-31 00:00:00');

    Declare @minStartDate datetime = (Select min(t.startdate) From @test1 t)
    , @maxEndDate datetime = (Select max(t.enddate) From @test1 t);

    With t (n)
    As (
    Select t.n
    From (Values (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)) As t(n)
    )
    , dates (FromDate)
    As (
    Select Top (datediff(day, @minStartDate, @maxEndDate) + 1)
    dateadd(day, checksum(row_number() over(Order By @@spid)) - 1, @minStartDate)
    From t t1, t t2, t t3, t t4
    )
    , dateGroups
    As (
    Select *
    , dateGroup = dateadd(day, -row_number() over(Partition By t1.id Order By d.FromDate), d.FromDate)
    From @test1 t1
    Inner Join dates d On d.FromDate Between t1.startdate And t1.enddate
    )
    Select dg.id
    , StartDate = min(dg.startdate)
    , EndDate = max(dg.enddate)
    From dateGroups dg
    Group By
    dg.id
    , dg.dateGroup
    Order By
    dg.id
    , StartDate

    I added a few additional rows for 222 and 333 - for 333 there are 3 rows that are continuous and the 4th row with a gap.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Here's a little sample data generator for anyone wishing to play along:

    IF OBJECT_ID('TEMPDB..#test1') IS NOT NULL DROP TABLE #test1
    SELECT RowID = ROW_NUMBER() OVER(ORDER BY ID, Startdate, Enddate),
    ID, Startdate, Enddate
    INTO #test1
    FROM (VALUES
    (3565, '20111101', '20130128'),(3565, '20120101', '20130128'),(3565, '20120405', '20120630'),
    (3565, '20120525', '20120630'),(3565, '20130125', '20130128'),(3565, '20140220', '20140313'),
    (3577, '20101202', '20150120'),(3612, '20111101', '20130206'),(3612, '20111101', '20130807'),
    (3612, '20111101', '20140731'),(3612, '20120823', '20120905'),(3612, '20120914', '20121031'),
    (3612, '20121101', '20121128'),(3612, '20121206', '20121224'),(3612, '20121226', '20130109'),
    (3612, '20130207', '20140731'),(3724, '20111101', '20130807'),(3724, '20120426', '20120508'),
    (3724, '20120515', '20120531'),(3724, '20120524', '20120531'),(3724, '20120607', '20120706'),
    (3724, '20120614', '20120713'),(3724, '20120802', '20120831'),(3724, '20120901', '20120930'),
    (3730, '20130516', '20130605'),(3730, '20130516', '20130807'),(3730, '20130516', '20131219'),
    (3730, '20130620', '20130704'),(3730, '20130711', '20130724'),(3730, '20130808', '20131219'),
    (3730, '20131212', '20140612'),(3730, '20140501', '20140528'),(3730, '20140619', '20141217'),
    (3730, '20140724', '20140820'),(3730, '20140814', '20140910'),(3730, '20140904', '20141001'),
    (3730, '20140911', '20141008'),(3730, '20141002', '20141029'),(3730, '20141106', '20141203'),
    (3730, '20141218', '20150617'),(3730, '20150129', '20150225'),(3730, '20150205', '20150304'),
    (3730, '20150305', '20150401'),(3730, '20150402', '20150429'),(3730, '20150430', '20150527'),
    (3730, '20150618', '20151223'),(3730, '20150723', '20150819'),(3730, '20150820', '20150916'),
    (3730, '20150917', '20151014'),(3730, '20151022', '20151118'),(3730, '20160107', '20160203'),
    (3730, '20160107', '20160629'),(3730, '20160204', '20160302'),(3730, '20160303', '20160316'),
    (3730, '20160303', '20160406'),(3730, '20160317', '20160330'),(3730, '20160414', '20160629'),
    (3730, '20160707', '20160803'),(3730, '20160804', '20160817'),(3730, '20160804', '20160831'),
    (3730, '20160804', '20160914'),(3730, '20160901', '20160914'),(3730, '20160901', '20160928'),
    (3730, '20160915', '20170201'),(3730, '20160915', '20170215'),(3730, '20161027', '20161109'),
    (3730, '20161027', '20161123'),(3730, '20161124', '20161207'),(3730, '20161124', '20161221'),
    (3730, '20170105', '20170201'),(3730, '20170119', '20170215'),(3730, '20170202', '20170215'),
    (3730, '20170216', '20170510'),(3730, '20170427', '20170524'),(3730, '20170511', '20170719'),
    (3730, '20170629', '20170630'),(3730, '20170720', '20170802'),(3730, '20170728', '20170729'),
    (3730, '20170803', '20171025'),(3730, '20170817', '20171025'),(3730, '20171026', '20171206'),
    (3730, '20171207', '20180103'),(3730, '20180104', '20180228'),(3730, '20180301', '20180523'),
    (3730, '20180524', '20180801'),(3730, '20180802', '20181024'),(3730, '20181025', '20190619')
    ) d (ID, Startdate, Enddate)

    CREATE CLUSTERED INDEX cx_Stuff ON #test1 (ID, StartDate)

    SELECT * FROM #test1

    My query generates the following output which looks correct:

    IDStartDateEndDate
    35652011-11-012013-01-28
    35652014-02-202014-03-13
    35772010-12-022015-01-20
    36122011-11-012014-07-31
    37242011-11-012013-08-07
    37302013-05-162014-06-12
    37302014-06-192015-12-23
    37302016-01-072016-06-29
    37302016-07-072019-06-19

     

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • dhanekulakalyan wrote:

    Thank you for your reply and help, this is working but I cannot understand why we are multiplying with 2* to the row number for e and s columns and then again /2 row number for snapper columns, it is so confusing.

    Also I have 3 miliion rows of actual data, do you think this approach is good to follow and will work fine with any issues. Thanks again

     

    A sample data set of 4,000 rows completes in about 40ms using my query, on this arbitrary system.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • The OP wrote: "output is expected as if member is continuously enrolled without any gap in enrollment get min and max and if there is enrollment gap leave as is."  Is the sample data above representative of the OP's question?  None of the sample data provided by the OP has overlapping date ranges.  Can a "membership" overlap itslef?  The OP's definition of "continuous" appears to be the difference between startdate and enddate (between "memberships") is 1 day.  If that condition isn't met then leave the row(s) as is.  According to that definition my code appears to work and the result is:

    idstartdateenddate        range_end_dt
    35652011-11-012013-01-282013-01-28
    35652012-01-012013-01-282013-01-28
    35652012-04-052012-06-302012-06-30
    35652012-05-252012-06-302012-06-30
    35652013-01-252013-01-282013-01-28
    35652014-02-202014-03-132014-03-13
    35772010-12-022015-01-202015-01-20
    36122011-11-012013-02-062014-07-31
    36122011-11-012013-08-072014-07-31
    36122011-11-012014-07-312014-07-31
    36122012-08-232012-09-052012-09-05
    36122012-09-142012-10-312012-11-28
    36122012-12-062012-12-242012-12-24
    36122012-12-262013-01-092013-01-09
    37242011-11-012013-08-072013-08-07
    37242012-04-262012-05-082012-05-08
    37242012-05-152012-05-312012-05-31
    37242012-05-242012-05-312012-05-31
    37242012-06-072012-07-062012-07-06
    37242012-06-142012-07-132012-07-13
    37242012-08-022012-08-312012-09-30
    37302013-05-162013-06-052013-12-19
    37302013-05-162013-08-072013-12-19
    37302013-05-162013-12-192013-12-19
    37302013-06-202013-07-042013-07-04
    37302013-07-112013-07-242013-07-24
    37302013-12-122014-06-122014-06-12
    37302014-05-012014-05-282014-05-28
    37302014-06-192014-12-172015-12-23
    37302014-07-242014-08-202014-08-20
    37302014-08-142014-09-102014-10-08
    37302014-09-042014-10-012014-10-29
    37302014-11-062014-12-032014-12-03
    37302015-01-292015-02-252015-02-25
    37302015-02-052015-03-042015-05-27
    37302015-07-232015-08-192015-10-14
    37302015-10-222015-11-182015-11-18
    37302016-01-072016-02-032016-06-29
    37302016-01-072016-06-292016-06-29
    37302016-04-142016-06-292016-06-29
    37302016-07-072016-08-032019-06-19
    37302016-10-272016-11-092016-12-21
    37302016-10-272016-11-232016-12-21
    37302017-01-052017-02-012019-06-19
    37302017-01-192017-02-152019-06-19
    37302017-04-272017-05-242017-05-24
    37302017-06-292017-06-302017-06-30
    37302017-07-282017-07-292017-07-29
    37302017-08-172017-10-252019-06-19

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • The query could be without the cte

    select
    t1.id, cast(t1.startdate as date) startdate, cast(t1.enddate as date) enddate, cast(td.enddate as date) range_end_dt
    from
    dbo.test_1 t1
    cross apply
    dbo.test_downlines(t1.id, t1.startdate) td
    where not exists
    (select 1 from dbo.test_1 in_t1 where t1.id=in_t1.id and t1.startdate=dateadd(dd, 1, in_t1.enddate))
    order by 1, 2;

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Steve Collins wrote:

    The query could be without the cte

    select
    t1.id, cast(t1.startdate as date) startdate, cast(t1.enddate as date) enddate, cast(td.enddate as date) range_end_dt
    from
    dbo.test_1 t1
    cross apply
    dbo.test_downlines(t1.id, t1.startdate) td
    where not exists
    (select 1 from dbo.test_1 in_t1 where t1.id=in_t1.id and t1.startdate=dateadd(dd, 1, in_t1.enddate))
    order by 1, 2;

    Can you write it without the function?

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work wrote:

    Can you write it without the function?

    In theory it should be possible but maybe not easily.  The tvf is parameterized so I think to remove it would require splitting it into two nested CROSS APPLY statements.  Yes the OP asked to resolve the example in temp table and my code only works against a physical table.  Fudged it because the OP also said "I have 3 miliion rows of actual data" which seems likely 🙂 to be in a physical table somewhere 🙂

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Did you know that a table must have a key. by definition? What you posted can never have a key goes all the columns can be NULL! Furthermore, an identifier cannot be numeric because you don't do calculations on it and it is measured on a nominal scale. We also have a date data type now, so there's no need to use the old Sybase datetime from 30 years ago. Nor should you be using the old row at a time. Insertion syntax from Sybase. Finally, I'm going to assume that you don't want an event to end before it starts. So you need a constraint. Remember, SQL is a declarative language. Most of the work is done in the DDL.

    In short, you just described a deck of old punchcards written in SQL. Let's fix that:

    CREATE TABLE Events

    (event_id CHAR(3) NOT NULL,

    event_start_date DATE NOT NULL,

    PRIMARY KEY (event_id, event_start_date), -- not an option!!

    event_end_date DATE NOT NULL,

    CHECK(event_start_date <= event_end_date));

    INSERT INTO Events

    VALUES

    ('111', '2016-12-01', '2020-03-31'),

    ('111', '2020-05-01', '2020-05-31'),

    ('111', '2020-06-01', '2078-12-31'),

    ('222', '2020-01-01', '2020-02-29'),

    ('222', '2020-03-01', '2078-12-31')'

    ('333', '2015-01-01', '2078-12-31'),

    Your mindset is still thinking of punchcards or magnetic tape files, where records are inserted one at a time. But SQL is based on tables and rose which are completely different. The table constructor, values, puts them all in at once and the insertion can be optimized.

    Please read some of the books by Chris Date. He discusses why you should never have two tables with the same structure in the same schema. What you are doing is mimicking the way we did magnetic tape files back the 1960s. In SQL we can do a view, or update a base table. You are using SQL as if you were hanging a new tape on a new tape drive and doing a file merge. Your mindset is still locked in physical models of data.

    My guess is you don't want that second table, but would rather update the existing Events base table. Google around for "islands and gaps" for various solutions. The lead () and lag () functions will be very useful for you. You might want to write a stored procedure that will find an existing event duration and do an update to it. If it qualifies

    Please post DDL and follow ANSI/ISO standards when asking for help. 

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply