Merge spans with Dates Logic

  • -- For a given member, if the start date and end date are continuous, we need to keep them in a single record; if the start date and end date are not continuous, I need to keep them in separate records for that member.

     

    DROP TABLE IF EXISTS #test;

    CREATE TABLE #test
    (ID int,
    startdate datetime,
    enddate datetime
    );

    INSERT INTO #test
    VALUES (1,'01/01/2024','01/31/2024'),
    (1,'02/01/2024','04/30/2024'),
    (2,'01/01/2024','01/31/2024'),
    (2,'02/01/2024','02/29/2024'),
    (2,'04/01/2024','04/30/2024'),
    (2,'06/01/2024','06/30/2024'),
    (3,'07/01/2024','12/31/2024');

    SELECT * FROM #test;

     

    --expected output
    (1,'01/01/2024','04/30/2024'),
    (2,'01/01/2024','02/29/2024'),
    (2,'04/01/2024','04/30/2024'),
    (2,'06/01/2024','06/30/2024'),
    (3,'07/01/2024','12/31/2024')

  • What have you tested?

    Here is an example (but you'd still have to validate it performance-wise in your environment).

    drop table #test

    CREATE TABLE #test
    (MemberID INT
    , startdate DATETIME
    , enddate DATETIME
    );

    insert into #test
    values (1,'01/01/2024','01/31/2024'),
    (1,'02/01/2024','04/30/2024'),
    (2,'01/01/2024','01/31/2024'),
    (2,'02/01/2024','02/29/2024'),
    (2,'03/01/2024','03/31/2024'),
    (2,'04/01/2024','04/30/2024'),
    (2,'05/01/2024','05/31/2024'),

    (2,'07/01/2024','07/31/2024'),

    (3,'07/01/2024','08/31/2024'),
    (3,'09/01/2024','12/31/2024')
    ;

    ;WITH ctePrevEnddate AS (
    SELECT
    MemberID,
    startdate,
    enddate,
    LAG(enddate) OVER (PARTITION BY MemberID ORDER BY startdate) AS prev_enddate
    FROM #test
    ),
    cteCheckContinous AS (
    SELECT
    MemberID,
    startdate,
    enddate,
    CASE
    WHEN prev_enddate IS NULL THEN 1 -- First record for member
    WHEN DATEDIFF(DAY, prev_enddate, startdate) = 1 THEN 0 -- Continuous
    ELSE 1 -- Not Continuous
    END AS IsNonContinuous
    FROM ctePrevEnddate
    ),
    cteAssemblies AS (
    SELECT
    MemberID,
    startdate,
    enddate,
    SUM(IsNonContinuous) OVER (PARTITION BY MemberID ORDER BY startdate) AS Ranges
    FROM cteCheckContinous
    )
    -- Select * from cteAssemblies order by MemberID, startdate, Ranges
    SELECT
    MemberID,
    MIN(startdate) AS startdate,
    MAX(enddate) AS enddate
    FROM cteAssemblies
    GROUP BY MemberID, Ranges
    ORDER BY MemberID, startdate;

    Keep in mind that I've added rows to your test data (and renamed ID to MemberID)!

    Johan


  • Here is a different approach based on an article by Itzik Ben-Gan.

    There are three types of intervals that you can work with:

    • Open intervals -- Neither endpoint is included
    • Half-closed (or half-open) intervals -- One endpoint is included

      • When working with time intervals, you almost always include the start date and exclude the end date.

    • Closed intervals -- Both endpoints are included.

    While it's easier for humans to understand closed intervals, it's much easier for computers to work with half-closed intervals.  So the first step I take is to convert your closed intervals to half-closed intervals while simultaneously changing your intervals to dates of interest and the type of change.
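
    To see what that conversion produces on its own, here is a minimal sketch of just the unpivot step. The table variable @test is a hypothetical stand-in for the first two rows of #test so the snippet runs by itself; the aliases changeDate and changeValue match the full query below.

    ```sql
    -- Hypothetical stand-in for the first two rows of #test (ID 1).
    DECLARE @test TABLE (ID int, startdate date, enddate date);
    INSERT INTO @test (ID, startdate, enddate)
    VALUES (1, '20240101', '20240131')
         , (1, '20240201', '20240430');

    -- Each closed interval [startdate, enddate] becomes two events:
    --   +1 on startdate (the interval opens)
    --   -1 on the day after enddate (the exclusive end of the half-closed form)
    SELECT t.ID
         , v.changeDate
         , v.changeValue
    FROM @test AS t
    CROSS APPLY (VALUES (t.startdate, 1)
                      , (DATEADD(DAY, 1, t.enddate), -1)) AS v (changeDate, changeValue)
    ORDER BY t.ID, v.changeDate, v.changeValue DESC;
    ```

    For these two rows the events are 2024-01-01 (+1), 2024-02-01 (+1), 2024-02-01 (-1) and 2024-05-01 (-1). The exclusive end of the first interval lands on the same date as the start of the second, which is what makes adjacency easy to detect.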

    The next step is to determine whether this is a new interval or a continuation of a previous interval.  This depends on whether you want to merge adjacent intervals or keep them separate.  You've indicated that you want to merge them, so we're sorting start dates (change +1) before end dates (change -1).  If you wanted to keep them separate, it's a simple matter of changing changeValue DESC to changeValue ASC in both window ORDER BY clauses.

    WITH TestGrouped AS  -- Look for new groups
    (
    SELECT t.ID
    , t.startdate
    , t.enddate
    , v.changeDate
    , v.changeValue
    , CASE WHEN SUM(v.changeValue) OVER(PARTITION BY t.ID ORDER BY v.changeDate, v.changeValue DESC ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) > 0 -- Not a new instance
    THEN 0
    ELSE 1 -- Sum is 0 or NULL
    END AS IsNewGroup
    FROM #test AS t
    /* Unpivot intervals changing closed intervals to half-closed intervals. */
    CROSS APPLY (VALUES (t.startdate, 1), (DATEADD(DAY, 1, t.enddate), -1)) v (changeDate, changeValue)
    )
    , GroupNums AS -- Assign Group Numbers
    (
    SELECT t.ID
    , t.startdate
    , t.enddate
    , t.changeDate
    , t.changeValue
    , SUM(t.IsNewGroup) OVER(PARTITION BY t.ID ORDER BY t.changeDate, t.changeValue DESC ROWS UNBOUNDED PRECEDING) AS GroupNum
    FROM TestGrouped AS t
    )
    SELECT g.ID
    , MIN(g.changeDate) AS StartDate
    , DATEADD(DAY, -1, MAX(g.changeDate)) AS EndDate -- Change back to closed intervals.
    FROM GroupNums AS g
    GROUP BY g.ID, g.GroupNum
    ORDER BY g.ID, g.GroupNum;

    This produces the same results as Johan's, but with far fewer reads on the 'Worktable'.

    -- Drew's query
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
    Table '#test__00000003BAED'. Scan count 1, logical reads 1, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

    -- Johan's query
    Table 'Worktable'. Scan count 10, logical reads 43, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
    Table '#test___00000003BAED'. Scan count 1, logical reads 1, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
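
    Coming back to the option of keeping adjacent intervals separate: the sketch below is the same query with changeValue sorted ASC in both window ORDER BY clauses, so an end event is processed before a start event on the same date. The table variable @test is a hypothetical stand-in for #test, and the CTEs are trimmed to the columns they need; treat it as an illustration rather than a tuned implementation.

    ```sql
    -- Hypothetical stand-in for #test: two adjacent intervals for ID 1.
    DECLARE @test TABLE (ID int, startdate date, enddate date);
    INSERT INTO @test (ID, startdate, enddate)
    VALUES (1, '20240101', '20240131')
         , (1, '20240201', '20240430');

    WITH TestGrouped AS  -- Look for new groups
    (
        SELECT t.ID
             , v.changeDate
             , v.changeValue
             , CASE WHEN SUM(v.changeValue) OVER (PARTITION BY t.ID
                                                  ORDER BY v.changeDate, v.changeValue ASC
                                                  ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) > 0
                    THEN 0  -- Something is still open: not a new group
                    ELSE 1  -- Sum is 0 or NULL: a new group starts here
               END AS IsNewGroup
        FROM @test AS t
        CROSS APPLY (VALUES (t.startdate, 1), (DATEADD(DAY, 1, t.enddate), -1)) AS v (changeDate, changeValue)
    )
    , GroupNums AS  -- Assign group numbers using the same ASC ordering
    (
        SELECT t.ID
             , t.changeDate
             , SUM(t.IsNewGroup) OVER (PARTITION BY t.ID
                                       ORDER BY t.changeDate, t.changeValue ASC
                                       ROWS UNBOUNDED PRECEDING) AS GroupNum
        FROM TestGrouped AS t
    )
    SELECT g.ID
         , MIN(g.changeDate) AS StartDate
         , DATEADD(DAY, -1, MAX(g.changeDate)) AS EndDate  -- Back to closed intervals
    FROM GroupNums AS g
    GROUP BY g.ID, g.GroupNum
    ORDER BY g.ID, g.GroupNum;
    ```

    With ASC, the two adjacent rows for ID 1 come back as two separate rows (01/01 to 01/31 and 02/01 to 04/30); with DESC they merge into one. Both ORDER BY clauses have to use the same direction, otherwise the group numbers no longer line up with the IsNewGroup flags.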


    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Another thing I should point out is that Johan's code assumes that your data is "clean", that is, that there are no overlaps.  Mine will handle overlaps.  An easy way to test this is to change the end date on the first record from 1/31 to 3/31.  With this change, Johan's code will not merge the two records for ID 1, but mine will.
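
    The overlap test can be reproduced with the sketch below. It also shows one possible way to make a LAG-style query tolerate overlaps: compare each start date with the running maximum of all earlier end dates rather than with the previous row's end date only. This is a swapped-in variation and an assumption on my part, not Johan's code; the table variable @test and the CTE names are hypothetical.

    ```sql
    -- Hypothetical stand-in for #test with the first end date moved from 1/31 to 3/31,
    -- so the first row now overlaps the second.
    DECLARE @test TABLE (ID int, startdate date, enddate date);
    INSERT INTO @test (ID, startdate, enddate)
    VALUES (1, '20240101', '20240331')
         , (1, '20240201', '20240430');

    WITH cteMaxPrevEnd AS (
        SELECT ID, startdate, enddate
             -- Latest end date among all earlier rows for this ID
             , MAX(enddate) OVER (PARTITION BY ID
                                  ORDER BY startdate, enddate
                                  ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS max_prev_enddate
        FROM @test
    ),
    cteFlag AS (
        SELECT ID, startdate, enddate
             , CASE WHEN max_prev_enddate IS NULL THEN 1                       -- First row for the ID
                    WHEN startdate > DATEADD(DAY, 1, max_prev_enddate) THEN 1  -- Gap: new range
                    ELSE 0                                                     -- Adjacent or overlapping
               END AS IsNewRange
        FROM cteMaxPrevEnd
    ),
    cteRanges AS (
        SELECT ID, startdate, enddate
             , SUM(IsNewRange) OVER (PARTITION BY ID
                                     ORDER BY startdate, enddate
                                     ROWS UNBOUNDED PRECEDING) AS RangeNum
        FROM cteFlag
    )
    SELECT ID
         , MIN(startdate) AS startdate
         , MAX(enddate) AS enddate
    FROM cteRanges
    GROUP BY ID, RangeNum
    ORDER BY ID, startdate;
    ```

    For the two rows above this returns a single row for ID 1 running from 2024-01-01 to 2024-04-30. Performance would still need to be checked against your own data.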

    Drew

