Is there a way to combine/merge multiple continuous rows by date into a single row without using a loop/cursor?

  • I work with a lot of data that is based on membership enrollment segments. Each segment has an effective and term date and in many cases one starts right after the next. One of the things I have to do is put together a table of continuous enrollments by member. I already have code that works fine using a loop, but I was wondering if there was another way to handle it that I'm overlooking. I just have not been able to come up with a way to do it with a single query. I was thinking that it might be possible to use a common table expression with recursion, but have been unsuccessful in my attempts. Below is a sample of the data for one member. If you look closely, you will see there is a gap in enrollment between the 5th and 6th segment so obviously you can't use a simple group by with a min and max to get the job done. If you have a solution for this type of query I would really like to see it. My goal is to create a view that returns what I want.

    [font="Courier New"]

    Member_ID Eff_Date Term_Date Sequence_ID

    --------- ---------- ---------- --------------------

    123456789 2005-01-01 2006-01-31 1

    123456789 2006-02-01 2007-02-14 2

    123456789 2007-02-15 2007-05-11 3

    123456789 2007-05-12 2008-02-28 4

    123456789 2008-02-29 2008-12-31 5

    123456789 2009-11-01 2009-12-31 6

    123456789 2010-01-01 2010-08-11 7

    123456789 2010-08-12 2017-12-31 8

    [/font]

    The end result I'm looking for would be two records for this member like this.

    [font="Courier New"]

    Member_ID Eff_Date Term_Date Sequence_ID

    --------- ---------- ---------- --------------------

    123456789 2005-01-01 2008-12-31 1

    123456789 2009-11-01 2017-12-31 2

    [/font]

  • Search this site for "gaps and islands" - that should get you started.

    John

  • Thanks, I'll take a look at that. I seem to have found a solution using a series of CTEs that build upon each other. I did it by identifying the gaps and then figuring out which gap date (if any) is closest to each segment. Basically, once you link each segment with a gap then you can group by that and take the min effective date.

  • Try this

    SELECT s1.Member_ID,

    s1.Eff_Date,

    MIN(t1.Term_Date) AS Term_Date,

    ROW_NUMBER() OVER(ORDER BY s1.Eff_Date) AS Sequence_ID

    FROM MyTable s1

    INNER JOIN MyTable t1 ON t1.Member_ID=s1.Member_ID

    AND s1.Eff_Date <= t1.Term_Date

    AND NOT EXISTS(SELECT * FROM MyTable t2

    WHERE t2.Member_ID=t1.Member_ID

    AND (t1.Term_Date+1) >= t2.Eff_Date AND t1.Term_Date < t2.Term_Date)

    WHERE NOT EXISTS(SELECT * FROM MyTable s2

    WHERE s2.Member_ID=s1.Member_ID

    AND s1.Eff_Date > s2.Eff_Date AND (s1.Eff_Date-1) <= s2.Term_Date)

    GROUP BY s1.Member_ID,s1.Eff_Date

    ORDER BY s1.Member_ID,s1.Eff_Date;

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • awesome solution!

  • I never did post my solution so here it is. Again, this was an attempt to do this with a single select statement. It uses CTEs much like you would with a series of steps and temp tables. You can achieve probably achieve better performance in a procedure or function, but this does work. Another thing I did not mention in my original post is that the same member can have multiple records that overlap with one another. The solution below takes care of all these issues. I did not test the other code that was posted, however I don't see how it could handle all of those scenarios. Thanks to everyone who chimed in.

    ;WITH cteAnchor AS(

    --This provides our anchor per member. We know that any changes

    --in enrollment must have occurred between those dates. The anchor

    --is important and can be used to easily change the logic in case

    --you are only interested in certain date ranges.

    SELECT

    e.Member_ID

    , Eff_Date = min(e.Eff_Date)

    , Term_Date = max(e.Term_Date)

    FROM dbo.tbl_Member_Enrollment e

    GROUP BY e.Member_ID

    )

    , cteChanges AS(

    --This step establishes all the segment change dates per member

    --that occurred during the anchor period. The term date being

    --returned will alway be from the anchor record. The actual term

    --date for each segment will be determined in the next step.

    --A change point is defined as any new effective date during the

    --anchor period. We use term dates to establish new effective

    --dates as well by adding a day to it. This gives us the ability

    --to establish all of the segments we need.

    SELECT

    dt.Member_ID

    , dt.Eff_Date

    , dt.Term_Date

    , Sequence_ID = ROW_NUMBER() OVER(PARTITION BY dt.Member_ID ORDER BY dt.Eff_Date)

    FROM

    (--derived table

    SELECT

    a.Member_ID

    , e.Eff_Date

    , a.Term_Date

    FROM

    cteAnchor a

    INNER JOIN dbo.tbl_Member_Enrollment e

    ON e.Member_ID = a.Member_ID

    AND e.Eff_Date BETWEEN a.Eff_Date AND a.Term_Date

    --==============

    UNION

    --==============

    SELECT

    a.Member_ID

    , Eff_Date = dateadd(day, 1, e.Term_Date)

    , a.Term_Date

    FROM

    cteAnchor a

    INNER JOIN dbo.tbl_Member_Enrollment e

    ON e.Member_ID = a.Member_ID

    AND e.Term_Date BETWEEN a.Eff_Date AND a.Term_Date

    AND e.Term_Date < a.Term_Date

    ) dt

    )

    , cteSegments AS(

    --This step establishes all of the continuous non-overlapping segments

    --of the anchor period. It uses the sequence ID to look at the next record

    --to establish the term date by subtracting one day from the effective

    --date of the next segment. This ensures there are no overlapping records.

    SELECT

    c1.Member_ID

    , c1.Eff_Date

    , Term_Date = (

    CASE

    WHEN c2.Member_ID IS NULL THEN

    c1.Term_Date

    ELSE

    dateadd(day, -1, c2.Eff_Date)

    END

    )

    , c1.Sequence_ID

    , Enrollment_Count = (

    SELECT count(*)

    FROM dbo.tbl_Member_Enrollment e

    WHERE

    e.Member_ID = c1.Member_ID

    AND c1.Eff_Date BETWEEN e.Eff_Date AND e.Term_Date

    )

    FROM

    cteChanges c1

    LEFT OUTER JOIN cteChanges c2

    ON c2.Member_ID = c1.Member_ID

    AND c2.Sequence_ID = c1.Sequence_ID + 1 --join with the next record

    )

    , cteContinuousChanges AS(

    --This step establishes all the segment gap dates per member

    --that occurred during the anchor period. The term date being

    --returned will always be from the anchor record. The actual term

    --date for each segment will be determined in the next step.

    SELECT

    dt.Member_ID

    , dt.Eff_Date

    , dt.Term_Date

    , Sequence_ID = ROW_NUMBER() OVER(PARTITION BY dt.Member_ID ORDER BY dt.Eff_Date)

    FROM

    (--derived table

    SELECT

    a.Member_ID

    , a.Eff_Date

    , a.Term_Date

    FROM cteAnchor a

    --==============

    UNION

    --==============

    SELECT

    a.Member_ID

    , s.Eff_Date

    , a.Term_Date

    FROM

    cteAnchor a

    INNER JOIN cteSegments s

    ON s.Member_ID = a.Member_ID

    AND s.Enrollment_Count = 0

    --==============

    UNION

    --==============

    SELECT

    a.Member_ID

    , Eff_Date = dateadd(day, 1, s.Term_Date)

    , a.Term_Date

    FROM

    cteAnchor a

    INNER JOIN cteSegments s

    ON s.Member_ID = a.Member_ID

    AND s.Enrollment_Count = 0

    AND s.Term_Date BETWEEN a.Eff_Date AND a.Term_Date

    AND s.Term_Date < a.Term_Date

    ) dt

    )

    , cteContinuousSegments AS(

    --This step establishes all of the continuous non-overlapping segments

    --of the anchor period using the gaps above. It uses the sequence ID

    --to look at the next record to establish the term date by subtracting

    --one day from the effective date of the next segment. This ensures there

    --are no overlapping records.

    SELECT

    c1.Member_ID

    , c1.Eff_Date

    , Term_Date = (

    CASE

    WHEN c2.Member_ID IS NULL THEN

    c1.Term_Date

    ELSE

    dateadd(day, -1, c2.Eff_Date)

    END

    )

    , c1.Sequence_ID

    , s.Enrollment_Count

    FROM

    cteContinuousChanges c1

    LEFT OUTER JOIN cteContinuousChanges c2

    ON c2.Member_ID = c1.Member_ID

    AND c2.Sequence_ID = c1.Sequence_ID + 1 --join with the next record

    INNER JOIN cteSegments s

    ON s.Member_ID = c1.Member_ID

    AND c1.Eff_Date BETWEEN s.Eff_Date AND s.Term_Date

    )

    SELECT *

    FROM cteContinuousSegments cs

  • In my problem I had a Queue of work to be performed by an SSIS package, we wanted an efficient way to group these request to lessen the burden on the ssis package (basically we didn't want it to work on duplicate requests)

    CampaignIDStartDateEndDate

    16177/22/2011 00:00:007/24/2011 00:00:00

    16177/22/2011 00:00:007/24/2011 00:00:00

    16177/22/2011 00:00:007/24/2011 00:00:00

    16177/22/2011 00:00:007/24/2011 00:00:00

    16177/22/2011 00:00:007/24/2011 00:00:00

    16177/22/2011 00:00:007/24/2011 00:00:00

    16177/25/2011 00:00:007/25/2011 00:00:00

    16177/22/2011 00:00:007/24/2011 00:00:00

    16177/25/2011 00:00:007/25/2011 00:00:00

    16177/25/2011 00:00:007/25/2011 00:00:00

    16177/23/2011 00:00:007/23/2011 00:00:00

    16177/25/2011 00:00:007/25/2011 00:00:00

    16177/22/2011 00:00:007/24/2011 00:00:00

    16177/25/2011 00:00:007/25/2011 00:00:00

    16177/24/2011 00:00:007/24/2011 00:00:00

    16177/25/2011 00:00:007/25/2011 00:00:00

    Using the solution this is what I see:

    16172011-07-22 00:00:00.0002011-07-25 00:00:00.000

    which is perfect. Given a Group By gets me almost there. @dawalker thanks for the explanation around your solution.

  • Glad you could use it. Unfortunately, I had to come up with it on my own because nothing else I found worked the way I needed it to. It was definitely one of the more challenging queries I've had to write. I'm constantly having to deal with date segments in my job so this logic is going to have a lot of uses for us especially in our new data warehouse. I verified it against the craziest set of records I could find in our system and it worked perfectly. If I ever improve upon it I will repost any updates. It is actually a very flexible query because you can easily return only the gaps or the covered periods just by changing the where clause on the final select. Considering what it does, it actually performs pretty well in our system. My stored procedure that uses a loop to do it is still faster though. I think that's mainly because I'm using several temp tables that I index and I only have to loop enough times to cover that max sequence ID.

  • Mark Cowne - Friday, March 25, 2011 2:03 PM

    Try thisSELECT s1.Member_ID, s1.Eff_Date, MIN(t1.Term_Date) AS Term_Date, ROW_NUMBER() OVER(ORDER BY s1.Eff_Date) AS Sequence_IDFROM MyTable s1 INNER JOIN MyTable t1 ON t1.Member_ID=s1.Member_ID AND s1.Eff_Date <= t1.Term_Date AND NOT EXISTS(SELECT * FROM MyTable t2 WHERE t2.Member_ID=t1.Member_ID AND (t1.Term_Date+1) >= t2.Eff_Date AND t1.Term_Date < t2.Term_Date) WHERE NOT EXISTS(SELECT * FROM MyTable s2 WHERE s2.Member_ID=s1.Member_ID AND s1.Eff_Date > s2.Eff_Date AND (s1.Eff_Date-1) <= s2.Term_Date) GROUP BY s1.Member_ID,s1.Eff_DateORDER BY s1.Member_ID,s1.Eff_Date;

    Thanks @MarkCowne your solution worked like a charm after 8 years for me 🙂
    Would be great if you or someone here could explain this solution as I am not able to get my head around this approach.
    Thanks in advance

  • Hi,

    I have recently come across this problem, but none of the solutions posted here convinced me as I have millions of records to process and I believe the solutions present here couldn't scale (may be I was wrong in my assumption) and also I am looking for a simple code.

    I tried different options, but nothing did match my expectation. But, after working on this problem for couple of days, finally found a solution when tried to solve this as a logical puzzle and more than 13 million rows have been processed under less than 40 seconds.

    Thought of posting this solutions as it would help someone who is facing this issue.

    Share your thoughts.

    IF OBJECT_ID('TEMPDB..#TEST') IS NOT NULL
    DROP TABLE #TEST
    CREATE TABLE #TEST(ID INT, FromDate DATETIME, ToDate DATETIME)
    GO

    INSERT #TEST
    VALUES
    (1, '01-01-2018', '01-10-2018'),
    (1, '01-11-2018', '01-25-2018'),
    (1, '01-26-2018', '01-28-2018'),
    (1, '01-29-2018', '01-30-2018'),
    (1, '02-01-2018', '02-10-2018'),
    (1, '02-11-2018', '02-20-2018'),
    (1, '02-21-2018', '02-24-2018'),
    (1, '03-01-2018', '03-10-2018'),
    (1, '03-11-2018', '03-20-2018'),
    (2, '01-01-2019', '01-28-2019'),
    (2, '01-29-2019', '01-31-2019')

    GO


    SELECT ID, FromDate, ToDate
    FROM
    (
    SELECT ID,
    CASE
    WHEN FromDate1 IS NOT NULL THEN FromDate1
    ELSE LAG(FromDate1, 1) OVER(PARTITION BY ID ORDER BY RowNum) END FromDate,
    ToDate1 AS ToDate
    FROM
    (
    SELECT ID, FromDate1, ToDate1, RowNum
    FROM
    (
    SELECT ID, FromDate, ToDate, ROW_NUMBER() OVER(PARTITION BY ID ORDER BY FromDate,ToDate) RowNum,
    CASE
    WHEN LAG(ToDate,1) OVER(PARTITION BY ID ORDER BY FromDate,ToDate) IS NOT NULL
    AND DATEDIFF(D, LAG(ToDate, 1) OVER(PARTITION BY ID ORDER BY FromDate,ToDate),FromDate) <= 1 THEN NULL
    ELSE FromDate
    END FromDate1,
    CASE
    WHEN LEAD(ToDate,1) OVER(PARTITION BY ID ORDER BY FromDate,ToDate) IS NOT NULL
    AND DATEDIFF(D,ToDate, LEAD(FromDate, 1) OVER(PARTITION BY ID ORDER BY FromDate,ToDate)) <= 1 THEN NULL
    ELSE ToDate
    END ToDate1
    FROM #test
    ) a WHERE FromDate1 IS NOT NULL OR ToDate1 IS NOT NULL
    ) b
    ) c WHERE ToDate IS NOT NULL

    GO

     

     

     

     

     

  • This was removed by the editor as SPAM

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply