Merging date based on start and end date

  • I have tried to group/merge dates (Min/Max) but due to same data at line 1,2 and 4 , its not working as expected. Thank for looking into it.

    create table #Temp
    (
    ID INT,
    Status_ID INT,
    StartDate DATE,
    EndDate DATE
    )

    INSERT INTO #TEMP
    SELECT 101,1,'01/01/2020','01/28/2020'
    UNION ALL
    SELECT 101,1,'01/29/2020','01/31/2020'
    UNION ALL
    SELECT 101,5,'02/01/2020','02/29/2020'
    UNION ALL
    SELECT 101,1,'03/01/2020','03/28/2020'


    --OUTPUT
    101,1,'01/01/2020','01/31/2020'
    101,5,'02/01/2020','02/29/2020'
    101,1,'03/01/2020','03/28/2020'

     

     

  • This looks like a gaps and islands type problem to me.  I have a question about your data though - how many days after "EndDate" must a "StartDate" be to be considered a gap?  With the data you provided it looks like 1 day gap means it is not a gap (January 28th to January 29th), but a 1 month gap is a gap (January 31st to March 1st).

    Basically 3 scenarios come up in my mind:

    1 - EndDate is less than the following StartDate

    2 - EndDate is equal to the following StartDate

    3 - EndDate is larger than the following StartDate

    With case 1, how many days difference is there allowed to be before it becomes a "gap"?

    With case 2, should this be a gap or no?

    With case 3, should this be a gap or no?

    My assumptions are that in case 1, 1 day difference is the maximum acceptable value.  Case 2 is not a gap and case 3 is not a gap.

    Before digging too deep into this, are these assumptions correct?

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • It's very good questions, all assumptions are looks correct. one thing is there could be possible start and end date is same but next value one day gap is there, below is the example.. hops this helps

    1/1/2020   1/1/2020

    1/2/2020  1/31/2020

  • How about something like this:

    CREATE TABLE [#Temp]
    (
    [ID] INT
    , [Status_ID] INT
    , [StartDate] DATE
    , [EndDate] DATE
    );

    INSERT INTO [#Temp]
    SELECT
    101
    , 1
    , '01/02/2020'
    , '01/28/2020'
    UNION ALL
    SELECT
    101
    , 1
    , '01/29/2020'
    , '01/31/2020'
    UNION ALL
    SELECT
    101
    , 5
    , '02/01/2020'
    , '02/29/2020'
    UNION ALL
    SELECT
    101
    , 1
    , '03/01/2020'
    , '03/28/2020'
    UNION ALL
    SELECT
    101
    , 1
    , '01/02/2020'
    , '01/25/2020'
    UNION ALL
    SELECT
    101
    , 1
    , '01/01/2020'
    , '01/02/2020';

    WITH [InitialDataWithRowNumber]
    AS
    (
    SELECT
    [ID]
    , [Status_ID]
    , [StartDate]
    , [EndDate]
    , ROW_NUMBER() OVER (PARTITION BY
    [ID]
    , [Status_ID]
    ORDER BY [StartDate]
    , [EndDate]
    ) AS [RN]
    FROM[#Temp]
    )
    , [DataWithPreviousRow]
    AS
    (
    SELECT
    [InitialDataWithRowNumber].[ID]
    , [InitialDataWithRowNumber].[Status_ID]
    , [InitialDataWithRowNumber].[StartDate]
    , [InitialDataWithRowNumber].[EndDate]
    , [InitialDataWithRowNumber].[RN]
    , [cteloop].[EndDate]AS [ed2]
    FROM[InitialDataWithRowNumber]
    LEFT JOIN[InitialDataWithRowNumber] AS [cteloop]
    ON [cteloop].[RN] + 1 = [InitialDataWithRowNumber].[RN]
    AND [cteloop].[ID] = [InitialDataWithRowNumber].[ID]
    AND [cteloop].[Status_ID] = [InitialDataWithRowNumber].[Status_ID]
    )
    , [IslandMarkers]
    AS
    (
    SELECT
    [DataWithPreviousRow].[ID]
    , [DataWithPreviousRow].[Status_ID]
    , [DataWithPreviousRow].[StartDate]
    , [DataWithPreviousRow].[EndDate]
    , SUM( CASE
    WHEN [DataWithPreviousRow].[ed2] >= DATEADD(DAY
    , -1
    , [DataWithPreviousRow].[StartDate]
    )
    THEN 0
    ELSE 1
    END
    ) OVER (PARTITION BY
    [DataWithPreviousRow].[Status_ID]
    ORDER BY [DataWithPreviousRow].[RN]
    ) AS [islandID]
    FROM[DataWithPreviousRow]
    )
    SELECT
    [IslandMarkers].[ID]
    , [IslandMarkers].[Status_ID]
    , MIN([IslandMarkers].[StartDate]) AS [startdate]
    , MAX([IslandMarkers].[EndDate]) AS [enddate]
    FROM[IslandMarkers]
    GROUP BY[IslandMarkers].[islandID]
    , [IslandMarkers].[ID]
    , [IslandMarkers].[Status_ID];

    DROP TABLE [#Temp];

    A bit of a long query, but I think it captures what you are trying to do and I think I captured all 3 cases listed above as well as the example you gave where start date and end date match.

    quick breakdown of the code.  The first bit (before the WITH) is just creating the temporary table and populating it with some sample data.  Then the fun begins.  The first CTE (InitialDataWithRowNumber) is the initial data but I added a row number onto it so I can ensure the order of the data will be consistent across runs, plus then I have a way to define what is the 'previous" row.  The next CTE (DataWithPreviousRow) is grabbing the data from the first CTE, joining it to itself with the comparison making sure we get the previous row.  From the previous row, all we really care about is the EndDate.  The other values are not very useful. At this point, each row has the current row data from the original table and the value of the previous row as defined by the row number calculated in the first CTE.

    Nothing too crazy so far.  Next one gets a bit more fun as we want to group off the islands.  Here we call a group an island if the previous rows end date is greater than or equal to the current row's Start Date minus 1 day.  So if the current row StartDate is 1/2/2020 and the previous row EndDate is 1/1/2020, we want those to match.  If the comparison is true, we add 0 to the island value.  If the comparison is false, we add 1 to the island value.  This way we can ensure that the island vales are going to match up.

    Now that we have all of the islands, we just grab the min and max values per island.  We also need to group by ID and Status_ID as we are doing aggregate calculations and we (likely) want to group by that anyway.

    Does that query make sense?

    This site has some good information on the gaps and islands problem with respect to dates:

    https://bertwagner.com/2019/03/12/gaps-and-islands/

    It uses LAG, which is what I essentially did in that second CTE, just a bit differently.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply