Discarding records whith date gaps

  • I have read the post regarding identifying gaps in dates but it hasn't helped me come up with a solution.

    I have a table like the following:

    --===== All Inserts into the IDENTITY column

    SET IDENTITY_INSERT #mytable ON

    --===== Insert the test data into the test table

    INSERT INTO #mytable (ID,clientID, enrolldate, exitDate, branchID)

    SELECT 1,'1','2008-11-05 00:00:00','2009-09-04 00:00:00','12'

    UNION ALL SELECT 2,'1','2006-03-13 00:00:00','2006-03-13 00:00:00','12'

    UNION ALL SELECT 3,'1','2006-06-05 00:00:00','2006-07-18 00:00:00','12'

    UNION ALL SELECT 4,'2','2009-04-01 00:00:00','2009-09-30 00:00:00','12'

    UNION ALL SELECT 5,'2','2009-10-02 00:00:00','2009-10-30 00:00:00','12'

    UNION ALL SELECT 6,'2','2007-02-20 00:00:00','2007-05-18 00:00:00','12'

    UNION ALL SELECT 7,'2','2009-11-09 00:00:00','2010-01-15 00:00:00','12'

    --===== Set the identity insert back to normal

    SET IDENTITY_INSERT #mytable OFF

    I need to calculate the clients length of stay by determining the earliest valid enroll date and the last exit date. A valid enroll date is defined like this: If a record's exit date is within 8 weeks of the next enroll date then the preceding enroll date is used. So for my example for client 1 the enroll date of 1 11/05/2008 is not within 8 weeks of the exit date of record 3 7/18/2006 so the length of stay would be calculated on record 1's enroll date and record 1's exit date. For client 2, his earlist valid enroll date is 4/1/2009 and the exit date is 1/15/2010. The record id 6 for client 2 is invalid because it's exit date is greater then 8 weeks prior to the next enroll date.

    I can calculate the earliest enroll date and the latest exit date without a problem, I just cannot figure out how to apply the eight week rule.

  • Does it cascade? If, for example, you had an enroll date that was within 8 weeks of a prior enroll date, which was within 8 weeks of an even earlier enroll date, what would you do in that case?

    If it doesn't, you can do an outer join from each row to the same table, the same clientID, and an enroll date that's within the 8-week limit, and use Coalesce or IsNull to pull the the right one.

    If it does cascade, you'll need to build a recursive CTE to keep doing that till it doesn't find a valid record. It's going to be an inefficient query, but it'll work.

    If you need help with either one of those, ask. But clarify the business-rule first.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Thanks for the quick reply. It does indeed cascade. And I would like an example to build upon if possible.

  • See if this will do what you need:

    SELECT

    1 AS ID,

    1 AS clientID,

    CAST('2008-11-05 00:00:00' AS DATETIME) AS enrolldate,

    CAST('2009-09-04 00:00:00' AS DATETIME) AS exitDate,

    12 AS branchID

    INTO

    #mytable

    UNION ALL

    SELECT

    2,

    '1',

    '2006-03-13 00:00:00',

    '2006-03-13 00:00:00',

    '12'

    UNION ALL

    SELECT

    3,

    '1',

    '2006-06-05 00:00:00',

    '2006-07-18 00:00:00',

    '12'

    UNION ALL

    SELECT

    4,

    '2',

    '2009-04-01 00:00:00',

    '2009-09-30 00:00:00',

    '12'

    UNION ALL

    SELECT

    5,

    '2',

    '2009-10-02 00:00:00',

    '2009-10-30 00:00:00',

    '12'

    UNION ALL

    SELECT

    6,

    '2',

    '2007-02-20 00:00:00',

    '2007-05-18 00:00:00',

    '12'

    UNION ALL

    SELECT

    7,

    '2',

    '2009-11-09 00:00:00',

    '2010-01-15 00:00:00',

    '12' ;

    SELECT

    clientID,

    enrollDate,

    branchID,

    ISNULL(LastExit, exitDate) AS exitDate

    FROM

    #mytable

    OUTER APPLY (SELECT

    MT1.ID,

    MAX(MT2.exitDate) AS LastExit

    FROM

    #mytable AS MT1

    LEFT OUTER JOIN #mytable AS MT2

    ON MT1.clientID = MT2.clientID

    AND MT2.enrolldate BETWEEN MT1.exitDate

    AND DATEADD(week, 8, MT1.exitDate)

    AND MT1.ID != MT2.ID

    WHERE

    MT1.ID = #mytable.ID

    GROUP BY

    MT1.ID) AS Q1 ;

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Thank you very much for the work put into this. It is close to what I need, and I will see if I can work out how to get it closer. Of course I am still willing to accept suggestions <grin>

    Looking at the results I see that record 1 and record 4 are providing the correct answers. What I need to do now is to eliminate the rest of the records from the result set or to somehow store the earliest viable enroll date and the last viable exit date

    12008-11-05 00:00:00.000122009-09-04 00:00:00.000

    12006-03-13 00:00:00.000122006-03-13 00:00:00.000

    12006-06-05 00:00:00.000122006-07-18 00:00:00.000

    22009-04-01 00:00:00.000122010-01-15 00:00:00.000

    22009-10-02 00:00:00.000122010-01-15 00:00:00.000

    22007-02-20 00:00:00.000122007-05-18 00:00:00.000

    22009-11-09 00:00:00.000122010-01-15 00:00:00.000

    So for this example set of data I should return clientid 1 with an enroll date of 2008-11-05, with an exit date of 2009-09-04 as shown in row 1. For clientid 2 I should return an enroll date of 2009-04-01 and en exit date of 2010-01-15 as shown in row 4. The other enroll and exit dates returned are extraneous

  • Change the Outers to Inners on the join and apply, see if that does what you need.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Using cross apply and changing the outer join to inner got me very close. The problem is that the results are excluding clientid 1. I believe this is failing because there are no records for clientid 1 that fit this part of the join

    AND MT2.enrolldate BETWEEN MT1.exitDate

    AND DATEADD(week, 8, MT1.exitDate)

    To add more definition to my problem, here is what I am trying to accomplish overall. I have a table of enrollments. The business rule is if a client leaves the program but rejoins the program within 8 weeks it's as if they never left, and the length of stay in the program continues from the previous enroll date. This can cascade to many enrollments over the course of several years. If however, the client leaves and returns after 8 weeks its as if the client started over and the length of stay starts again at the new enroll date. I am building a report that lists every client who exited the program within a specific time period (typically a calendar month but this is not guarenteed) and I need to identify how many months the client was enrolled in the program.

    So I could have many enrollment records and I need to pick the clients who exited within the report period and determine their lenght of stay utilizing the 8 week rule.

  • Can you change it back to Outer and add a Where clause to the outer query that gets you what you need? Something like "where LastExit between X and Y or exitDate between X and Y", maybe?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I just wonder, how set based solution would scale for thousands of rows. I have impression that for data islands problem loop based solution might scale better.

    Piotr

    ...and your only reply is slàinte mhath

  • Thanks for all the input thankfully it is unlikely that I'll need to iterate thousands of records since I will be running this report once a month and only be looking at clients that left during the month.

    Here is what I think I need to do.

    Order record set by clientid then by exitdate descending

    read enroll and exitdate

    store enrolldate in variable firstViableEntryDate

    store exitdate in variable workingExitDate

    Read next record

    Determine if new enrolldate is within 8 weeks of workingExitDate - if yes then store new enroldlate in firstViableEntryDate and jump back to read next record - if no exit the loop for this clientid

    Does this logic sound right? If so how would I do this in SQL?

  • That would be a cursor. It will work.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Could some one provide me with an example?

  • The logic that you're trying to use is very confusing to me, so I'll just show you how to get the next/prior record for a customer. Hopefully you can extract what you need from that. Post back if this doesn't help - but if you do, please try to be as clear as possible with your requirements / expected results.

    ;WITH CTE AS

    (

    SELECT ID, clientID, enrolldate, exitDate, branchID,

    RN = ROW_NUMBER() OVER (PARTITION BY clientID ORDER BY exitDate DESC)

    FROM #mytable

    )

    select CTEBase.*,

    NextExitDate = CTENext.exitDate,

    PriorExitDate = CTEPrior.exitDate,

    DaysDiff = DATEDIFF(day, CTEPrior.exitDate, CTENext.exitDate)

    FROM CTE CTEBase

    LEFT JOIN CTE CTENext

    ON CTEBase.clientID = CTENext.clientID

    AND CTEBase.RN = CTENext.RN+1

    LEFT JOIN CTE CTEPrior

    ON CTEBase.clientID = CTEPrior.clientID

    AND CTEBase.RN = CTEPrior.RN-1;

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Thank you sir for a working example. This gets me closer.

    The requirement is I need to take a look at clients who have exited a particular social service program. If a client goes through the exit process then the social service program gets to bill for the services rendered to this client. If however the client enrolls again within 8 weeks of their previous exit then it is as if the client has never left.

    The calculation that I am trying to get is a term called length of stay. So If I run a report of all clients who exited in January of 2010, I need to determine what their enroll date is for the length of stay calculation. So if we look at my sample data, client 2's last exit date was January 15th, 2010. His enroll date for that record (record id 7) was November 9, 2009.

    My calculation would now look like length of stay = difference between 1-15-2010 and 11-9-2009 in days.

    I now need to determine if client 2 was enrolled in the eight weeks prior to November 9, 2009. Record id 5 for this same client has an exit date of October 30, 2009 with an enroll date of October 2, 2009. The exit date of October 30, 2009 is within eight weeks of the enroll date of 11-9-2009, so now my calculation looks like

    length of stay = difference between 10-2-2009 and 1-15-2010 in days.

    Continuing on, we see that client 2 has another record where the enroll date is 4-1-2009 and the exit date is 9-30-2009. Following the same logic as above we see that the exit date of 9-30-2009 is within eight weeks of the entry date of 10-2-2009 so now my calculation looks like

    length of stay = difference between 4-1-2009 and 1-15-2010 in days.

    The earliest record for this client in this example shows an enroll date of 2-20-2007 and an exit date of 5-18-2007. The exit date of 5-18-2007 is not within 8 weeks prior of 4-1-2009 so we can ignore this record giving us the final calculation of

    length of stay = difference between 4-1-2009 and 1-15-2010 in days.

    I need to reprot on the fisrt enroll date used in the calculation, in this case enroldlate = 4-1-2009, the exit date used in this calculattion, in this case 1-15-2010 and the length of stay.

    Thanks again. I will take your example and see if I can work out more of what I am looking for but I wanted to clarify so that maybe we can get closer.

  • Does this do it?

    WITH CTE AS

    (

    SELECT ID, clientID, enrolldate, exitDate, branchID,

    RN = ROW_NUMBER() OVER (PARTITION BY clientID ORDER BY exitDate DESC)

    FROM #mytable

    ), CTE2 AS

    (

    SELECT CTEBase.*,

    NextExitDate = CTENext.exitDate,

    PriorExitDate = CTEPrior.exitDate,

    WeeksDiff = DATEDIFF(DAY, CTEPrior.exitDate, CTEBase.enrolldate)/7,

    NextEnrollDate = CTENext.enrolldate,

    PriorEnrollDate = CTEPrior.enrolldate

    FROM CTE CTEBase

    LEFT JOIN CTE CTENext

    ON CTEBase.clientID = CTENext.clientID

    AND CTEBase.RN = CTENext.RN+1

    LEFT JOIN CTE CTEPrior

    ON CTEBase.clientID = CTEPrior.clientID

    AND CTEBase.RN = CTEPrior.RN-1

    )

    SELECT CTE.ID,

    CTE.clientID,

    CTE.enrolldate,

    CTE.exitDate,

    CTE.branchID,

    FirstEnrollDate = COALESCE(ds.FirstEnrollDate, CTE.enrollDate),

    LengthOfStay = DATEDIFF(DAY, COALESCE(ds.FirstEnrollDate, CTE.enrollDate), CTE.exitDate)

    FROM CTE

    CROSS APPLY (SELECT FirstEnrollDate = MIN(PriorEnrollDate)

    FROM CTE2

    WHERE CTE2.clientID = CTE.clientID

    AND CTE2.RN > CTE.RN

    AND CTE2.WeeksDiff <= 8) ds

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply