Evaluating Rows to identify >=3 months of consecutive dates and exclude first and last month

  • Hi all,

    I need to evaluate the records to:

    1. Find persons who have >=3 months of consecutive data

    2. Exclude the first and last month of the consecutive dates

    3. The dates span multiple years

    Here's some test data:

    --===== If the test table already exists, drop it

    IF OBJECT_ID('TempDB..#test','U') IS NOT NULL

    DROP TABLE #test

    --Create the test table

    CREATE TABLE #test

    (

    PERSON_ID nvarchar(20),

    DOS_SPAN DATETIME,

    MONTH INT,

    YEAR INT

    )

    --POPULATE WITH TEST DATA

    SELECT '108398123','8','2014','Aug 1 2014 12:00AM' UNION ALL

    SELECT '108398123','9','2014','Sep 1 2014 12:00AM' UNION ALL

    SELECT '106300298','1','2014','Jan 1 2014 12:00AM' UNION ALL

    SELECT '106300298','2','2014','Feb 1 2014 12:00AM' UNION ALL

    SELECT '106300298','3','2014','Mar 1 2014 12:00AM' UNION ALL

    SELECT '106300298','4','2014','Apr 1 2014 12:00AM' UNION ALL

    SELECT '106497664','1','2014','Jan 1 2014 12:00AM' UNION ALL

    SELECT '106497664','1','2015','Jan 1 2015 12:00AM' UNION ALL

    SELECT '106497664','2','2015','Feb 1 2015 12:00AM' UNION ALL

    SELECT '106497664','3','2014','Mar 1 2014 12:00AM' UNION ALL

    SELECT '106497664','3','2015','Mar 1 2015 12:00AM' UNION ALL

    SELECT '106497664','4','2014','Apr 1 2014 12:00AM' UNION ALL

    SELECT '106497664','5','2014','May 1 2014 12:00AM' UNION ALL

    SELECT '106497664','6','2014','Jun 1 2014 12:00AM' UNION ALL

    SELECT '106497664','7','2014','Jul 1 2014 12:00AM' UNION ALL

    SELECT '106497664','8','2014','Aug 1 2014 12:00AM' UNION ALL

    SELECT '106497664','9','2014','Sep 1 2014 12:00AM' UNION ALL

    SELECT '106497664','10','2014','Oct 1 2014 12:00AM' UNION ALL

    SELECT '106497664','11','2014','Nov 1 2014 12:00AM' UNION ALL

    SELECT '106497664','12','2014','Dec 1 2014 12:00AM'

    I've tried the code below and it works when I am only evaluating the month within a given year, however the data spans 2 years.

    WITH CTE AS(

    SELECT *,

    DOS_MONTH - ROW_NUMBER() OVER(PARTITION BY PERSON_ID ORDER BY DOS_MONTH) grouper,

    ROW_NUMBER() OVER(PARTITION BY PERSON_ID ORDER BY DOS_MONTH) rnfirst,

    ROW_NUMBER() OVER(PARTITION BY PERSON_ID ORDER BY DOS_MONTH DESC) rnlast

    FROM #test

    )

    SELECT PERSON_ID,

    DOS_MONTH

    INTO #test_1

    FROM CTE

    WHERE rnfirst > 1

    AND rnlast > 1

    AND PERSON_ID IN(SELECT PERSON_ID

    FROM CTE

    GROUP BY PERSON_ID, grouper

    HAVING COUNT(*) >= 3)

    ORDER BY PERSON_ID, DOS_MONTH

    When I substitute field DOS_MONTH to DOS_SPAN using the above code, I get 0 records.

  • Thanks for taking a shot at posting readily consumable data. Unfortunately, you're missing the INSERT and the test data is actually in a different order than the table. Here's the corrected version.

    --===== If the test table already exists, drop it

    IF OBJECT_ID('TempDB..#Test','U') IS NOT NULL

    DROP TABLE #Test

    ;

    --===== Create the test table

    CREATE TABLE #Test

    (

    Person_ID NVARCHAR(20)

    ,DOS_Span DATETIME

    ,[Month] INT

    ,[Year] INT

    )

    ;

    --===== Populate the test table with data (Different order than the table)

    INSERT INTO #Test

    (Person_ID,[Month],[Year],DOS_Span )

    SELECT '108398123','8','2014' ,'Aug 1 2014 12:00AM' UNION ALL

    SELECT '108398123','9','2014' ,'Sep 1 2014 12:00AM' UNION ALL

    SELECT '106300298','1','2014' ,'Jan 1 2014 12:00AM' UNION ALL

    SELECT '106300298','2','2014' ,'Feb 1 2014 12:00AM' UNION ALL

    SELECT '106300298','3','2014' ,'Mar 1 2014 12:00AM' UNION ALL

    SELECT '106300298','4','2014' ,'Apr 1 2014 12:00AM' UNION ALL

    SELECT '106497664','1','2014' ,'Jan 1 2014 12:00AM' UNION ALL

    SELECT '106497664','1','2015' ,'Jan 1 2015 12:00AM' UNION ALL

    SELECT '106497664','2','2015' ,'Feb 1 2015 12:00AM' UNION ALL

    SELECT '106497664','3','2014' ,'Mar 1 2014 12:00AM' UNION ALL

    SELECT '106497664','3','2015' ,'Mar 1 2015 12:00AM' UNION ALL

    SELECT '106497664','4','2014' ,'Apr 1 2014 12:00AM' UNION ALL

    SELECT '106497664','5','2014' ,'May 1 2014 12:00AM' UNION ALL

    SELECT '106497664','6','2014' ,'Jun 1 2014 12:00AM' UNION ALL

    SELECT '106497664','7','2014' ,'Jul 1 2014 12:00AM' UNION ALL

    SELECT '106497664','8','2014' ,'Aug 1 2014 12:00AM' UNION ALL

    SELECT '106497664','9','2014' ,'Sep 1 2014 12:00AM' UNION ALL

    SELECT '106497664','10','2014','Oct 1 2014 12:00AM' UNION ALL

    SELECT '106497664','11','2014','Nov 1 2014 12:00AM' UNION ALL

    SELECT '106497664','12','2014','Dec 1 2014 12:00AM'

    ;

    Also, I couldn't tell for sure if requirement #2 was to be applied before or after requirement #1. I went with the order that things were listed.

    WITH

    cteMinMaxDates AS

    (

    SELECT Person_ID

    ,DOS_Span

    ,Grp = DATEADD(mm,-ROW_NUMBER() OVER (PARTITION BY Person_ID ORDER BY DOS_Span),DOS_Span)

    FROM #Test

    )

    SELECT Person_ID

    ,MinDate = MIN(DATEADD(mm,+1,DOS_Span))

    ,MaxDate = MAX(DATEADD(mm,-1,DOS_Span))

    FROM cteMinMaxDates

    GROUP BY Person_ID,Grp

    HAVING COUNT(*) >= 3

    ;

    Note that if a given Person_ID has 2 or more separate periods (at least 1 month is missing) that meet the requirements, the Person_ID will be returned more than once along with the related Min/Max dates.

    Shifting gears a bit, the Month and Year columns in the table are, IMHO, useless for just about any temporal manipulation or calculation. They're not even real handy for reporting purposes. I usually recommend that the columns be removed from the table.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff,

    Thank you SO both fixing up my code and providing the solution. I tested and it works wonderfully!

    Thanks again 😀

  • Great solution Jeff. I had to think about it a bit to understand the why, but I get it.

  • Thank you both for the feedback.

    For those that may not understand how the formula for the "Grp" column works, you can find an explanation of the principle in the article at the following URL.

    http://www.sqlservercentral.com/articles/T-SQL/71550/

    The data in the article is based on contiguous days but as you can see in the code above, the principle applies to virtually any regular temporal resolution.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff,

    How do I modify the code you provided to now include the first and last month of the consecutive service?

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply