Count of consecutive years of participation (islands and gaps)

  • I have a data set (snippet below) and I need to count the number of consecutive years based on a date in time for each ID as represented below.

    ID DATE

    ------ --------

    1 2000-05-03

    1 2001-06-10

    1 2002-04-02

    1 2005-07-29

    1 2010-12-15

    4 2001-05-07

    4 1999-08-01

    4 2000-07-05

    4 2001-08-01

    9 2002-05-01

    9 2000-04-02

    My result set needs to be something like:

    ID Count of Consecutive Years

    ------- -----------------------------

    1 2

    4 2

    9 0

    I know this is a gaps and islands type problem but nothing I have been able to find is working once I attempt modification so that it can fit my dataset. Any help or pointers would be appreciated. Please note that I am going to use the data return to populate another table that is currently being populated using a cursor that utilizes an insert statement based on different codes.

  • Well how about you post what you have tried and some test data (see the link in my signature), then we can help you?

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • I have included a snippet of my dataset. Unfortunately, I would need to try and find the specific things I have tried as I blew them away when they did not work. If you would like me to do that I can.

  • OLSONEJ (6/29/2015)


    I have a data set (snippet below) and I need to count the number of consecutive years based on a date in time for each ID as represented below.

    ID DATE

    ------ --------

    1 2000-05-03

    1 2001-06-10

    1 2002-04-02

    1 2005-07-29

    1 2010-12-15

    4 2001-05-07

    4 1999-08-01

    4 2000-07-05

    4 2001-08-01

    9 2002-05-01

    9 2000-04-02

    My result set needs to be something like:

    ID Count of Consecutive Years

    ------- -----------------------------

    1 2

    4 2

    9 0

    I know this is a gaps and islands type problem but nothing I have been able to find is working once I attempt modification so that it can fit my dataset. Any help or pointers would be appreciated. Please note that I am going to use the data return to populate another table that is currently being populated using a cursor that utilizes an insert statement based on different codes.

    Here is the type of thing we would like to see. Notice how we can just copy and paste this to start working on an answer.

    with Something as

    (

    select 1 as ID, cast('2000-05-03' as datetime) as MyDate union all

    select 1, '2001-06-10' union all

    select 1, '2002-04-02' union all

    select 1, '2005-07-29' union all

    select 1, '2010-12-15' union all

    select 4, '2001-05-07' union all

    select 4, '1999-08-01' union all

    select 4, '2000-07-05' union all

    select 4, '2001-08-01' union all

    select 9, '2002-05-01' union all

    select 9, '2000-04-02'

    )

    What I don't understand is how you came up with the results you state you want. What is the business logic for determining if it is a consecutive year? If the subsequent entry is made within a year of the previous one? Assuming that is the case your desired output does not seem to match the data.

    This part of your question scares me:

    Please note that I am going to use the data return to populate another table that is currently being populated using a cursor that utilizes an insert statement based on different codes.

    Cursors are generally very slow and for inserting data it is almost never the best way to do that. Why do you need a cursor? It sounds like perhaps a case expression would be much better. We can help with that part too if you give us some details.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • DDL & consumable test data is the important thing. Something more akin to the following...

    IF OBJECT_ID('tempdb..#temp') IS NOT NULL

    DROP TABLE #temp;

    CREATE TABLE #temp (

    ID INT NOT NULL,

    [Date] DATE NOT NULL

    );

    INSERT #temp (ID, [Date]) VALUES

    (1, '2000-05-03'),

    (1, '2001-06-10'),

    (1, '2002-04-02'),

    (1, '2005-07-29'),

    (1, '2010-12-15'),

    (4, '2001-05-07'),

    (4, '1999-08-01'),

    (4, '2000-07-05'),

    (4, '2001-08-01'),

    (9, '2002-05-01'),

    (9, '2000-04-02');

    SELECT * FROM #temp t

  • Based on your desired output, I think the following is what you're looking for...

    WITH AddRN AS (

    SELECT

    t.ID,

    t.Date,

    ROW_NUMBER() OVER (PARTITION BY t.ID ORDER BY t.Date) AS RN

    FROM

    #temp t

    )

    SELECT

    a.ID,

    SUM(CASE WHEN DATEDIFF(yy, b.Date, a.Date) = 1 THEN 1 ELSE 0 END) AS ConsecYears

    FROM

    AddRN a

    LEFT JOIN AddRN b

    ON a.ID = b.ID

    AND a.RN = b.RN + 1

    GROUP BY

    a.ID

    the results...

    ID ConsecYears

    --------- -----------

    1 2

    4 2

    9 0

    Edit... Note that the above solution works in the earlier versions of SQL Server. SQL Server 2012 introduced the LAG & LEAD functions which would allow you to get the same results in a single query and without having to do a self join.

  • I assume that "from #temp t" is due to you using a temp table to run the cte query and that it would not be a part of a final solution?

  • Not really. It's only in TempDB because for him, it's not a permanent part of any database. you would put it in your database and not in TempDB.

    The reason people here do that is so that they don't bloat their databases with a bunch of tables they will only use once (to answer a question).

  • pietlinden (6/29/2015)


    Not really. It's only in TempDB because for him, it's not a permanent part of any database. you would put it in your database and not in TempDB.

    The reason people here do that is so that they don't bloat their databases with a bunch of tables they will only use once (to answer a question).

    Yep... Exactly that... I put my test data in a temp table called #temp, so that's what I queried. 😉

  • OLSONEJ (6/29/2015)


    I have a data set (snippet below) and I need to count the number of consecutive years based on a date in time for each ID as represented below.

    ID DATE

    ------ --------

    1 2000-05-03

    1 2001-06-10

    1 2002-04-02

    1 2005-07-29

    1 2010-12-15

    4 2001-05-07

    4 1999-08-01

    4 2000-07-05

    4 2001-08-01

    9 2002-05-01

    9 2000-04-02

    My result set needs to be something like:

    ID Count of Consecutive Years

    ------- -----------------------------

    1 2

    4 2

    9 0

    I know this is a gaps and islands type problem but nothing I have been able to find is working once I attempt modification so that it can fit my dataset. Any help or pointers would be appreciated. Please note that I am going to use the data return to populate another table that is currently being populated using a cursor that utilizes an insert statement based on different codes.

    I have to agree with Sean, the desired output does not match the data, specifically ID 4 surely has 3 or maybe even 4) consecutive years, unless you have a third column which determines the order?

    How do you classify consecutive years?

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • Despite the good code that has shown up on this thread and to add to the questions that Sean and Maggo have asked, I'm not sure that the definition of the problem is correct.

    For example, adding the following data will result in ID=2 showing up as having a consecutive year even though it's a span of just 1 day. Are you sure that's correct?

    (2, '2000-12-31'),

    (2, '2001-01-01')

    ;

    I also agree that if you're considering using a cursor to populate another table from this, then you need to come back so we can show how to get around that.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Whoops, that is my fault. 4 should have three upon looking at it again. My apologies.

  • OLSONEJ (6/29/2015)


    I have a data set (snippet below) and I need to count the number of consecutive years based on a date in time for each ID as represented below.

    ID DATE

    ------ --------

    1 2000-05-03

    1 2001-06-10

    1 2002-04-02

    1 2005-07-29

    1 2010-12-15

    4 2001-05-07

    4 1999-08-01

    4 2000-07-05

    4 2001-08-01

    9 2002-05-01

    9 2000-04-02

    My result set needs to be something like:

    ID Count of Consecutive Years

    ------- -----------------------------

    1 2

    4 2

    9 0

    I know this is a gaps and islands type problem but nothing I have been able to find is working once I attempt modification so that it can fit my dataset. Any help or pointers would be appreciated. Please note that I am going to use the data return to populate another table that is currently being populated using a cursor that utilizes an insert statement based on different codes.

    I'm sure that the cursor could probably be replaced, but to answer what I think you asked, try this:

    DECLARE @INPUT_DATA AS TABLE (

    ID INT NOT NULL,

    [Date] DATE NOT NULL

    );

    INSERT INTO @INPUT_DATA (ID, [Date]) VALUES

    (1, '2000-05-03'),

    (1, '2001-06-10'),

    (1, '2002-04-02'),

    (1, '2005-07-29'),

    (1, '2010-12-15'),

    (4, '2001-05-07'),

    (4, '1999-08-01'),

    (4, '2000-07-05'),

    (4, '2001-08-01'),

    (9, '2002-05-01'),

    (9, '2000-04-02');

    WITH TRANSFORMED AS (

    SELECT D.ID, YEAR(D.[Date]) AS THE_YEAR

    FROM @INPUT_DATA AS D

    )

    SELECT T1.ID, ISNULL(NULLIF(COUNT(DISTINCT T2.THE_YEAR), 0) + 1, 0) AS CONSECUTIVE_YEARS

    FROM TRANSFORMED AS T1

    LEFT OUTER JOIN TRANSFORMED AS T2

    ON T1.ID = T2.ID

    AND T1.THE_YEAR = T2.THE_YEAR - 1

    GROUP BY T1.ID

    Let us know what works, and if we can help eliminate the cursor.

    EDIT: Corrected the code when I saw your last post about ID 4 should have 3, ... if that's true, however, so should ID 1, unless I don't understand the rules for consecutive years. THAT really needs to be spelled out in exact detail.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • OLSONEJ (6/30/2015)


    Whoops, that is my fault. 4 should have three upon looking at it again. My apologies.

    Still, what do you want to do for the scenario in my previous post?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • To expand on Jeff's (and others) point... It would be helpful if you were to give some explanation as to what constitutes "consecutive" in this case.

    Letting us know which of the rows should be included, which rows should be excluded and why, would help us create rules that won't fall apart as soon as to apply the code to a different dataset... Because, at this point, we're all still just guessing.

  • Viewing 15 posts - 1 through 15 (of 46 total)

    You must be logged in to reply to this topic. Login to reply