Convert consecutive rows with a date to date range

  • OK this is more of an academic question but having looked all over I haven't been able to find a solution to a seemingly easy problem. I have a table that has a date field as the primary key and then some data field eg:

    Keydate ValueField

    -------------------------

    2012-01-01 A

    2012-08-14 B

    2013-04-12 C

    2013-12-05 D

    I wonder if it is possible without cursors, using just a SQL statement to create a result set like this:

    DateFrom DateTo Valuefield

    ----------------------------------------

    2012-01-01 2012-08-14 A

    2012-08-14 2013-04-12 B

    2013-04-12 2013-12-05 C

    2013-12-05 (current date) D

    essentially each row uses its key value as a DateFrom value and then gets the DateTo value from the next row in order. Since the magical word is order, I have experimented with ROW_NUMBER creating serial number for the rows using the ordering of the dates but it came to a whole lot of nulls. I got *almost* there but I could not get the last record to display the current date using GETDATE().

    Then I tried to use OVER and PARTITION but I have yet a lot to learn in order to use them and had no luck there either.

    I am not necessarily looking for a full and complete answer (although that would be welcome). Any comment to point me to a possible solution would be appreciated.

  • Use ROW_NUMBER, and join on a.RowNo = b.RowNo+1. Use COALESCE or ISNULL to deal with the NULL in the final row.

    John

  • d viz (2/27/2014)


    OK this is more of an academic question but having looked all over I haven't been able to find a solution to a seemingly easy problem. I have a table that has a date field as the primary key and then some data field eg:

    Keydate ValueField

    -------------------------

    2012-01-01 A

    2012-08-14 B

    2013-04-12 C

    2013-12-05 D

    I wonder if it is possible without cursors, using just a SQL statement to create a result set like this:

    DateFrom DateTo Valuefield

    ----------------------------------------

    2012-01-01 2012-08-14 A

    2012-08-14 2013-04-12 B

    2013-04-12 2013-12-05 C

    2013-12-05 (current date) D

    essentially each row uses its key value as a DateFrom value and then gets the DateTo value from the next row in order. Since the magical word is order, I have experimented with ROW_NUMBER creating serial number for the rows using the ordering of the dates but it came to a whole lot of nulls. I got *almost* there but I could not get the last record to display the current date using GETDATE().

    Then I tried to use OVER and PARTITION but I have yet a lot to learn in order to use them and had no luck there either.

    I am not necessarily looking for a full and complete answer (although that would be welcome). Any comment to point me to a possible solution would be appreciated.

    Just a hint... you'll get coded answers if you post your data as readily consumable data in the future. Please see the first link under "Helpful Links" in my signature line below for the best way to do such a thing.

    I'll do it for you this first time. This is slightly different than what is done in the article but the table is simple so I took a shortcut.

    --===== One way to post readily consumable test data

    SELECT KeyDate = CAST(d.KeyDate AS DATETIME)

    ,d.ValueField

    INTO #TestTable

    FROM (

    SELECT '2012-01-01','A' UNION ALL

    SELECT '2012-08-14','B' UNION ALL

    SELECT '2013-04-12','C' UNION ALL

    SELECT '2013-12-05','D'

    ) d (KeyDate,ValueField)

    ;

    Here's the solution to your problem like John Mitchell stated above. I'm assuming that you wanted a coded answer because you explicitly stated that you "tried to use OVER and PARTITION but I have yet a lot to learn in order to use them and had no luck there either". That's what happens when you don't post readily consumable data... people will, many times, just make a suggestion rather than writing code that demonstrates how to solve the problem.

    WITH

    cteEnumerateRowsByKeyDate AS

    ( --=== Number the rows in order by date

    SELECT RowNum = ROW_NUMBER()OVER(ORDER BY KeyDate)

    ,KeyDate

    ,ValueField

    FROM #TestTable

    ) --=== Do a self join with a "1 offset" to get the date

    -- from the next row.

    SELECT DateFrom = lo.KeyDate

    ,DateTo = ISNULL(hi.KeyDate,GETDATE())

    ,lo.ValueField

    FROM cteEnumerateRowsByKeyDate lo

    LEFT JOIN cteEnumerateRowsByKeyDate hi

    ON lo.RowNum+1 = hi.RowNum

    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • thank you very much...

    with a minor change to get the order I wanted, it worked perfectly:

    SELECT a.keyDate AS DateFrom

    ,ISNULL(b.keyDate ,GETDATE()) AS DateTo

    ,a.ValueField

    ,a.RowNo

    FROM [SomeTable] a

    LEFT OUTER JOIN [SomeTable] b

    ON a.RowNo = b.RowNo -1

    ORDER BY

    a.keyDate

  • Jeff thank you for the advice and the solution. Using John's suggestion I managed to come up with the code that works doing a little brainwork myself.

    I will keep in mind though for any future posts to post readily consumable data because I admit, I never read the forum etiquette... sorry!

  • d viz (2/27/2014)


    thank you very much...

    with a minor change to get the order I wanted, it worked perfectly:

    SELECT a.keyDate AS DateFrom

    ,ISNULL(b.keyDate ,GETDATE()) AS DateTo

    ,a.ValueField

    ,a.RowNo

    FROM [SomeTable] a

    LEFT OUTER JOIN [SomeTable] b

    ON a.RowNo = b.RowNo -1

    ORDER BY

    a.keyDate

    In that code, is it partial code or do you have a column call "RowNo" in the table? I ask because if it's built into the table even as an IDENTITY column, there is no guarantee that it won't have gaps which will cause "silent" failures of the code above in the form of missing data in the result set.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Yes, I am aware of the gaps that may present themselves when using identity columns. In this case though, RowNo is a serial number kept in another table that manages number ranges so that no gaps can exist.

    But the curiosity's sake, I also tested this by creating a view which assigns serial number to each using ROW_NUMBER and then querying this view instead of the table directly and it also worked.

    Also, since this is a table that holds very few records (less than 50), I didn't consider any performance implications...

  • d viz (2/27/2014)


    Yes, I am aware of the gaps that may present themselves when using identity columns. In this case though, RowNo is a serial number kept in another table that manages number ranges so that no gaps can exist.

    But the curiosity's sake, I also tested this by creating a view which assigns serial number to each using ROW_NUMBER and then querying this view instead of the table directly and it also worked.

    Also, since this is a table that holds very few records (less than 50), I didn't consider any performance implications...

    Sounds like you're all set.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi All,

    Maybe I'm too late to the party but I can add a bit of value here.

    I recently did some testing of this in SQL 2012 (where there is a faster solution). However there is an option that's available in SQL 2005 that is faster than the JOIN on ROW_NUMBER() if you have a INDEX that includes the date.

    --===== One way to post readily consumable test data

    SELECT KeyDate = ISNULL(CAST(d.KeyDate AS DATETIME), 0)

    ,d.ValueField

    INTO #TestTable

    FROM (

    SELECT '2012-01-01','A' UNION ALL

    SELECT '2012-08-14','B' UNION ALL

    SELECT '2013-04-12','C' UNION ALL

    SELECT '2013-12-05','D'

    ) d (KeyDate,ValueField)

    ;

    GO

    ALTER TABLE #TestTable ADD PRIMARY KEY(KeyDate);

    GO

    SELECT FromDate=KeyDate, ToDate=ISNULL(EffectiveEndDT, GETDATE())

    ,ValueField

    FROM #TestTable a

    OUTER APPLY

    (

    SELECT TOP 1 KeyDate

    FROM #TestTable b

    WHERE b.KeyDate > a.KeyDate

    ORDER BY a.KeyDate

    ) b (EffectiveEndDT)

    ORDER BY KeyDate;

    GO

    DROP TABLE #TestTable;

    I've got an article coming out on SSC (accepted but not yet scheduled) that compares four methods (Jeff's self-join, correlated sub-query, LEAD and OA) and this OUTER APPLY (OA) with either the date in the clustered index or in a non-clustered index seemed to be slightly faster than the other SQL 2005 alternatives. It was nearly as fast as LEAD (the overall winner) in the non-clustered index scenario.

    I make no guarantees though that it is faster in SQL 2005 as I can't test it there.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply