Query help filling in dates and data

  • I have a data table and a calendar table. I need to fill in the gaps to have a row for each day. I need the dates to be filled in as well as to use the previous valid value for sequence and action. NULL is a valid action type. I've gotten close but the NULLS have been screwing with my results.

    I'm using SQL Server 2014. Your help is much appreciated.

    Data:

    +---------------------------------------------+

    | ID | Sequence| Date| Action| |

    +---------------------------------------------+

    | ---|---------|--------------------|-------| |

    | A | 1 | 4/6/2016 12:09:23 | yield| |

    | A | 2 | 4/6/2016 12:25:16 | stop| |

    | A | 3 | 4/12/2016 11:25:42| NULL| |

    | A | 4 | 4/18/2016 11:25:42| go| |

    | B | 1 | 2/17/2016 14:15:10| yield| |

    | B | 2 | 3/1/2016 7:56:37 | stop| |

    | B | 3 | 4/1/2016 9:24:46 | go| |

    | B | 4 | 5/4/2016 12:25:16 | exit| |

    +---------------------------------------------+

    Calendar:

    +---------+-------------------------+

    | Key | Date |

    +---------+-------------------------+

    | 2838 | 2016-04-06 00:00:00.000 |

    | 2839 | 2016-04-07 00:00:00.000 |

    | 2840 | 2016-04-08 00:00:00.000 |

    | 2841 | 2016-04-09 00:00:00.000 |

    | 2842 | 2016-04-10 00:00:00.000 |

    | 2843 | 2016-04-11 00:00:00.000 |

    | 2844 | 2016-04-12 00:00:00.000 |

    | 2845 | 2016-04-13 00:00:00.000 |

    | 2846 | 2016-04-14 00:00:00.000 |

    | 2847 | 2016-04-15 00:00:00.000 |

    | 2848 | 2016-04-16 00:00:00.000 |

    | 2849 | 2016-04-17 00:00:00.000 |

    | 2850 | 2016-04-18 00:00:00.000 |

    +---------+-------------------------+

    Desired results:

    ID | Sequence| Date| Action|

    ----|---------|--------------------|----------|

    A | 1| 4/6/2016 12:09:23| yield|

    A | 2| 4/6/2016 12:25:16| stop|

    A | 2| 4/7/2016 0:00:00| stop|

    A | 2| 4/8/2016 0:00:00| stop|

    A | 2| 4/9/2016 0:00:00| stop|

    A | 2| 4/10/2016 0:00:00| stop|

    A | 2| 4/11/2016 0:00:00| stop|

    A | 3| 4/12/2016 10:35:34| NULL|

    A | 3| 4/13/2016 0:00:00| NULL|

    A | 3| 4/14/2016 0:00:00| NULL|

    A | 3| 4/15/2016 0:00:00| NULL|

    A | 3| 4/16/2016 0:00:00| NULL|

    A | 3| 4/17/2016 0:00:00| NULL|

    A | 4| 4/18/2016 11:25:4| go|

    TSQL to get test tables up:

    IF EXISTS (SELECT 1 FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'Data')

    DROP TABLE Data;

    CREATE TABLE Data (ID varchar(20), SEQ INT, Date datetime, Action varchar(20));

    INSERT INTO data (ID, SEQ, Date, Action) VALUES

    ('A', 1, '4/5/2016 12:09:23','yield' ),

    ('A', 2, '4/7/2016 12:25:16','stop' ),

    ('A', 3, '4/12/2016 11:25:42',NULL ),

    ('A', 4, '4/18/2016 11:25:42','go' )

    IF EXISTS (SELECT 1 FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'Calendar')

    DROP TABLE Calendar;

    CREATE TABLE Calendar (DateID int, Date datetime);

    INSERT INTO Calendar (DateID, Date) VALUES

    ('2838','4/6/2016 0:00:00'),

    ('2839','4/7/2016 0:00:00' ),

    ('2840', '4/8/2016 0:00:00' ),

    ('2841', '4/9/2016 0:00:00' ),

    ('2842', '4/10/2016 0:00:00'),

    ('2843', '4/11/2016 0:00:00' ),

    ('2844', '4/12/2016 0:00:00' ),

    ('2845', '4/13/2016 0:00:00' ),

    ('2846', '4/14/2016 0:00:00' ),

    ('2847', '4/15/2016 0:00:00'),

    ('2848', '4/16/2016 0:00:00' ),

    ('2849', '4/17/2016 0:00:00' ),

    ('2850', '4/18/2016 0:00:00' )

  • Pretty sure you want NULL and not the string 'NULL'. Not quite seeing what the issue is you are having here though. It is probably right in front me I am just not seeing it.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Your DateKey is defined as INT. Don't provide your sample data as CHAR, because it forces an implicit conversion.

    You say that your data contains NULLs which are causing you problems, but your sample data contains 'NULL' rather than NULL. Which does your data actually contain?

    Your calendar table start with 2016-04-06 and ends with 2016-04-18, but your data starts with 2016-04-05 12:09:23 and ends with 2016-04-18 11:25:42.

    You should provide dates in locale neutral formats (YYYY-MM-DD).

    ;

    WITH data_cte AS (

    SELECT *, LEAD(d.Date, 1, '9999-12-30') OVER(PARTITION BY d.ID ORDER BY d.Date) AS next_dt

    FROM Data d

    )

    SELECT d.ID, d.Seq, c.Date, d.Action -- I used c.Date instead of d.Date

    FROM data_cte d

    INNER JOIN Calendar c

    ON CAST(d.Date AS DATE) <= c.Date

    AND d.next_dt > c.Date

    I used LEAD to find the next date specified and used 9999-12-30 to represent +infinity.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • This is based on an old(ish) Itzik Ben-Gan solution... The Last non NULL Puzzle

    IF OBJECT_ID('tempdb..#Data', 'U') IS NOT NULL

    DROP TABLE #Data;

    CREATE TABLE #Data (

    ID VARCHAR(20),

    SEQ INT,

    Date DATETIME,

    Action VARCHAR(20)

    );

    INSERT INTO #Data (ID, SEQ, Date, Action) VALUES

    ('A', 1, '4/5/2016 12:09:23','yield' ),

    ('A', 2, '4/7/2016 12:25:16','stop' ),

    ('A', 3, '4/12/2016 11:25:42',NULL ), -- made actually null.

    ('A', 4, '4/18/2016 11:25:42','go' );

    IF OBJECT_ID('tempdb..#Calendar', 'U') IS NOT NULL

    DROP TABLE #Calendar;

    CREATE TABLE #Calendar (DateKey int, Date datetime);

    INSERT INTO #Calendar (DateKey, Date) VALUES

    ('2838','4/6/2016 0:00:00'),

    ('2839','4/7/2016 0:00:00' ),

    ('2840', '4/8/2016 0:00:00' ),

    ('2841', '4/9/2016 0:00:00' ),

    ('2842', '4/10/2016 0:00:00'),

    ('2843', '4/11/2016 0:00:00' ),

    ('2844', '4/12/2016 0:00:00' ),

    ('2845', '4/13/2016 0:00:00' ),

    ('2846', '4/14/2016 0:00:00' ),

    ('2847', '4/15/2016 0:00:00'),

    ('2848', '4/16/2016 0:00:00' ),

    ('2849', '4/17/2016 0:00:00' ),

    ('2850', '4/18/2016 0:00:00' );

    --=============================================

    WITH

    cte_AddCalendar AS (

    SELECT

    d.ID,

    d.SEQ,

    Date = ISNULL(d.Date, c.Date),

    Action = CASE WHEN d.ID IS NOT NULL AND d.Action IS NULL THEN 'XXX' ELSE d.Action end

    FROM

    #Data d

    FULL JOIN #Calendar c

    ON CAST(d.Date AS DATE) = c.Date

    AND d.ID = 'A'

    )

    SELECT

    ac.ID,

    ac.SEQ,

    ac.Date,

    Action = NULLIF(ac.Action, 'XXX'),

    SmearedValue = CAST(

    NULLIF(

    SUBSTRING(

    MAX(CAST(ac.Date AS BINARY(8)) + CAST(ac.Action AS BINARY(20))) OVER (ORDER BY ac.Date ROWS UNBOUNDED PRECEDING)

    , 9, 10)

    , 0x5858580000000000000000000000000000000000) -- CAST('XXX' AS BINARY(20))

    AS VARCHAR(20)

    )

    FROM

    cte_AddCalendar ac;

    Results...

    ID SEQ Date Action SmearedValue

    -------------------- ----------- ----------------------- -------------------- ------------

    A 1 2016-04-05 12:09:23.000 yield yield

    NULL NULL 2016-04-06 00:00:00.000 NULL yield

    A 2 2016-04-07 12:25:16.000 stop stop

    NULL NULL 2016-04-08 00:00:00.000 NULL stop

    NULL NULL 2016-04-09 00:00:00.000 NULL stop

    NULL NULL 2016-04-10 00:00:00.000 NULL stop

    NULL NULL 2016-04-11 00:00:00.000 NULL stop

    A 3 2016-04-12 11:25:42.000 NULL NULL

    NULL NULL 2016-04-13 00:00:00.000 NULL NULL

    NULL NULL 2016-04-14 00:00:00.000 NULL NULL

    NULL NULL 2016-04-15 00:00:00.000 NULL NULL

    NULL NULL 2016-04-16 00:00:00.000 NULL NULL

    NULL NULL 2016-04-17 00:00:00.000 NULL NULL

    A 4 2016-04-18 11:25:42.000 go go

  • Thanks for the quick reply Drew. I've updated the post. How would I go about it with multiple actions in the same day?

  • Just another way to tackle it, using CROSS APPLY:

    IF EXISTS (SELECT 1 FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'Data')

    DROP TABLE Data;

    CREATE TABLE Data (ID varchar(20), SEQ INT, Date datetime, Action varchar(20));

    INSERT INTO Data (ID, SEQ, [Date], [Action])

    VALUES('A', 1, '4/5/2016 12:09:23','yield'),

    ('A', 2, '4/7/2016 12:25:16','stop'),

    ('A', 3, '4/12/2016 11:25:42','NULL'),

    ('A', 4, '4/18/2016 11:25:42','go');

    IF EXISTS (SELECT 1 FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'Calendar')

    DROP TABLE Calendar;

    CREATE TABLE Calendar (DateKey int, [Date] datetime);

    INSERT INTO Calendar (DateKey, [Date])

    VALUES(2837, '04/05/2016'),

    (2838, '04/06/2016'),

    (2839, '04/07/2016'),

    (2840, '04/08/2016'),

    (2841, '04/09/2016'),

    (2842, '04/10/2016'),

    (2843, '04/11/2016'),

    (2844, '04/12/2016'),

    (2845, '04/13/2016'),

    (2846, '04/14/2016'),

    (2847, '04/15/2016'),

    (2848, '04/16/2016'),

    (2849, '04/17/2016'),

    (2850, '04/18/2016');

    WITH LEAD_DATA AS (

    SELECT D.*,

    CAST(D.[Date] AS date) AS DATE_ONLY,

    CASE

    WHEN CAST(LEAD(D.[Date], 1, '9999-12-31') OVER(ORDER BY D.[Date]) AS date) = '9999-12-31' THEN '9999-12-31'

    ELSE DATEADD(day, -1, CAST(LEAD(D.[Date], 1, '9999-12-31') OVER(ORDER BY D.[Date]) AS date))

    END AS LEAD_DATE,

    ROW_NUMBER() OVER(ORDER BY D.[Date]) AS RN

    FROM Data AS D

    )

    SELECT D.ID,

    D.SEQ,

    CASE

    WHEN C.[Date] = D.DATE_ONLY THEN D.[Date]

    ELSE C.[Date]

    END AS [Date],

    D.[Action]

    FROM LEAD_DATA AS D

    CROSS APPLY (

    SELECT *

    FROM Calendar AS CI

    WHERE CI.[Date] >= D.DATE_ONLY

    AND CI.[Date] <= D.LEAD_DATE

    ) AS C

    ORDER BY D.RN, C.[Date]

    DROP TABLE Calendar;

    DROP TABLE Data;

    I haven't looked at performance on this, so you'll want to test it for that.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Jason A. Long (12/20/2016)


    This is based on an old(ish) Itzik Ben-Gan solution... The Last non NULL Puzzle

    IF OBJECT_ID('tempdb..#Data', 'U') IS NOT NULL

    DROP TABLE #Data;

    CREATE TABLE #Data (

    ID VARCHAR(20),

    SEQ INT,

    Date DATETIME,

    Action VARCHAR(20)

    );

    INSERT INTO #Data (ID, SEQ, Date, Action) VALUES

    ('A', 1, '4/5/2016 12:09:23','yield' ),

    ('A', 2, '4/7/2016 12:25:16','stop' ),

    ('A', 3, '4/12/2016 11:25:42',NULL ), -- made actually null.

    ('A', 4, '4/18/2016 11:25:42','go' );

    IF OBJECT_ID('tempdb..#Calendar', 'U') IS NOT NULL

    DROP TABLE #Calendar;

    CREATE TABLE #Calendar (DateKey int, Date datetime);

    INSERT INTO #Calendar (DateKey, Date) VALUES

    ('2838','4/6/2016 0:00:00'),

    ('2839','4/7/2016 0:00:00' ),

    ('2840', '4/8/2016 0:00:00' ),

    ('2841', '4/9/2016 0:00:00' ),

    ('2842', '4/10/2016 0:00:00'),

    ('2843', '4/11/2016 0:00:00' ),

    ('2844', '4/12/2016 0:00:00' ),

    ('2845', '4/13/2016 0:00:00' ),

    ('2846', '4/14/2016 0:00:00' ),

    ('2847', '4/15/2016 0:00:00'),

    ('2848', '4/16/2016 0:00:00' ),

    ('2849', '4/17/2016 0:00:00' ),

    ('2850', '4/18/2016 0:00:00' );

    --=============================================

    WITH

    cte_AddCalendar AS (

    SELECT

    d.ID,

    d.SEQ,

    Date = ISNULL(d.Date, c.Date),

    Action = CASE WHEN d.ID IS NOT NULL AND d.Action IS NULL THEN 'XXX' ELSE d.Action end

    FROM

    #Data d

    FULL JOIN #Calendar c

    ON CAST(d.Date AS DATE) = c.Date

    AND d.ID = 'A'

    )

    SELECT

    ac.ID,

    ac.SEQ,

    ac.Date,

    Action = NULLIF(ac.Action, 'XXX'),

    SmearedValue = CAST(

    NULLIF(

    SUBSTRING(

    MAX(CAST(ac.Date AS BINARY(8)) + CAST(ac.Action AS BINARY(20))) OVER (ORDER BY ac.Date ROWS UNBOUNDED PRECEDING)

    , 9, 10)

    , 0x5858580000000000000000000000000000000000) -- CAST('XXX' AS BINARY(20))

    AS VARCHAR(20)

    )

    FROM

    cte_AddCalendar ac;

    Results...

    ID SEQ Date Action SmearedValue

    -------------------- ----------- ----------------------- -------------------- ------------

    A 1 2016-04-05 12:09:23.000 yield yield

    NULL NULL 2016-04-06 00:00:00.000 NULL yield

    A 2 2016-04-07 12:25:16.000 stop stop

    NULL NULL 2016-04-08 00:00:00.000 NULL stop

    NULL NULL 2016-04-09 00:00:00.000 NULL stop

    NULL NULL 2016-04-10 00:00:00.000 NULL stop

    NULL NULL 2016-04-11 00:00:00.000 NULL stop

    A 3 2016-04-12 11:25:42.000 NULL NULL

    NULL NULL 2016-04-13 00:00:00.000 NULL NULL

    NULL NULL 2016-04-14 00:00:00.000 NULL NULL

    NULL NULL 2016-04-15 00:00:00.000 NULL NULL

    NULL NULL 2016-04-16 00:00:00.000 NULL NULL

    NULL NULL 2016-04-17 00:00:00.000 NULL NULL

    A 4 2016-04-18 11:25:42.000 go go

    I started using a similar approach, but the ID isn't smeared and you would run into problems when there is more than one ID if you try to smear the ID as well.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • sgmunson (12/20/2016)


    Just another way to tackle it, using CROSS APPLY:

    -- code snipped to pinpoint this section

    SELECT D.*,

    CAST(D.[Date] AS date) AS DATE_ONLY,

    CASE

    WHEN CAST(LEAD(D.[Date], 1, '9999-12-31') OVER(ORDER BY D.[Date]) AS date) = '9999-12-31' THEN '9999-12-31'

    ELSE DATEADD(day, -1, CAST(LEAD(D.[Date], 1, '9999-12-31') OVER(ORDER BY D.[Date]) AS date))

    END AS LEAD_DATE,

    ROW_NUMBER() OVER(ORDER BY D.[Date]) AS RN

    FROM Data AS D

    You don't need that complicated CASE expression.

    SELECT D.*,

    CAST(D.[Date] AS date) AS DATE_ONLY,

    -- Replace the CASE here

    LEAD(DATEADD(DAY, -1, CAST(D.[Date] AS DATE)), 1, '9999-12-31') OVER(ORDER BY D.[Date]) AS LEAD_DATE,

    ROW_NUMBER() OVER(ORDER BY D.[Date]) AS RN

    FROM #DATA AS D

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen (12/20/2016)


    I started using a similar approach, but the ID isn't smeared and you would run into problems when there is more than one ID if you try to smear the ID as well.

    Drew

    All valid points... All solvable with a slight modification though.

    This should do the trick...

    IF OBJECT_ID('tempdb..#Data', 'U') IS NOT NULL

    DROP TABLE #Data;

    CREATE TABLE #Data (

    ID VARCHAR(20),

    SEQ INT,

    Date DATETIME,

    Action VARCHAR(20)

    );

    INSERT INTO #Data (ID, SEQ, Date, Action) VALUES

    ('A', 1, '4/5/2016 12:09:23','yield' ),

    ('A', 2, '4/7/2016 12:25:16','stop' ),

    ('A', 3, '4/12/2016 11:25:42',NULL ), -- made actually null.

    ('A', 4, '4/18/2016 11:25:42','go' ),

    ('B', 1, '4/5/2016 12:09:23','yield' ),

    ('B', 2, '4/7/2016 12:25:16','stop' ),

    ('B', 3, '4/12/2016 11:25:42',NULL ), -- made actually null.

    ('B', 4, '4/18/2016 11:25:42','go' );

    IF OBJECT_ID('tempdb..#Calendar', 'U') IS NOT NULL

    DROP TABLE #Calendar;

    CREATE TABLE #Calendar (DateKey int, Date datetime);

    INSERT INTO #Calendar (DateKey, Date) VALUES

    ('1234','4/5/2016 0:00:00'),

    ('2838','4/6/2016 0:00:00'),

    ('2839','4/7/2016 0:00:00' ),

    ('2840', '4/8/2016 0:00:00' ),

    ('2841', '4/9/2016 0:00:00' ),

    ('2842', '4/10/2016 0:00:00'),

    ('2843', '4/11/2016 0:00:00' ),

    ('2844', '4/12/2016 0:00:00' ),

    ('2845', '4/13/2016 0:00:00' ),

    ('2846', '4/14/2016 0:00:00' ),

    ('2847', '4/15/2016 0:00:00'),

    ('2848', '4/16/2016 0:00:00' ),

    ('2849', '4/17/2016 0:00:00' ),

    ('2850', '4/18/2016 0:00:00' );

    --=============================================

    WITH

    cte_DistinctID AS ( SELECT DISTINCT d.ID FROM #Data d ),

    cte_ApplyCalendarToID AS (

    SELECT

    di.ID,

    c.Date

    FROM

    cte_DistinctID di

    CROSS JOIN #Calendar c

    ),

    cte_AddCalendar AS (

    SELECT

    aci.ID,

    d.SEQ,

    Date = ISNULL(d.Date, aci.Date),

    Action = CASE WHEN d.ID IS NOT NULL AND d.Action IS NULL THEN 'XXX' ELSE d.Action end

    FROM

    #Data d

    FULL JOIN cte_ApplyCalendarToID aci

    ON d.ID = aci.ID

    AND CAST(d.Date AS DATE) = aci.Date

    )

    SELECT

    ac.ID,

    ac.SEQ,

    ac.Date,

    Action = NULLIF(ac.Action, 'XXX'),

    SmearedValue = CAST(

    NULLIF(

    SUBSTRING(

    MAX(CAST(ac.Date AS BINARY(8)) + CAST(ac.Action AS BINARY(20))) OVER (PARTITION BY ac.ID ORDER BY ac.Date ROWS UNBOUNDED PRECEDING)

    , 9, 10)

    , 0x5858580000000000000000000000000000000000) -- CAST('XXX' AS BINARY(20))

    AS VARCHAR(20)

    )

    FROM

    cte_AddCalendar ac;

    Results...

    ID SEQ Date Action SmearedValue

    -------------------- ----------- ----------------------- -------------------- --------------------

    A 1 2016-04-05 12:09:23.000 yield yield

    A NULL 2016-04-06 00:00:00.000 NULL yield

    A 2 2016-04-07 12:25:16.000 stop stop

    A NULL 2016-04-08 00:00:00.000 NULL stop

    A NULL 2016-04-09 00:00:00.000 NULL stop

    A NULL 2016-04-10 00:00:00.000 NULL stop

    A NULL 2016-04-11 00:00:00.000 NULL stop

    A 3 2016-04-12 11:25:42.000 NULL NULL

    A NULL 2016-04-13 00:00:00.000 NULL NULL

    A NULL 2016-04-14 00:00:00.000 NULL NULL

    A NULL 2016-04-15 00:00:00.000 NULL NULL

    A NULL 2016-04-16 00:00:00.000 NULL NULL

    A NULL 2016-04-17 00:00:00.000 NULL NULL

    A 4 2016-04-18 11:25:42.000 go go

    B 1 2016-04-05 12:09:23.000 yield yield

    B NULL 2016-04-06 00:00:00.000 NULL yield

    B 2 2016-04-07 12:25:16.000 stop stop

    B NULL 2016-04-08 00:00:00.000 NULL stop

    B NULL 2016-04-09 00:00:00.000 NULL stop

    B NULL 2016-04-10 00:00:00.000 NULL stop

    B NULL 2016-04-11 00:00:00.000 NULL stop

    B 3 2016-04-12 11:25:42.000 NULL NULL

    B NULL 2016-04-13 00:00:00.000 NULL NULL

    B NULL 2016-04-14 00:00:00.000 NULL NULL

    B NULL 2016-04-15 00:00:00.000 NULL NULL

    B NULL 2016-04-16 00:00:00.000 NULL NULL

    B NULL 2016-04-17 00:00:00.000 NULL NULL

    B 4 2016-04-18 11:25:42.000 go go

  • DK13 (12/20/2016)


    NULL is a valid action type.

    I'll save Celko the effort here. NULL is not a valid action type. It is messing up your results because you are assigning meaning to it.

    Implying meaning with null is a slippery slope and it will affect your data quality and performance later. If you're intending it to mean nothing happened, then use 'None' instead. Null means you don't know what happened or even if anything happened.

    Improper Null handling is a big source of bugs and will make your job (or those after you) harder. As an example of that, if you didn't use Null as an action type, you likely would not have needed to ask your question here.

    Save yourself while there is still time, replace the null with a meaningful value.

    Wes
    (A solid design is always preferable to a creative workaround)

  • Okay folks, having realized that my query was only going to handle one ID value, I began to look a little deeper into this. I won't necessarily worry about whenriksen's point about NULL values, as there's no guarantee that assigning a meaning to NULL is automatically a bad thing. It's not necessarily best, but not always bad either. In any case, having set myself on the path to solving for multiple ID values, I took the poster's original data and made a slight modification of my query to deal with it. Take a look and feel free to critique:

    IF EXISTS (SELECT 1 FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'Data')

    DROP TABLE Data;

    CREATE TABLE dbo.Data (ID varchar(20), SEQ INT, [Date] datetime, [Action] varchar(20));

    INSERT INTO Data (ID, SEQ, [Date], [Action])

    VALUES('A', 1, '4/5/2016 12:09:23','yield'),

    ('A', 2, '4/7/2016 12:25:16','stop'),

    ('A', 3, '4/12/2016 11:25:42',NULL),

    ('A', 4, '4/18/2016 11:25:42','go'),

    ('B', 1, '2/17/2016 14:15:10','yield'),

    ('B', 2, '3/1/2016 07:56:37', 'stop'),

    ('B', 3, '4/1/2016 09:24:46', 'go'),

    ('B', 4, '5/4/2016 12:25:16', 'exit');

    WITH DATA_MAX AS (

    SELECT D.ID, MIN(CAST(D.[Date] AS date)) AS MIN_DATE, MAX(CAST(D.[Date] AS date)) AS MAX_DATE

    FROM dbo.Data AS D

    GROUP BY D.ID

    ),

    TALLY AS (

    SELECT TOP (1000) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) - 1 AS RN

    FROM (VALUES (1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) AS X(N),

    (VALUES (1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) AS Y(N),

    (VALUES (1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) AS Z(N)

    ),

    DATE_RANGES AS (

    SELECT TOP 100 PERCENT DM.ID, DATEADD(day, T.RN, DM.MIN_DATE) AS THE_DATE

    FROM DATA_MAX AS DM

    CROSS APPLY TALLY AS T

    WHERE DATEADD(day, T.RN, DM.MIN_DATE) <= DM.MAX_DATE

    ORDER BY DM.ID, THE_DATE

    ),

    LEAD_DATA AS (

    SELECT D.*,

    CAST(D.[Date] AS date) AS DATE_ONLY,

    CASE

    WHEN CAST(D.[Date] AS date) = DM.MAX_DATE THEN DM.MAX_DATE

    ELSE DATEADD(day, -1, CAST(LEAD(D.[Date], 1, DM.MAX_DATE) OVER(PARTITION BY D.ID ORDER BY D.[Date]) AS date))

    END AS LEAD_DATE,

    ROW_NUMBER() OVER(PARTITION BY D.ID ORDER BY D.[Date]) AS RN

    FROM Data AS D

    INNER JOIN DATA_MAX AS DM

    ON D.ID = DM.ID

    )

    SELECT LD.ID,

    LD.SEQ,

    CASE

    WHEN CAST(LD.[Date] AS date) = DR.THE_DATE THEN LD.[Date]

    ELSE DR.THE_DATE

    END AS [Date],

    LD.[Action]

    FROM LEAD_DATA AS LD

    CROSS APPLY DATE_RANGES AS DR

    WHERE LD.ID = DR.ID

    AND DR.THE_DATE BETWEEN LD.DATE_ONLY AND LD.LEAD_DATE

    ORDER BY LD.ID, LD.SEQ, DR.THE_DATE;

    DROP TABLE Data;

    EDIT: TOP (100) changed to TOP (1000).

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • And now that I've seen the post about needing to deal with the possibility of more than one action in a given day, I've decided to tackle that one. Here's the code:

    IF EXISTS (SELECT 1 FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'Data')

    DROP TABLE Data;

    CREATE TABLE dbo.Data (ID varchar(20), SEQ INT, [Date] datetime, [Action] varchar(20));

    INSERT INTO Data (ID, SEQ, [Date], [Action])

    VALUES('A', 1, '4/5/2016 12:09:23', 'yield'),

    ('A', 2, '4/7/2016 12:25:16', 'stop'),

    ('A', 3, '4/12/2016 11:25:42', NULL),

    ('A', 4, '4/18/2016 11:25:42', 'go'),

    ('B', 1, '2/17/2016 14:15:10', 'yield'),

    ('B', 2, '3/1/2016 07:56:37', 'stop'),

    ('B', 3, '4/1/2016 09:24:46', 'go'),

    ('B', 4, '5/4/2016 12:25:16', 'exit'),

    ('C', 1, '3/1/2016 04:15:10', 'yield'),

    ('C', 2, '3/1/2016 07:56:37', 'stop'),

    ('C', 3, '4/1/2016 09:24:46', 'go'),

    ('C', 4, '4/1/2016 19:22:05', 'walk'),

    ('C', 5, '5/4/2016 12:25:16', 'exit');

    WITH DATA_MAX AS (

    SELECT D.ID, MIN(CAST(D.[Date] AS date)) AS MIN_DATE, MAX(CAST(D.[Date] AS date)) AS MAX_DATE

    FROM dbo.Data AS D

    GROUP BY D.ID

    ),

    TALLY AS (

    SELECT TOP (1000) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) - 1 AS RN

    FROM (VALUES (1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) AS X(N),

    (VALUES (1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) AS Y(N),

    (VALUES (1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) AS Z(N)

    ),

    DATE_RANGES AS (

    SELECT TOP 100 PERCENT DM.ID, DATEADD(day, T.RN, DM.MIN_DATE) AS THE_DATE

    FROM DATA_MAX AS DM

    CROSS APPLY TALLY AS T

    WHERE DATEADD(day, T.RN, DM.MIN_DATE) <= DM.MAX_DATE

    ORDER BY DM.ID, THE_DATE

    ),

    LEAD_DATA AS (

    SELECT D.*,

    CAST(D.[Date] AS date) AS DATE_ONLY,

    CASE

    WHEN CAST(D.[Date] AS date) = DM.MIN_DATE THEN DM.MIN_DATE

    WHEN CAST(D.[Date] AS date) = CAST(LEAD(D.[Date], 1, DM.MAX_DATE) OVER(PARTITION BY D.ID ORDER BY D.[Date]) AS date) THEN CAST(D.[Date] AS date)

    WHEN CAST(D.[Date] AS date) = DM.MAX_DATE THEN DM.MAX_DATE

    ELSE DATEADD(day, -1, CAST(LEAD(D.[Date], 1, DM.MAX_DATE) OVER(PARTITION BY D.ID ORDER BY D.[Date]) AS date))

    END AS LEAD_DATE,

    ROW_NUMBER() OVER(PARTITION BY D.ID ORDER BY D.[Date]) AS RN

    FROM Data AS D

    INNER JOIN DATA_MAX AS DM

    ON D.ID = DM.ID

    )

    SELECT LD.ID,

    LD.SEQ,

    CASE

    WHEN CAST(LD.[Date] AS date) = DR.THE_DATE THEN LD.[Date]

    ELSE DR.THE_DATE

    END AS [Date],

    LD.[Action]

    FROM LEAD_DATA AS LD

    CROSS APPLY DATE_RANGES AS DR

    WHERE LD.ID = DR.ID

    AND DR.THE_DATE BETWEEN LD.DATE_ONLY AND LD.LEAD_DATE

    ORDER BY LD.ID, LD.SEQ, DR.THE_DATE;

    DROP TABLE Data;

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • whenriksen (12/21/2016)


    DK13 (12/20/2016)


    NULL is a valid action type.

    I'll save Celko the effort here. NULL is not a valid action type. It is messing up your results because you are assigning meaning to it.

    Implying meaning with null is a slippery slope and it will affect your data quality and performance later. If you're intending it to mean nothing happened, then use 'None' instead. Null means you don't know what happened or even if anything happened.

    Improper Null handling is a big source of bugs and will make your job (or those after you) harder. As an example of that, if you didn't use Null as an action type, you likely would not have needed to ask your question here.

    Save yourself while there is still time, replace the null with a meaningful value.

    Just to follow up on this, relational theory actually specifies two different types of missing data, unknown but applicable and unknown and inapplicable. Unfortunately, SQL Server only implements one, so when you need to distinguish between the two, the standard practice is to use an "impossible" value for unknown but applicable. That is essentially what whenriksen is suggesting here.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Thank you everyone! The scope of the responses were a bit above my experience but I managed to use them to get a working solution. And needless to say I learned a lot here :satisfied:

  • DK13 (12/22/2016)


    Thank you everyone! The scope of the responses were a bit above my experience but I managed to use them to get a working solution. And needless to say I learned a lot here :satisfied:

    Glad to help out. It's a good idea to look over the various responses and mark those that contributed to your solution as "This worked". It helps the rest of us know how we did. You can designate more than one post.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply