Query to fill in missing dates in a series of rows

  • I have a table with a series of events for a patient. Say they were admitted on day 1, transferred on day 3, then discharged on day 6. I would only have 3 rows, I need to create a resultset with 6 rows - filling in the blanks with a new day, some dummy event ("Stay" or something) and the last location they were in that day.

    So my table would look like:

    person_id event_type event_date location_id

    1Admit2008-12-05 12:00:00.0001

    1Transfer2008-12-05 00:00:00.0002

    1Discharge2008-12-07 00:00:00.0002

    and I need

    1Admit2008-12-05 12:00:00.0001

    1Transfer2008-12-05 00:00:00.0002

    1 Stay 2008-12-06 00:00:00 2

    1Discharge2008-12-07 00:00:00.0002

    My table would look something like

    CREATE TABLE [dbo].[adt](

    [person_id] [int] NULL,

    [event_type] [varchar](10) NULL,

    [event_date] [datetime] NULL,

    [location_id] [int] NULL

    )

    Here are some test data

    INSERT INTO [adt] ([person_id],[event_type],[event_date],[location_id])VALUES(1,'Admit','Dec 5 2008 12:00PM',1)

    INSERT INTO [adt] ([person_id],[event_type],[event_date],[location_id])VALUES(1,'Transfer','Dec 5 2008 12:00AM',2)

    INSERT INTO [adt] ([person_id],[event_type],[event_date],[location_id])VALUES(1,'Discharge','Dec 7 2008 12:00AM',2)

    INSERT INTO [adt] ([person_id],[event_type],[event_date],[location_id])VALUES(2,'Admit','Dec 7 2008 12:00AM',2)

    INSERT INTO [adt] ([person_id],[event_type],[event_date],[location_id])VALUES(2,'Discharge','Dec 10 2008 12:00AM',2)

    INSERT INTO [adt] ([person_id],[event_type],[event_date],[location_id])VALUES(3,'Admit','Dec 8 2008 12:00AM',1)

    INSERT INTO [adt] ([person_id],[event_type],[event_date],[location_id])VALUES(3,'Transfer','Dec 8 2008 12:00AM',2)

    INSERT INTO [adt] ([person_id],[event_type],[event_date],[location_id])VALUES(3,'Transfer','Dec 10 2008 12:00AM',3)

  • Don't have time at the moment to work on this, have a few meetings to attend this morning, but I'd like you to do a search on SSC for the Tally Table. This is a good place to start on your particular requirement.

  • A similar question is answered here.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Yes, but those are all simple examples, throw in a couple extra fields to group your data upon - like my example.

  • Okay, so based on those simple examples, what have you come up with and what problems have you encountered?

  • the location_id needs to maintain the previous value - not the min/max or a cartesian product. And I need each day for each person between their admit and discharge.

    running something like

    SELECT n.[Date] AS Date, b.person_id,Coalesce(b.event_type, 'Stay') AS Event, b.location_id, b.adt_id

    FROM dbo.MyCalendarTally('12/1/2008','12/30/2008') n

    LEFT JOIN adt b ON b.event_Date = n.[Date]

    LEFT JOIN adt x ON x.event_Date = (SELECT MAX(event_Date) AS eventDate FROM adt WHERE event_Date <= n.[Date])

    order by n.date, b.person_id

    maintains neither. I'm wondering if I should go down the CTE with a row number and then hook to a tally table. But I'm stumped for the moment - hence my posting here!

  • bnordberg (12/16/2008)


    the location_id needs to maintain the previous value - not the min/max or a cartesian product. And I need each day for each person between their admit and discharge.

    running something like

    SELECT n.[Date] AS Date, b.person_id,Coalesce(b.event_type, 'Stay') AS Event, b.location_id, b.adt_id

    FROM dbo.MyCalendarTally('12/1/2008','12/30/2008') n

    LEFT JOIN adt b ON b.event_Date = n.[Date]

    LEFT JOIN adt x ON x.event_Date = (SELECT MAX(event_Date) AS eventDate FROM adt WHERE event_Date <= n.[Date])

    order by n.date, b.person_id

    maintains neither. I'm wondering if I should go down the CTE with a row number and then hook to a tally table. But I'm stumped for the moment - hence my posting here!

    This is ugly as a warthog but seems to work...

    [font="Courier New"]DROP TABLE #adt

    CREATE TABLE #adt (

            [person_id] [int] NULL,

            [event_type] [varchar](10) NULL,

            [event_date] [datetime] NULL,

            [location_id] [int] NULL

    )

    INSERT INTO #adt ([person_id],[event_type],[event_date],[location_id])

    SELECT 1,'Admit',   CONVERT(DATETIME,'Dec  5 2008 12:00PM', 100), 1 UNION ALL

    SELECT 1,'Transfer',    CONVERT(DATETIME,'Dec  5 2008 12:00AM', 100), 2 UNION ALL

    SELECT 1,'Discharge',   CONVERT(DATETIME,'Dec  7 2008 12:00AM', 100), 2 UNION ALL

    SELECT 2,'Admit',   CONVERT(DATETIME,'Dec  7 2008 12:00AM', 100), 2 UNION ALL

    SELECT 2,'Discharge',   CONVERT(DATETIME,'Dec 10 2008 12:00AM', 100), 2 UNION ALL

    SELECT 3,'Admit',   CONVERT(DATETIME,'Dec  8 2008 12:00AM', 100), 1 UNION ALL

    SELECT 3,'Transfer',    CONVERT(DATETIME,'Dec  8 2008 12:00AM', 100), 2 UNION ALL

    SELECT 3,'Transfer',    CONVERT(DATETIME,'Dec 10 2008 12:00AM', 100), 3 UNION ALL

    SELECT 3,'Transfer',    CONVERT(DATETIME,'Dec 14 2008 12:00AM', 100), 4

    SELECT person_id, event_type, event_date,

       [location_id] = ISNULL(location_id, (SELECT TOP 1 location_id

                           FROM #adt

                           WHERE person_id = d.person_id AND event_date < d.event_date

                           ORDER BY event_date DESC))

    FROM (SELECT * FROM #adt

       UNION ALL

       SELECT DISTINCT a.person_id,

           CAST('wait' AS VARCHAR(10)) AS event_type,

           DATEADD(DD, 0, n.number) AS event_date,

           CAST(NULL AS INT) AS location_id

       FROM Numbers n, #adt a, (SELECT person_id, MIN(event_date) AS MINevent_date, MAX(event_date) AS MAXevent_date

                   FROM #adt

                   GROUP BY person_id) b

       WHERE NOT EXISTS (SELECT 1 FROM #adt WHERE person_id = a.person_id AND event_date = DATEADD(DD, 0, n.number))

           AND b.person_id = a.person_id

           AND DATEADD(DD, 0, n.number) > b.MINevent_date

           AND DATEADD(DD, 0, n.number) < b.MAXevent_date

    ) d

    ORDER BY person_id, event_date, location_id

    [/font]

    and if it works, then it's time to make it pretty and fast.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Thanks Chris!

    that appears functional. Now to tune and see how it works with a few hundred million rows!

  • bnordberg (12/17/2008)


    Thanks Chris!

    that appears functional. Now to tune and see how it works with a few hundred million rows!

    Why on Earth do you need to create what is essentially a potwad of duplicated data? Not being a smart guy here... it could make a difference...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (12/17/2008)


    bnordberg (12/17/2008)


    Thanks Chris!

    that appears functional. Now to tune and see how it works with a few hundred million rows!

    Why on Earth do you need to create what is essentially a potwad of duplicated data? Not being a smart guy here... it could make a difference...

    Totally agree Jeff, this is a reporting requirement.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • The data in this format will allow for contract tracing of MRSA infections. Our statisticians can pipe this through some algorithims in SAS to see if we can't find patterns of infection/spread.

  • bnordberg (12/17/2008)


    Thanks Chris!

    that appears functional. Now to tune and see how it works with a few hundred million rows!

    Are you sure it works? When I run it, I only get the original rows...

    person_id event_type event_date location_id

    ----------- ---------- ------------------------------------------------------ -----------

    1 Transfer 2008-12-05 00:00:00.000 2

    1 Admit 2008-12-05 12:00:00.000 1

    1 Discharge 2008-12-07 00:00:00.000 2

    2 Admit 2008-12-07 00:00:00.000 2

    2 Discharge 2008-12-10 00:00:00.000 2

    3 Admit 2008-12-08 00:00:00.000 1

    3 Transfer 2008-12-08 00:00:00.000 2

    3 Transfer 2008-12-10 00:00:00.000 3

    3 Transfer 2008-12-14 00:00:00.000 4

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • bnordberg (12/18/2008)


    The data in this format will allow for contract tracing of MRSA infections. Our statisticians can pipe this through some algorithims in SAS to see if we can't find patterns of infection/spread.

    Got it... lemme see what I can do... yep, I know... Chris posted code for this already...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (12/18/2008)


    bnordberg (12/18/2008)


    The data in this format will allow for contract tracing of MRSA infections. Our statisticians can pipe this through some algorithims in SAS to see if we can't find patterns of infection/spread.

    Got it... lemme see what I can do... yep, I know... Chris posted code for this already...

    Jeff, I was hoping you'd take the bait...proof of concept is one thing, turning prototype code into 'nasty fast' production code is entirely another. My query will not scale well. Yours will, and I look forward to learning from it.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Chris Morris (12/18/2008)


    Jeff, I was hoping you'd take the bait...proof of concept is one thing, turning prototype code into 'nasty fast' production code is entirely another. My query will not scale well. Yours will, and I look forward to learning from it.

    Heh... this one isn't quite as easy as folks would think and still maintain any performance in the face of scalability at all... hope I can live up to your good expectations and mine. 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply