Admission dates overlap and suming of inpatient costs

  • Hello-

    I've been struggling with this for some time. we have to group data based on Patients admission date and discharge date. If any Patients discharge date + 1 = admission date then we have group both rows into one row and sum costs from both the rows. Please check out the sample input and expected output for details. Any help will be highly appreciated

    Sample Input

    PatientID AdmissionDate DischargeDate Cost

    1009 27-07-2014 31-07-2014 1050

    1009 01-08-2014 23-08-2014 1070

    1009 31-08-2014 31-08-2014 1900

    1009 01-09-2014 14-09-2014 1260

    1009 01-12-2014 31-12-2014 2090

    1024 07-06-2014 28-06-2014 1900

    1024 29-06-2014 31-07-2014 2900

    1024 01-08-2014 02-08-2014 1800

    Expected Output

    PatientId AdminssionDate DischargeDate Cost

    1009 27-07-2014 23-08-2014 2120

    1009 31-08-2014 14-09-2014 3160

    1009 01-12-2014 31-12-2014 2090

    1024 07-06-2014 02-08-2014 6600

    Please Use the below script to generate the source table and fill them up with the sample data.

    --Create Table

    CREATE TABLE PatientProblem

    (

    PatientID INT,

    AdmissionDate DATETIME,

    DischargeDate DATETIME,

    Cost MONEY

    )

    GO

    --Insert Data

    INSERT INTO PatientProblem(PatientID,AdmissionDate,DischargeDate

    ,Cost)

    VALUES

    (1009,'2014-07-27','2014-07-31',1050.00),

    (1009,'2014-08-01','2014-08-23',1070.00),

    (1009,'2014-08-31','2014-08-31',1900.00),

    (1009,'2014-09-01','2014-09-14',1260.00),

    (1009,'2014-12-01','2014-12-31',2090.00),

    (1024,'2014-06-07','2014-06-28',1900.00),

    (1024,'2014-06-29','2014-07-31',2900.00),

    (1024,'2014-08-01','2014-08-02',1800.00)

    --Verify Data

    SELECT PatientID,AdmissionDate,DischargeDate,Cost

    FROM PatientProblem

  • This should get you what you are looking for...

    -- test data --

    IF OBJECT_ID('tempdb..#PatientProblem') IS NOT NULL

    DROP TABLE #PatientProblem;

    CREATE TABLE #PatientProblem

    (

    PatientID INT,

    AdmissionDate DATETIME,

    DischargeDate DATETIME,

    Cost MONEY

    )

    GO

    --Insert Data

    INSERT INTO #PatientProblem(PatientID,AdmissionDate,DischargeDate,Cost)

    VALUES

    (1009,'2014-07-27','2014-07-31',1050.00),

    (1009,'2014-08-01','2014-08-23',1070.00),

    (1009,'2014-08-31','2014-08-31',1900.00),

    (1009,'2014-09-01','2014-09-14',1260.00),

    (1009,'2014-12-01','2014-12-31',2090.00),

    (1024,'2014-06-07','2014-06-28',1900.00),

    (1024,'2014-06-29','2014-07-31',2900.00),

    (1024,'2014-08-01','2014-08-02',1800.00);

    -- index to eliminate 1 sort operator and lessen the impact of the other. --

    CREATE NONCLUSTERED INDEX ix_PatientProblem_PatientID_AdminDate ON #PatientProblem (

    PatientID, AdmissionDate)

    INCLUDE (DischargeDate, Cost);

    WITH AdminDateGroup AS (-- use datediff in combo with lag to determine were the groups are to be split.

    SELECT

    pp.PatientID,

    pp.AdmissionDate,

    pp.DischargeDate,

    pp.Cost,

    CASE WHEN DATEDIFF(dd, LAG(pp.DischargeDate, 1) OVER (PARTITION BY pp.PatientID ORDER BY pp.AdmissionDate), pp.AdmissionDate) > 1 OR LAG(pp.DischargeDate, 1) OVER (PARTITION BY pp.PatientID ORDER BY pp.AdmissionDate) IS NULL THEN pp.AdmissionDate END AS AdminDateGroup

    FROM

    #PatientProblem pp

    ), AdminDateGroupFill AS (-- fill in null values with the appropriate group value

    SELECT

    adg.PatientID,

    adg.AdmissionDate,

    adg.DischargeDate,

    adg.Cost,

    MAX(adg.AdminDateGroup) OVER (PARTITION BY adg.PatientID ORDER BY adg.AdmissionDate) AS AdminDateGroup

    FROM

    AdminDateGroup adg

    )

    -- at this point it's just a simple query with a couple of aggrigates.

    SELECT

    adgf.PatientID,

    adgf.AdminDateGroup AS AdmissionDate,

    MAX(adgf.DischargeDate) AS DischargeDate,

    SUM(adgf.Cost) AS Cost

    FROM

    AdminDateGroupFill adgf

    GROUP BY

    adgf.PatientID,

    adgf.AdminDateGroup

    Here are the results...

    PatientID AdmissionDate DischargeDate Cost

    ----------- ----------------------- ----------------------- ---------------------

    1024 2014-06-07 00:00:00.000 2014-08-02 00:00:00.000 6600.00

    1009 2014-07-27 00:00:00.000 2014-08-23 00:00:00.000 2120.00

    1009 2014-08-31 00:00:00.000 2014-09-14 00:00:00.000 3160.00

    1009 2014-12-01 00:00:00.000 2014-12-31 00:00:00.000 2090.00

    HTH,

    Jason

    PS.. Thank you for posting DDL & test data... It makes things much easier to work on. 😀

  • Perfect. Thanks for your help, this will solve the problem, will tryout and give you a feedback. I like how you used the LAG function and partition - Brilliant. :w00t:

  • Hi Jason,

    I'm using version 2008. Did not really pay attention to it, and it doesn't come with LAG and LEAD functions. I've just re-posted this on the 2008 forum. Any help will be highly appreciated.

    Thanks

  • Maybe you can use this article[/url] to simulate LAG and LEAD.

  • sratemo (6/29/2015)


    Hi Jason,

    I'm using version 2008. Did not really pay attention to it, and it doesn't come with LAG and LEAD functions. I've just re-posted this on the 2008 forum. Any help will be highly appreciated.

    Thanks

    No worries. Just in future don't double post. It only leads to further confusion.

    In any case it's not a problem to accomplish the same results without the LAG function. (Not as efficient but still very doable).

    I tried to keep the two solutions as similar as possible to make it easier to see the differences.

    -- test data --

    IF OBJECT_ID('tempdb..#PatientProblem') IS NOT NULL

    DROP TABLE #PatientProblem;

    CREATE TABLE #PatientProblem

    (

    PatientID INT,

    AdmissionDate DATETIME,

    DischargeDate DATETIME,

    Cost MONEY

    )

    GO

    -- index to eliminate 1 sort operator and lessen the impact of the other. --

    CREATE NONCLUSTERED INDEX ix_PatientProblem_PatientID_AdminDate ON #PatientProblem (

    PatientID, AdmissionDate)

    INCLUDE (DischargeDate, Cost);

    --Insert Data

    INSERT INTO #PatientProblem(PatientID,AdmissionDate,DischargeDate,Cost)

    VALUES

    (1009,'2014-07-27','2014-07-31',1050.00),

    (1009,'2014-08-01','2014-08-23',1070.00),

    (1009,'2014-08-31','2014-08-31',1900.00),

    (1009,'2014-09-01','2014-09-14',1260.00),

    (1009,'2014-12-01','2014-12-31',2090.00),

    (1024,'2014-06-07','2014-06-28',1900.00),

    (1024,'2014-06-29','2014-07-31',2900.00),

    (1024,'2014-08-01','2014-08-02',1800.00);

    WITH PatientProblemAddRN AS (-- add a row number to the data set.

    SELECT

    pp.PatientID,

    pp.AdmissionDate,

    pp.DischargeDate,

    pp.Cost,

    ROW_NUMBER() OVER (PARTITION BY pp.PatientID ORDER BY pp.AdmissionDate) AS RN

    FROM

    #PatientProblem pp

    ), AdminDateGroup AS (-- join the prev CTE to itself... note that the row number is offset by one in the ON clause of the join. this is the mechanism that replaces the LAG function of the previous solution.

    SELECT

    p1.PatientID,

    p1.AdmissionDate,

    p1.DischargeDate,

    p1.Cost,

    CASE WHEN DATEDIFF(dd, p2.DischargeDate, p1.AdmissionDate) > 1 OR p2.AdmissionDate IS NULL THEN p1.AdmissionDate END AS AdminDateGroup

    FROM

    PatientProblemAddRN p1

    LEFT JOIN PatientProblemAddRN p2

    ON p1.PatientID = p2.PatientID

    AND p1.RN = p2.RN + 1

    ), AdminDateGroupFill AS (-- fill in null values with the appropriate group value

    SELECT

    adg.PatientID,

    adg.AdmissionDate,

    adg.DischargeDate,

    adg.Cost,

    MAX(adg.AdminDateGroup) OVER (PARTITION BY adg.PatientID ORDER BY adg.AdmissionDate) AS AdminDateGroup

    FROM

    AdminDateGroup adg

    )

    -- at this point it's just a simple query with a couple of aggrigates.

    SELECT

    adgf.PatientID,

    adgf.AdminDateGroup AS AdmissionDate,

    MAX(adgf.DischargeDate) AS DischargeDate,

    SUM(adgf.Cost) AS Cost

    FROM

    AdminDateGroupFill adgf

    GROUP BY

    adgf.PatientID,

    adgf.AdminDateGroup

    Again... The same results as before...

    PatientID AdmissionDate DischargeDate Cost

    ----------- ----------------------- ----------------------- ---------------------

    1024 2014-06-07 00:00:00.000 2014-08-02 00:00:00.000 6600.00

    1009 2014-07-27 00:00:00.000 2014-08-23 00:00:00.000 2120.00

    1009 2014-08-31 00:00:00.000 2014-09-14 00:00:00.000 3160.00

    1009 2014-12-01 00:00:00.000 2014-12-31 00:00:00.000 2090.00

    Also note that the index I suggested in the 1st post remains unchanged. The same sort operation take place as before... Just with ROW_NUMBER instead of LAG.

    HTH,

    Jason

  • Oops, wrong thread.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Here's another quite clever method originating with Itzik Ben-Gan. Unfortunately, I don't believe that the original link is still active but I used the technique myself in this article (with attribution of course).

    Calculating Gaps Between Overlapping Time Intervals in SQL[/url]

    Using Jason's test data:

    WITH C1 AS

    (

    -- Since the data table contains rows with a start and end date, we'll first unpivot

    -- those using CROSS APPLY VALUES and assign a type to each. The columns e and s will

    -- either be NULL (s NULL for an end date, e NULL for a start date) or row numbers

    -- sequenced by the time. UserID has been eliminated because we're looking for overlapping

    -- intervals across all users.

    SELECT PatientID, ts, [Type], Cost

    ,e=CASE [Type]

    WHEN 1 THEN NULL

    ELSE ROW_NUMBER() OVER (PARTITION BY PatientID, [Type] ORDER BY DischargeDate) END

    ,s=CASE [Type]

    WHEN -1 THEN NULL

    ELSE ROW_NUMBER() OVER (PARTITION BY PatientID, [Type] ORDER BY AdmissionDate) END

    FROM #PatientProblem

    CROSS APPLY (VALUES (1, AdmissionDate-1), (-1, DischargeDate+1)) a([Type], ts)

    ),

    C2 AS

    (

    -- Add a row number ordered as shown

    SELECT C1.*, se=ROW_NUMBER() OVER (PARTITION BY PatientID ORDER BY ts, [Type] DESC)

    FROM C1

    ),

    C3 AS

    (

    -- Create a grpnm that pairs the rows

    SELECT ts, grpnm=FLOOR((ROW_NUMBER() OVER (PARTITION BY PatientID ORDER BY ts)-1) / 2 + 1), PatientID, Cost

    FROM C2

    -- This filter is the magic that eliminates the overlaps

    WHERE COALESCE(s-(se-s)-1, (se-e)-e) = 0

    ),

    C4 AS

    (

    -- Grouping by grpnm restores the records to only non-overlapped intervals

    -- *** Adjust intervals back to original

    SELECT AdmissionDate=MIN(ts)+1, DischargeDate=MAX(ts)-1, PatientID, Cost=SUM(Cost)

    FROM C3

    GROUP BY PatientID, grpnm

    )

    SELECT PatientID, AdmissionDate, DischargeDate

    -- *** Subquery to compute the cost

    ,Cost =

    (

    SELECT SUM(Cost)

    FROM #PatientProblem b

    WHERE a.PatientID = b.PatientID AND

    b.AdmissionDate >= a.AdmissionDate AND

    b.DischargeDate <= a.DischargeDate

    )

    FROM C4 a

    ORDER BY PatientID, AdmissionDate;

    You should find this approach to be pretty fast. I had to make just a couple of modifications to handle the 1 day apart issue between your end/start date overlaps (*** commented).


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Amazing. Works perfectly. Thank you:-):-):-)

  • Hi Jason,

    Your solution looks solid but the query wouldn't execute past:

    "MAX(adg.AdminDateGroup) OVER (PARTITION BY adg.PatientID ORDER BY adg.AdmissionDate) AS AdminDateGroup".

    error - Incorrect syntax near 'order'.

    Tried going around this but then got discrepancies in the results.

  • That's because prior to 2012, ORDER BY clauses were not supported in aggregate window functions; they were supported only in ranking window functions.

    See https://msdn.microsoft.com/en-us/library/ms189461(v=sql.105).aspx for the appropriate documentation.

    Cheers!

  • sratemo (7/2/2015)


    Hi Jason,

    Your solution looks solid but the query wouldn't execute past:

    "MAX(adg.AdminDateGroup) OVER (PARTITION BY adg.PatientID ORDER BY adg.AdmissionDate) AS AdminDateGroup".

    error - Incorrect syntax near 'order'.

    Tried going around this but then got discrepancies in the results.

    Yea... That's my bad. I knew that 2008 supported windowed aggregates but it looks like it only supports partitioning, not ordering... So completely my fault on that...

    Luckily Dwain was able to step up to the plate and come up with a good working solution. I haven't had a chance to dig into his code but I definitely will. He's always good at supplying fantastic new tools for the toolbox. 🙂

  • Jason A. Long (7/2/2015)


    sratemo (7/2/2015)


    Hi Jason,

    Your solution looks solid but the query wouldn't execute past:

    "MAX(adg.AdminDateGroup) OVER (PARTITION BY adg.PatientID ORDER BY adg.AdmissionDate) AS AdminDateGroup".

    error - Incorrect syntax near 'order'.

    Tried going around this but then got discrepancies in the results.

    Yea... That's my bad. I knew that 2008 supported windowed aggregates but it looks like it only supports partitioning, not ordering... So completely my fault on that...

    Luckily Dwain was able to step up to the plate and come up with a good working solution. I haven't had a chance to dig into his code but I definitely will. He's always good at supplying fantastic new tools for the toolbox. 🙂

    Well, that's not exactly a new tool (IBG's approach to collapsing overlapping intervals) but I thank you nonetheless.

    Note that I tried to avoid the extra sub-query I used to pull the cost but I couldn't see a way to make that work within the interval collapsing algorithm. That "magic filter" that IBG came up with seems to preclude doing it in there. Had I been able to, I believe it would have removed the extra INDEX SCAN (or SEEK).


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply