PERCENTILE_DISC for median calculation

  • Question: is PERCENTILE_DISC for median calculation works for both odd and even number of records?

    In SQL Server 2008 to calculate median we used this code based on modulo (@modulo was calculated before this)

    case @modulo

    when 1 then

    (SELECT MAX([TimeInDepartmentMinutes]) FROM

    (SELECT TOP 50 PERCENT [TimeInDepartmentMinutes] FROM dbo.vw_ED_Time_Calc

    where ArrivalDateTime >= @StartDate and ArrivalDateTime <= @EndDate and Psyc ='yes' and HospitalName = @Hospital

    ORDER BY [TimeInDepartmentMinutes]) AS Median)

    else -- even

    (SELECT

    ( (SELECT MAX([TimeInDepartmentMinutes]) FROM

    (SELECT TOP 50 PERCENT [TimeInDepartmentMinutes]

    FROM dbo.vw_ED_Time_Calc where ArrivalDateTime >= @StartDate and ArrivalDateTime <= @EndDate and Psyc ='yes' and HospitalName = @Hospital

    ORDER BY 1--==>

    ) AS m1)

    + (

    SELECT MIN([TimeInDepartmentMinutes]) FROM

    (SELECT TOP 50 PERCENT [TimeInDepartmentMinutes] FROM dbo.vw_ED_Time_Calc

    where ArrivalDateTime >= @StartDate and ArrivalDateTime <= @EndDate and Psyc ='yes' and HospitalName = @Hospital

    ORDER BY 1 DESC --==>

    ) AS m2)

    ) / 2 AS Me)

    end AS Median_LOS_All_Psyc_Patients

  • That depends on your desired value.   The median value computed by PERCENTILE_DISC has to actually come from the data, so the number of values is irrelevant in that regard only.   After all, if you only have two values, which one is the median?   Thus there is the potential to have a meaningless result unless you also count the number of rows from which that value has been selected as the median.   PERCENTILE_CONT will compute the actual median value, which will not necessarily be represented in the data.  Thus the meaning of _DISC is "discrete", and of _CONT is "continuous".   Does that help answer the question?

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • The OP actually made the mistake of posting their question in two separate posts.  This question has been beat to death in the other post, which is located at the following URL...
    https://www.sqlservercentral.com/Forums/2013502/PERCENTILEDISC-for-median-calculation

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • sgmunson - Monday, December 17, 2018 2:36 PM

    That depends on your desired value.   The median value computed by PERCENTILE_DISC has to actually come from the data, so the number of values is irrelevant in that regard only.   After all, if you only have two values, which one is the median?   Thus there is the potential to have a meaningless result unless you also count the number of rows from which that value has been selected as the median.   PERCENTILE_CONT will compute the actual median value, which will not necessarily be represented in the data.  Thus the meaning of _DISC is "discrete", and of _CONT is "continuous".   Does that help answer the question?

    Thank you very much.. So we should use PERCENTILE_CONT for median.
    We discussed it with Jeff. I posted the question to this forum first. After a couple of days with no response I posted it on 2008.
    Should I remove my post?

  • valeryk2000 - Tuesday, December 18, 2018 8:41 AM

    sgmunson - Monday, December 17, 2018 2:36 PM

    That depends on your desired value.   The median value computed by PERCENTILE_DISC has to actually come from the data, so the number of values is irrelevant in that regard only.   After all, if you only have two values, which one is the median?   Thus there is the potential to have a meaningless result unless you also count the number of rows from which that value has been selected as the median.   PERCENTILE_CONT will compute the actual median value, which will not necessarily be represented in the data.  Thus the meaning of _DISC is "discrete", and of _CONT is "continuous".   Does that help answer the question?

    Thank you very much.. So we should use PERCENTILE_CONT for median.
    We discussed it with Jeff. I posted the question to this forum first. After a couple of days with no response I posted it on 2008.
    Should I remove my post?

    No.  You're alright.  Guess we can't blame someone when they don't get a response after a couple of days and some people go nuts when someone bumps their post.  I also made the mistake of not verifying the original posting date of the two posts before I called them out as a "duplicate".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • For anyone interested in the conversation on the other post, here's the link...
    https://www.sqlservercentral.com/Forums/2013502/PERCENTILEDISC-for-median-calculation

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply