relatively simple mode, median, mean

  • Oblio Leitch

    SSCrazy

    Points: 2137

    Comments posted to this topic are about the item relatively simple mode, median, mean

  • charles.sellers

    Grasshopper

    Points: 19

    Good post, I needed some mode and median calculations in a short time frame, and your SQL was invaluable. Thanks.

  • Jesse McLain

    SSCommitted

    Points: 1784

    There's only one problem with the calculation of the median: if the row count of the sample data is even, then the median is not unique (see http://en.wikipedia.org/wiki/Median). Here's a version of the calc that accounts for even row counts:

    ;WITH

    TopHalf AS (

    SELECT TOP 50 PERCENT DataValue

    FROM DataSet

    ORDER BY DataValue ASC

    )

    ,BottomHalf AS (

    SELECT TOP 50 PERCENT DataValue

    FROM DataSet

    ORDER BY DataValue DESC

    )

    ,BottomOfTopHalf AS (

    SELECT TOP 1 DataValue

    FROM TopHalf

    ORDER BY DataValue DESC

    )

    ,TopOfBottomHalf AS (

    SELECT TOP 1 DataValue

    FROM BottomHalf

    ORDER BY DataValue ASC

    )

    SELECT

    Median = (BottomOfTopHalf.DataValue + TopOfBottomHalf.DataValue) / 2.0

    FROM BottomOfTopHalf, TopOfBottomHalf

  • leea

    SSC Enthusiast

    Points: 186

    I need something similar, but I need the daily median of a data set, effectively doing a group by on the Date.

    Only way I've found to do this is with a cursor was wondering if any clever chaps out there have a better idea?

  • Jesse McLain

    SSCommitted

    Points: 1784

    leea (9/9/2009)


    I need something similar, but I need the daily median of a data set, effectively doing a group by on the Date.

    Only way I've found to do this is with a cursor was wondering if any clever chaps out there have a better idea?

    I'm guessing that you're running this against data that is historically static, correct? If so, how about creating a table to hold the results of your daily median calculation? That way you don't have to rerun it for historic data every time a user needs it.

  • Langston Montgomery

    SSC Eights!

    Points: 934

    Very good catch Jesse!

    Jesse McLain (4/3/2009)


    There's only one problem with the calculation of the median: if the row count of the sample data is even, then the median is not unique (see http://en.wikipedia.org/wiki/Median). Here's a version of the calc that accounts for even row counts:

    ;WITH

    TopHalf AS (

    SELECT TOP 50 PERCENT DataValue

    FROM DataSet

    ORDER BY DataValue ASC

    )

    ,BottomHalf AS (

    SELECT TOP 50 PERCENT DataValue

    FROM DataSet

    ORDER BY DataValue DESC

    )

    ,BottomOfTopHalf AS (

    SELECT TOP 1 DataValue

    FROM TopHalf

    ORDER BY DataValue DESC

    )

    ,TopOfBottomHalf AS (

    SELECT TOP 1 DataValue

    FROM BottomHalf

    ORDER BY DataValue ASC

    )

    SELECT

    Median = (BottomOfTopHalf.DataValue + TopOfBottomHalf.DataValue) / 2.0

    FROM BottomOfTopHalf, TopOfBottomHalf

Viewing 6 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply