Find Median

  • Hi,

    Can some one help to find median for Odd and Even count in one function

    Thanks

    Grace

  • Yes. First, see my original post on the subject from 2007 so you get the big picture of what was requested so that you'll understand the following code, which was the solution also posted on that thread.

    http://www.sqlservercentral.com/Forums/Topic351991-8-1.aspx

    Here's the code to setup the million row test table and the solution I provided, which still works even at this late day. Note that it finds ~70,200 medians across 1 million rows in about 6 seconds (not including the time it takes to build the test data, which takes about 11 seconds).

    /**************************************************************************************************

    Setup a million row test table.

    This is NOT a part of the solution. We're just creating a test table here.

    **************************************************************************************************/

    --===== If the temp table already exists, drop it

    IF OBJECT_ID('TempDB..#Steps') IS NOT NULL

    DROP TABLE #Steps

    ;

    --===== Simulate loading your real temp table with ...

    -- A MILLION rows, 17576 reps (3 character "names"), a DeltaT from 0 to 1999,

    -- and 4 "Steps" (Step 1, Step 2, etc)

    SELECT TOP 1000000 IDENTITY(INT,1,1) AS RowNum,

    CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65))

    + CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65))

    + CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65)) AS Rep,

    CAST(RAND(CAST(NEWID() AS VARBINARY))*2000 AS INT) AS DeltaT,

    'Step '

    + CAST(CAST(RAND(CAST(NEWID() AS VARBINARY))*4 AS INT)+1 AS VARCHAR(25)) AS MileCode

    INTO #Steps

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    ;

    --===== Every table, even temporary ones, should have a Primary Key...

    -- Makes all the difference in the world, performance wise, on this script.

    ALTER TABLE #Steps

    ADD PRIMARY KEY CLUSTERED (RowNum)

    ;

    --===== And, trust me, you want this index. It's a "covering" index for both the

    -- ORDER BY's, the JOINs, and the data that will appear in the code that follows.

    CREATE INDEX IX_Steps ON #Steps

    (Rep,MileCode,DeltaT)

    ;

    /**************************************************************************************************

    Find the median for all steps of all reps.

    6 seconds to find ~70,200 medians across 1 million rows.

    **************************************************************************************************/

    SELECT DISTINCT

    m.Rep,

    m.MileCode,

    (

    (SELECT TOP 1 DeltaT FROM

    (SELECT TOP 50 PERCENT DeltaT FROM #Steps t

    WHERE t.Rep = m.Rep AND t.MileCode = m.MileCode

    ORDER BY DeltaT ASC

    ) lo

    ORDER BY DeltaT DESC)

    +(SELECT TOP 1 DeltaT FROM

    (SELECT TOP 50 PERCENT DeltaT FROM #Steps t

    WHERE t.Rep = m.Rep AND t.MileCode = m.MileCode

    ORDER BY DeltaT DESC

    ) hi

    ORDER BY DeltaT ASC)

    ) / 2 AS MEDIAN

    FROM #Steps m

    --WHERE Rep = 'AAB' AND MileCode = 'Step 4' --Uncomment to find the info on just one rep and step

    ORDER BY m.Rep,m.MileCode

    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Here are 2 other methods that seem to be faster.

    WITH cteStepsTiles AS(

    SELECT *, NTILE(2) OVER(PARTITION BY Rep, MileCode ORDER BY DeltaT) tile

    FROM #Steps

    )

    SELECT

    Rep,

    MileCode,

    Median = CASE WHEN COUNT(CASE WHEN tile = 1 THEN 1 END) > COUNT(CASE WHEN tile = 2 THEN 1 END)

    THEN MAX(CASE WHEN tile = 1 THEN DeltaT END)

    ELSE (MAX(CASE WHEN tile = 1 THEN DeltaT END) + MIN(CASE WHEN tile = 2 THEN DeltaT END)) / 2. END

    FROM cteStepsTiles

    GROUP BY Rep, MileCode

    ORDER BY Rep, MileCode;

    SELECT

    Rep,

    MileCode,

    Median = AVG(1.0 * DeltaT)

    FROM

    (

    SELECT Rep, MileCode, DeltaT,

    rn = ROW_NUMBER() OVER(PARTITION BY Rep, MileCode ORDER BY DeltaT),

    c = COUNT(*) OVER(PARTITION BY Rep, MileCode)

    FROM #Steps AS o

    ) AS x

    WHERE rn IN ((c + 1)/2, (c + 2)/2)

    GROUP BY Rep, MileCode

    ORDER BY Rep, MileCode;

    EDIT: In SQL Server 2012, there was a function added that would simplify the queries looking for the median. I didn't include it because this is a 2008 forum, but I thought it was worth mentioning it.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Luis Cazares (12/30/2016)


    EDIT: In SQL Server 2012, there was a function added that would simplify the queries looking for the median. I didn't include it because this is a 2008 forum, but I thought it was worth mentioning it.

    Which function would that be? I forget the name of it but are you talking about the function for the simplification of paging?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • It's PERCENTILE_DISC or PERCENTILE_CONT which use OVER to define window and order.

    The median is the first example on the BOL page:

    https://msdn.microsoft.com/en-us/library/hh231327.aspx

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Luis Cazares (12/30/2016)


    It's PERCENTILE_DISC or PERCENTILE_CONT which use OVER to define window and order.

    The median is the first example on the BOL page:

    https://msdn.microsoft.com/en-us/library/hh231327.aspx

    Very cool. Thanks.

    On the two code examples you provided above (and thank you for those... learned a new trick today thanks to you), the first one is, indeed, quite fast and returns in 3 seconds instead of 6 on my humble laptop (i5 processor with 6GB ram).

    The second one takes 18 seconds.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (12/30/2016)


    Luis Cazares (12/30/2016)


    It's PERCENTILE_DISC or PERCENTILE_CONT which use OVER to define window and order.

    The median is the first example on the BOL page:

    https://msdn.microsoft.com/en-us/library/hh231327.aspx

    Very cool. Thanks.

    On the two code examples you provided above (and thank you for those... learned a new trick today thanks to you), the first one is, indeed, quite fast and returns in 3 seconds instead of 6 on my humble laptop (i5 processor with 6GB ram).

    The second one takes 18 seconds.

    Are you sure there's nothing else going on? The times in milliseconds I get for the 3 queries:

    2 subqueries: 7551

    NTILE: 3008

    ROW_NUMBER & COUNT: 3452

    The numbers would vary a bit from run to run (obviously) but are consistent.

    This was on a SQL Server 2014 running on my laptop Intel Core i5 with 16GB of RAM.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Yes. I'm running 2008 (non r2) on my laptop and nothing else was going on at the time. The first of your queries took 3 seconds, the second took 18 seconds.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks, Actually I have single column in table, need to find median on that column, I may have count odd and even, Please help me on this

  • kaladharreddy15 (12/30/2016)


    Thanks, Actually I have single column in table, need to find median on that column, I may have count odd and even, Please help me on this

    The code that we've posted so far will work with a single column. All the code needs is a bit of modification to meet the requirements of your table.

    Speaking of that, you've provided no description of the table, the columns, nor even what needs the Median calculation. If you want something that meets your needs, you have to be able to adequately describe those needs.

    With that in mind, please read and heed the first article posted in my signature line below under "Helpful Links". Otherwise this whole thread will turn into a round robin of second guessing littered with partial solutions that don't measure up to your requirements because.... we just don't know what they are at this point. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply