Median

  • i have a table with 3 rows

    Producttyp2col1 col2

    A22

    A02

    A11

    A20

    B 2 5

    B 1 2

    B 1 1

    i am trying to find the median by product type wise

    i feel sorting the col2 and col3 and take the lowest of 50% will give median, but not sure how to do it or anyother easy way to do it please suggest

  • What would be your expected result?



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Fairly straightfoward - use mated ROW_NUMBER() rankings, one flowing forward and one flowing backward over the data. By taking the first record where the orders cross over, you can get the median value (and depending on whether you order ASC/DESC you can get the upper/lower value if the median falls on one of two records in an even number of rows).

    Example query setup:

    CREATE TABLE SomeData

    (DataItemID INT IDENTITY(1,1), ProductType VARCHAR(255), SomeValueToMedian INT)

    /* Some dummy data, including a duplicate value */

    INSERT INTO SomeData SELECT 'Product A', 50

    INSERT INTO SomeData SELECT 'Product A', 6

    INSERT INTO SomeData SELECT 'Product A', 36

    INSERT INTO SomeData SELECT 'Product A', 2

    INSERT INTO SomeData SELECT 'Product A', 49

    INSERT INTO SomeData SELECT 'Product B', 75

    INSERT INTO SomeData SELECT 'Product B', 21

    INSERT INTO SomeData SELECT 'Product B', 62

    INSERT INTO SomeData SELECT 'Product B', 21

    And the ranking is done via:

    SELECT

    ProductType,

    DataItemID,

    SomeValueToMedian

    FROM

    (

    SELECT

    ProductType,

    DataItemID,

    SomeValueToMedian,

    ROW_NUMBER() OVER(PARTITION BY ProductType ORDER BY SomeValueToMedian ASC) AS Sequence

    FROM

    (

    SELECT

    DataItemID,

    ProductType,

    SomeValueToMedian,

    ROW_NUMBER() OVER (PARTITION BY ProductType ORDER BY SomeValueToMedian ASC) AS Forward,

    ROW_NUMBER() OVER (PARTITION BY ProductType ORDER BY SomeValueToMedian DESC) AS Backward

    FROM SomeData

    ) Sorted

    WHERE

    Forward >= Backward

    ) Sequenced

    WHERE

    Sequenced.Sequence = 1

    ORDER BY ProductType

    Output will be:

    Product A336

    Product B862

  • Here is a useful article on several different methods to calculate the median.

    http://www.simple-talk.com/sql/t-sql-programming/median-workbench/

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Steven James Gray (4/30/2010)


    Fairly straightfoward - use mated ROW_NUMBER() rankings, one flowing forward and one flowing backward over the data. By taking the first record where the orders cross over, you can get the median value (and depending on whether you order ASC/DESC you can get the upper/lower value if the median falls on one of two records in an even number of rows).

    Example query setup:

    CREATE TABLE SomeData

    (DataItemID INT IDENTITY(1,1), ProductType VARCHAR(255), SomeValueToMedian INT)

    /* Some dummy data, including a duplicate value */

    INSERT INTO SomeData SELECT 'Product A', 50

    INSERT INTO SomeData SELECT 'Product A', 6

    INSERT INTO SomeData SELECT 'Product A', 36

    INSERT INTO SomeData SELECT 'Product A', 2

    INSERT INTO SomeData SELECT 'Product A', 49

    INSERT INTO SomeData SELECT 'Product B', 75

    INSERT INTO SomeData SELECT 'Product B', 21

    INSERT INTO SomeData SELECT 'Product B', 62

    INSERT INTO SomeData SELECT 'Product B', 21

    And the ranking is done via:

    SELECT

    ProductType,

    DataItemID,

    SomeValueToMedian

    FROM

    (

    SELECT

    ProductType,

    DataItemID,

    SomeValueToMedian,

    ROW_NUMBER() OVER(PARTITION BY ProductType ORDER BY SomeValueToMedian ASC) AS Sequence

    FROM

    (

    SELECT

    DataItemID,

    ProductType,

    SomeValueToMedian,

    ROW_NUMBER() OVER (PARTITION BY ProductType ORDER BY SomeValueToMedian ASC) AS Forward,

    ROW_NUMBER() OVER (PARTITION BY ProductType ORDER BY SomeValueToMedian DESC) AS Backward

    FROM SomeData

    ) Sorted

    WHERE

    Forward >= Backward

    ) Sequenced

    WHERE

    Sequenced.Sequence = 1

    ORDER BY ProductType

    Output will be:

    Product A336

    Product B862

    If this person is using SQL 2000 (since this is the 2000 forum), then this method won't work.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • keywestfl9 (4/30/2010)


    i have a table with 3 rows

    Producttyp2col1 col2

    A22

    A02

    A11

    A20

    B 2 5

    B 1 2

    B 1 1

    i am trying to find the median by product type wise

    i feel sorting the col2 and col3 and take the lowest of 50% will give median, but not sure how to do it or anyother easy way to do it please suggest

    Find the median of what? Col1 or Col2 or both?

    Also, you've visited this site more than 600 times and you have almost 200 points... it's time for you to learn how to post questions so people will actually want to help you. Please read and heed the article at the first link in my signature line below for how to do that. You'll be amazed at how many people will help if you follow the code examples to make your data readily consumable. 😉

    Finally, (and borrowing from Steven's generous post), here's how to solve the median problem in a set based fashion using Steven's test data...

    --===== Do these tests in a nice safe place...

    USE TempDB

    ;

    --===== Create a table and some test data.

    -- This is not a part of the solution but it is how

    -- you need to post your data in the future.

    CREATE TABLE dbo.SomeTable

    (

    DataItemID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,

    ProductType VARCHAR(255),

    SomeValueToMedian INT

    )

    ;

    INSERT INTO dbo.SomeTable

    (ProductType, SomeValueToMedian)

    SELECT 'Product A', 50 UNION ALL

    SELECT 'Product A', 6 UNION ALL

    SELECT 'Product A', 36 UNION ALL

    SELECT 'Product A', 2 UNION ALL

    SELECT 'Product A', 49 UNION ALL

    SELECT 'Product B', 75 UNION ALL

    SELECT 'Product B', 21 UNION ALL

    SELECT 'Product B', 62 UNION ALL

    SELECT 'Product B', 21

    ;

    --===== Here's the solution to Median and it runs nasty fast in

    -- 2000 and 2005

    SELECT DISTINCT

    m.ProductType,

    (

    (SELECT TOP 1 SomeValueToMedian FROM

    (SELECT TOP 50 PERCENT SomeValueToMedian FROM dbo.SomeTable t

    WHERE t.ProductType = m.ProductType

    ORDER BY SomeValueToMedian ASC

    ) lo

    ORDER BY SomeValueToMedian DESC)

    +(SELECT TOP 1 SomeValueToMedian FROM

    (SELECT TOP 50 PERCENT SomeValueToMedian FROM dbo.SomeTable t

    WHERE t.ProductType = m.ProductType

    ORDER BY SomeValueToMedian DESC

    ) hi

    ORDER BY SomeValueToMedian ASC)

    ) / 2.0 AS MEDIAN

    FROM dbo.SomeTable m

    ORDER BY m.ProductType

    ;

    In the presence of the correct indexes and the like, the code above will resolve 70,200 medians in a million rows in about 32 seconds on an 8 year old single 1.8 GHz cpu desktop with only 1GB of RAM.

    See the following post for a super detailed example including more data than you can shake a stick at...

    http://www.sqlservercentral.com/Forums/Topic351991-8-1.aspx#bm352218

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Just noticed I'd posted to the SQL 2K forum from your comment. Thats what I get for mis-reading the Active Topics list 😉

  • Steven James Gray (5/1/2010)


    Just noticed I'd posted to the SQL 2K forum from your comment. Thats what I get for mis-reading the Active Topics list 😉

    It catches all of us from time to time.

    Good solution otherwise.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • @keywestfl9,

    So... are you all set now or do you have other questions about MEDIANs?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • yes it helped to finish my task. thank you all for the help

    They want to do for another report in access the same thing,i will try and post it if not in the access forum.

  • keywestfl9 (5/1/2010)


    yes it helped to finish my task. thank you all for the help

    They want to do for another report in access the same thing,i will try and post it if not in the access forum.

    I suggest avoiding it in access and do it in SQL server instead.;-)

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • keywestfl9 (5/1/2010)


    yes it helped to finish my task. thank you all for the help

    They want to do for another report in access the same thing,i will try and post it if not in the access forum.

    Good. Thanks for the feedback. I also share Jason's sentiment above... If you can do it in SQL Server, it may be better to avoid Access for such a thing if you can unless you have plans for regular backups and other support.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply