PERCENTILE FUNCTION equivalent to excel function in SQL 2008

  • Hi,

    I am trying to calculate percentile in SQL server exactly similar to MS-excel Percentile function where we give data set and percentile and gives threshold

    I am trying to get different thresholds from data set at 0.2, 0.4, 0.6, 0.8

    e.g.

    I have data set

    12000

    6000

    4000

    15000

    8000

    4000

    2000

    9000

    5000

    3000

    with excel function =PERCENTILE(A1:A10,0.2) I get result 3800

    with excel function =PERCENTILE(A1:A10,0.4) I get result 4600

    with excel function =PERCENTILE(A1:A10,0.6) I get result 6800

    with excel function =PERCENTILE(A1:A10,0.8) I get result 9600

    Please let me know if any function or any script can help to get same result

    Thanks

  • Hi and welcome to the forums. In order to help we will need a few things:

    1. Sample DDL in the form of CREATE TABLE statements

    2. Sample data in the form of INSERT INTO statements

    3. Expected results based on the sample data

    Please take a few minutes and read the first article in my signature for best practices when posting questions.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Hi

    please find table with sample data

    IF OBJECT_ID( 'tempdb..#temp_quintiles', 'U' ) IS NOT NULL

    DROP TABLE #temp_quintiles;

    Create table #temp_quintiles(

    incomes int NULL);

    insert into #temp_quintiles (incomes)

    values

    (12000),

    (6000),

    (4000),

    (15000),

    (8000),

    (4000),

    (2000),

    (9000),

    (5000),

    (3000)

    expected out put to get 0.2 percentile equivalent to excel percentile function 0.2 percentile result 3800

    0.4 percentile result 4600

    0.6 percentile result 6800

    Thanks

  • In SQL 2012 there is a function PERCENTILE_CONT that will do this for you.

    At PERCENTILE_CONT(0.5) this is the median. Which can be calculated like this:

    An Even Faster Method of Calculating the Median on a Partitioned Heap[/url]

    I have played around with creating a replica of PERCENTILE_CONT on earlier versions of SQL Server. Median is difficult enough, but supporting the whole gamut of parameters you can pass as the percentile is quite the challenge.

    Before I attempt to rise to that challenge (never got it fully working), I wanted to make sure you're not already on SQL 2012 and posting to the wrong forum.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • I am working on sql server 2008, can anyone help me with the logic please 🙂

  • akash_singh (4/21/2015)


    I am working on sql server 2008, can anyone help me with the logic please 🙂

    I was afraid you were going to say that.

    One of the issues I encountered while attempting to replicate the formula is that I had no documentation describing the calculation. Is that described anywhere in the Excel documentation on line? Send a link if it is.

    One issue with the sample data you provided is that it is too simplistic. What happens if the row count is not 10? That's where it really starts getting convoluted. I was trying to back into a formula, but was unable to do so.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • There is a Definition of the Microsoft Excel method for its PERCENTILE function on the linked Wiki page.

    This implements that definition (not checked for all cases) and seems to return the correct value for the 20th percentile.

    WITH RowsInSampleData AS

    (

    SELECT [rows]=COUNT(*)

    FROM #temp_quintiles

    ),

    Percentile AS

    (

    SELECT p=0.2

    ),

    SampleData AS

    (

    SELECT incomes

    ,rn=ROW_NUMBER() OVER (ORDER BY incomes)

    FROM #temp_quintiles a

    ),

    CalculateRank AS

    (

    SELECT incomes, rn, [rows], [rank]=p*([rows]-1)+1, p

    FROM SampleData

    CROSS JOIN RowsInSampleData b

    CROSS JOIN Percentile c

    )

    SELECT *, PERCENTILE=incomes+[rank]%1*(next_income-incomes)

    FROM CalculateRank a

    OUTER APPLY

    (

    SELECT TOP 1 incomes

    FROM SampleData c

    WHERE c.rn > a.rn

    ORDER BY c.rn

    ) b (next_income)

    WHERE rn=FLOOR([rank]);


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • What? No thank you?


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Hey Thanks a ton, i was testing the logic and its working fine with my data set. Thanks a lot once again 🙂

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply