PERCENT_RANK() problems

  • Hello,

    I have a formula I am having difficultly with and cannot seem to find a solution for it. My understanding of the PERCENT_RANK function is it will break up your partition bebetween and 100% in float/decimal format. The trouble I am having is with this formula, I am getting percentages between 0 and 0.995652174 and I am not sure why it wont go to 100%.

    Here is my current formula:

    ,PERCENT_RANK() over (partition by PARTITON ORDER BY (sum(CASE WHEN cast((Field1)as float)= 0 THEN 0 ELSE (cast((Field2) as float) / cast((Field1) as float)) - (cast((Field3) as float)/cast((Field1) as float)) END)) ASC) AS Column_Label

    Can someone assist?

    Thanks

  • CPiacente (6/19/2015)


    Hello,

    I have a formula I am having difficultly with and cannot seem to find a solution for it. My understanding of the PERCENT_RANK function is it will break up your partition bebetween and 100% in float/decimal format. The trouble I am having is with this formula, I am getting percentages between 0 and 0.995652174 and I am not sure why it wont go to 100%.

    Here is my current formula:

    ,PERCENT_RANK() over (partition by PARTITON ORDER BY (sum(CASE WHEN cast((Field1)as float)= 0 THEN 0 ELSE (cast((Field2) as float) / cast((Field1) as float)) - (cast((Field3) as float)/cast((Field1) as float)) END)) ASC) AS Column_Label

    Can someone assist?

    Thanks

    I don't know what kind of data type that will return, but it might well be float, and that alone has at least the potential for not adding up to a full 100%. Wondering if changing that code to use decimal(16,8) or something similar might improve the result.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Thanks for the suggestion. I tried it but I got the same results. Any other suggestions?

  • CPiacente (6/19/2015)


    Thanks for the suggestion. I tried it but I got the same results. Any other suggestions?

    There's always the option of using a CASE statement to set the value to 1 when the two values being divided are equal, otherwise grab the percent_rank, but that would have to be an additional query as a "second pass", so to speak, but that is just kind of a band-aid, and might not be a good idea. Another possibility would be to ROUND the final value to some number of decimal places...

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Thank you for the information. I will look into that. In addition however, I noticed some percent_rank() values would only get to around 65% or so as the max. Something definitely seems off.

  • CPiacente (6/22/2015)


    Thank you for the information. I will look into that. In addition however, I noticed some percent_rank() values would only get to around 65% or so as the max. Something definitely seems off.

    You really need to read the Books Online entry for PERCENT_RANK and understand what it's doing. I really can't see this function as useful in any practical way for small numbers of records. I had yet to read it when before my first two replies, and have since reviewed it, and now that I have, I would expect it's value to only be useful when you have at least a good 20 to 30 records to work with, as the size of the changes across 3 to 5 records is just too large to make the use of this function worthwhile. You might be better off calculating the percentage another way, such as in a report (e.g. Crystal Reports, or SSRS).

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • I probably will have to find another way but I have hundreds of records and in some case thousand or more. I will check that out however.

  • CPiacente (6/22/2015)


    I probably will have to find another way but I have hundreds of records and in some case thousand or more. I will check that out however.

    I can't imagine you'll have trouble with those larger number of records. The problems with statistical measures that are designed mostly for larger quantities of data is that they just don't work well with small sets of data, so if you want to do some useful testing of the function, then write some code to create a large sample data set. Even random data would at least help you see how the function works with larger recordsets.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • You wouldn't think so but I am having issues. Makes no sense to me and that is why I think it is so obvious haha!:-D

  • Getting values <1 for the highest PERCENT_RANK() is quite normal. All that has to happen is for the value ordered last by the ORDER BY to be repeated in the data, so that there are ties at the top.

    The data type of the value being ordered doesn't come into play (in the following example the expression being ordered is an int).

    The following code illustrates this:

    DECLARE @table TABLE (SomeInt int)

    INSERT INTO @table

    SELECT TOP 10000 RN

    FROM (SELECT ROW_NUMBER () OVER (ORDER BY ac1.object_id) RN

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2) a

    --This shows us a max value of 1, as expected.

    SELECT TOP 1

    PR=PERCENT_RANK() OVER (ORDER BY SomeInt asc )

    FROM @table

    ORDER BY PR DESC

    --Now make sure there's a tie between two rows at the top

    INSERT INTO @table

    SELECT 10000

    --This will yield 0.9999 as the max PERCENT_RANK

    SELECT TOP 1

    PR=PERCENT_RANK() OVER (ORDER BY SomeInt asc )

    FROM @table

    ORDER BY PR DESC

    DELETE FROM @table

    INSERT INTO @table

    VALUES (1),(2),(2)

    --With a tie at the top of a very small partition, the max value gets quite low.

    --Here it's 0.5.

    --In fact, if there is only one value for the ORDER BY expression in the partition,

    --then the highest PERCENT_RANK will be 0

    SELECT TOP 1

    PR=PERCENT_RANK() OVER (ORDER BY SomeInt asc )

    FROM @table

    ORDER BY PR DESC

    I hope this helps clear up why this might be happening. You probably just have ties for the value sorted last by your ORDER BY expression.

    You could consider using CUME_DIST () instead, which will return a 1 for the value sorted last regardless of ties, but it's a bit different calculation than PERCENT_RANK (), so you'll want to make sure it's appropriate for your purposes.

    Cheers!

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply