SQLServerCentral Article

Percentile calculations in DAX

As noted by several authors, the percentile measures are not part of the calculations available in DAX. In my opinion, this is a drawback, but on the other hand, making these calculations myself improved my understanding and control over the whole tabular model. 

The starting point for this piece was the article written by Colin Banfield called “Creating Accurate Percentile Measures in DAX”, but after trying an implementation, I found that

  1. There are problems when 2 or more measures have the same value
  2. The calculations do not match the ones produced by the Excel formula PERCENTILE.INC or PERCENTILE.EXL

Point number 2 is specially problematic when you have to defend your calculations against a Business Analyst that has created the specifications using Excel. Then you really need to mimic the same calculations that Excel does.

Therefore, I investigated the formulas internally used by Excel, and they are clearly defined in Wikipedia under the R-7 reference. The calculation steps for the percentile 25 are as follows. In my case, I am not calculating the percentile for all the rows in the table but just the ones that match a filtering condition ([FilterColumn] = [FilterCondition]):

RankAscSparse: this calculation will rank the measures in scope in the ascending direction skipping places with ties. For example, if 2 measures have the same lowest value, the third measure will get ranking number 4.

= 
RANKX ( 
    FILTER ( 'Measures', [FilterColumn] = [FilterCondition] ), 
    [Measures], 
    , 
    TRUE (), 
    SKIP 
) 

h25: will estimate what is the measure number of the estimated 25 percentile. For 100 measures, it will be measure 26th.

=
CALCULATE ( 
    ( COUNTROWS ( 'Measures' ) - 1 ) * 0.25 + 1, 
    FILTER ( 'Measures', [FilterColumn] = [FilterCondition] ) 
) 

x25down: will get the value of the measure in 25th position when rounding h25 down.

=
MAXX ( 
    FILTER ( 
        'Measures', 
        [FilterColumn] = [FilterCondition] 
            && [RankAscSparse] <= ROUNDDOWN ( [h25], 0 ) 
    ), 
    [Measures] 
) 

x25up: will get the value of the next measure after the 25th position.

=
MAXX (
    FILTER ( 
        'Measures', 
        [FilterColumn] = [FilterCondition] 
            && [RankAscSparse] <= ROUNDDOWN ( [h25], 0 ) + 1 
    ), 
    [Measures] 
) 

The Percentile 25 calculation will apply linear interpolation to the x25down and x25up items.

=
IF ( 
    [FilterColumn] = [FilterCondition], 
    [x25down] 
        + ( [h25] - ROUNDDOWN ( [h25], 0 ) ) * ( [x25up] - [x25down] ), 
    BLANK () 
) 

With these calculations, I could get the same results as Excel with the PERCENTILE.INC function, and therefore, it was easy to validate the numbers.

Thanks for reading.

Rate

5 (7)

You rated this post out of 5. Change rating

Share

Share

Rate

5 (7)

You rated this post out of 5. Change rating