Blog Post

SQL Server – "Denali" – Using Analytic Function CUME_DIST()

,

CUME_DIST() calculates relative position of a value relative to a group of values. The value returned by CUME_DIST() is > 0 and <= 1, which represents percentage of number of rows with value less than (for ascending order) or equal to current row.

For example, consider below data:

Year        Month       Amount

2010        1           5000.00

2010        2           6000.00

2010        3           7000.00

2010        4           2000.00

2011        1           1000.00

2011        2           2000.00

2011        3           3000.00

2011        4           4000.00

Cumulative distribution of row 4 (year = 2010, month = 4) will be 37.50%, as number of rows which has value <= row 4 are 3 rows, and the value returned by CUME_DIST() will be 0.375.

 

Using CUME_DIST():

CUME_DIST() takes an ORDER BY clause as argument, ORDER BY clause determines the order in which the operation is performed:

SELECT [Year], [Month], [Amount],

       CUME_DIST() OVER (ORDER BY [Amount]) CUME_DIST

       –CAST(CUME_DIST() OVER (ORDER BY [Amount]) * 100 AS VARCHAR) + '%' CUME_DIST

FROM   dbo.SalesData

ORDER BY [Amount]

 

Result Set:

Year        Month       Amount                CUME_DIST

2011        1           1000.00               0.125

2011        2           2000.00               0.375

2010        4           2000.00               0.375

2011        3           3000.00               0.5

2011        4           4000.00               0.625

2010        1           5000.00               0.75

2010        2           6000.00               0.875

2010        3           7000.00               1

 

CUME_DUST() with PARTITION BY:

PARTITION BY clause divides the result set into partition to which CUME_DIST() is applied:

SELECT [Year], [Month], [Amount],

       CUME_DIST() OVER (PARTITION BY [Year] ORDER BY [Amount])

       CUME_DIST

FROM   dbo.SalesData

ORDER BY [Year], [Month], [Amount]

Result Set:

Year        Month       Amount                CUME_DIST

2010        1           5000.00               0.5

2010        2           6000.00               0.75

2010        3           7000.00               1

2010        4           2000.00               0.25

2011        1           1000.00               0.25

2011        2           2000.00               0.5

2011        3           3000.00               0.75

2011        4           4000.00               1

 

CUME_DIST() can also be used to find "TOP" rows, check below queries which selects 50% of data and the different result sets generated by them:

;WITH tempTable

AS

(

       SELECT [Year], [Month], [Amount],

              CUME_DIST() OVER (ORDER BY [Amount]) CUME_DIST

       FROM   dbo.SalesData

)

SELECT [Year], [Month], [Amount]

FROM   tempTable

WHERE  CUME_DIST < 0.50

 

 

SELECT TOP 50 PERCENT [Year], [Month], [Amount]

FROM   dbo.SalesData

Result Sets:

Year        Month       Amount

———– ———– ———————

2011        1           1000.00

2011        2           2000.00

2010        4           2000.00

 

(3 row(s) affected)

 

Year        Month       Amount

———– ———– ———————

2011        1           1000.00

2011        2           2000.00

2011        3           3000.00

2011        4           4000.00

 

(4 row(s) affected)

 

CUME_DIST() has been available in Oracle since 8i, and can also be used as an aggregate function in Oracle.

 

Hope This Helps! Cheers!

Reference : Vishal (http://SqlAndMe.com)

Filed under: SQLServer, SQLServer "Denali"

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating