Stairway to Advanced T-SQL

Stairway to Advanced T-SQL Level 7: Ordering Your Data Using Ranking Functions

,

What Are Ranking Functions?

Ranking functions return a ranking value based on a set of records. A ranking value is just a number, which typically starts at 1 and increments by one for each new ranking value. Depending on the ranking function the ranking value returned might be unique for each row in the result set, or some rows may have the same ranking value. In the sections that follow I will explore the different ranking functions and how they can be used.

Example of using RANK Function

The RANK function numbers each row in a partition sequentially starting at 1. A “partition” in a ranking function is a set of rows that have the same values for a specified partition column. If two rows in a partition have the same value for the ranking column (the column specified in the ORDER BY) then they will both get the same ranking value assignment. To better understand how to use the RANK function let's review the syntax:

RANK ( ) OVER ( [ PARTITION BY <partition_column> ] ORDER BY <order_by_column> )

Where:

  • <partition_column>: Identifies the column name or column names that will be used for partitioning the data
  • <order by column>: Identifies the column name or column names that will be used for ordering the output of each partition.

Note:

The PARTITION BY clause is optional. If not used, data will be ranked based on a single partition. If you specify the PARTITION BY clause with the RANK function, the ranking value will be restarted at 1 for every partition in the data set.

Now that you have an understanding of what the RANK function does, and the syntax, I'll run through a couple of RANK function examples. All of my examples will be using the AdventureWorks2012 database. If you want to follow along you can download the AdventureWorks2012 database from the following location: http://msftdbprodsamples.codeplex.com/releases/view/93587.

For my first example of using the RANK function let me run the following code:

USE AdventureWorks2012;
GO
SELECT PostalCode, StateProvinceID,
       RANK() OVER 
         (ORDER BY PostalCode ASC) AS RankingValue
FROM Person.Address 
WHERE StateProvinceID IN (23,46);

Listing 1: Simple RANK function example

When I run the code in Listing 1 I get the output in Results 1.

PostalCode      StateProvinceID RankingValue
--------------- --------------- --------------------
03064           46              1
03064           46              1
03106           46              3
03276           46              4
03865           46              5
83301           23              6
83402           23              7
83501           23              8
83702           23              9
83864           23              10

Results 1: Output when running the code in Listing 1

If you review the output in Results 1, you can see the values created by the RANK function in the column RankingValue. In this example I'm ranking based on PostalCode. Each unique PostalCode gets a different ranking value. If you look at the rows of output for PostalCode 03054 you will see two rows, where each row has a RankingValue of 1. Because there were two PostalCode values of 03064, the RankingValue of 2 was skipped. The RankingValue for Postal Code 03106 then has a RankingValue of 3. The rest of the RankingValue data values were assigned the next sequential value because their PostalCode values where all unique.

Since the PARTITION BY clause of the RANK function was not used in Listing1, the entire set was considered as a single partition. If I wanted to start the RankingValue over again for every unique StateProvinceID value all I would have to do is partition my results based on the StateProvinceID. In Listing 2 I ranked by PostalCode and partitioned by StateProvinceID

USE AdventureWorks2012;
GO
SELECT PostalCode, StateProvinceID,
       RANK() OVER 
         (PARTITION BY StateProvinceID
  ORDER BY PostalCode ASC) AS RankingValue
FROM Person.Address 
WHERE StateProvinceID IN (23,46);

Listing 2: Using the PARTITION BY clause

When I run the code in listing 2 I get the output in Results 2.

PostalCode      StateProvinceID RankingValue
--------------- --------------- --------------------
83301           23              1
83402           23              2
83501           23              3
83702           23              4
83864           23              5
03064           46              1
03064           46              1
03106           46              3
03276           46              4
03865           46              5

Results 2: Results displayed when running Listing 2.

In the output in Listing 2 there are two partitions. One partition includes all PostalCode values that have a StateProvinceID of 23 and a second partition includes the PostalCode values that have a StateProvinceID value of 46. Note the RankingValue column started at 1 for each partition.

Example of using DENSE RANK Function

When I ran the RANK function for each duplicate PostalCode value, my output skipped a RankingValue. By using the DENSE RANK function I can generate a ranking value that does not skip values. The DENSE_RANK has the following syntax:

DENSE_RANK ( ) OVER ( [ PARTIION BY <partition_column> ] ORDER BY <order_by_column> )

Where:

  • <partition_column>: Identifies the column name or column names that will be used for partitioning the data
  • <order by column>: Identifies the column name or column names that will be used for ordering the output of each partition.

The only difference in syntax between the RANK function and the DENSE_RANK is the actual function name.

To explore the DENSE RANK function let me run the code in Listing 3.

USE AdventureWorks2012;
GO
SELECT PostalCode, StateProvinceID,
       DENSE_RANK() OVER 
         (PARTITION BY StateProvinceID
  ORDER BY PostalCode ASC) AS RankingValue
FROM Person.Address 
WHERE StateProvinceID IN (23,46);

Listing 3: Using the DENSE_RANK

When I run the code in Listing 3 I get the output in Results 3.

PostalCode      StateProvinceID RankingValue
--------------- --------------- --------------------
83301           23              1
83402           23              2
83501           23              3
83702           23              4
83864           23              5
03064           46              1
03064           46              1
03106           46              2
03276           46              3
03865           46              4

Results 3: Output from running Listing 3

By reviewing the output in Results 3 you will see the rows with PostalCode 03064 have the same RankingValue, but the next PostalCode has a 2 instead of a 3. Remember in Results 2 the RANK function skipped a RankingValue for this same duplicate PostalCode. With DENSE_RANK it doesn't skip a RankingValue when a duplicate PostalCode is found. Instead it keeps all the RankingValue sequential even when duplicate values are found.

Example of using NTILE function

The NTILE function splits a set of records into groups. The number of groups returned is specified by an integer expression. Below you will find the syntax for the NTILE Function:

NTILE (integer_expression) OVER ( [ PARTIION BY <partition_column> ] ORDER BY <order_by_column> )

Where:

  • <integer_expression>: Identifies the number of different groups to create
  • <partition_column>: Identifies the column name or names that will be used for partitioning the data
  • <order by column>: Identifies the column name or names that will be used for ordering the output of each partition.

In order to better understand what the NTILE function does, let's review a number of different examples. For the first example, let's assume you want to put each PostalCode into one of two groups To accomplish meeting these requirements I will run the NTILE function found in the code in Listing 4.

USE AdventureWorks2012;
GO
SELECT PostalCode, StateProvinceID,
       NTILE(2) OVER 
         (ORDER BY PostalCode ASC) AS NTileValue
FROM Person.Address 
WHERE StateProvinceID IN (23,46);

Listing 4: NTILE Query

When I run Listing 4 I get the results in Results 4.

PostalCode      StateProvinceID NTileValue
--------------- --------------- --------------------
03064           46              1
03064           46              1
03106           46              1
03276           46              1
03865           46              1
83301           23              2
83402           23              2
83501           23              2
83702           23              2
83864           23              2

Results 4: Output from running Listing 4

By looking at the output in Results 4 you can see that there are two different NTileValue column values, 1 and 2. The two different NTileValue values were created because I specified “NTILE(2)” in my SELECT statement in Listing 4. The value inside the parentheses following the NTILE function name is an integer expression, which specifies the number of groups that should be created. As you can see in Results 4 there are 10 rows returned. The first 5 rows have an NTileValue of 1 and the last 5 rows have a NTileValue of 2. The code in Listing 4 created two groups of rows as expected, with half the rows in each group.

You might be asking yourself what happens when the set of records cannot be evenly divided by the NTILE integer_expression parameter, causing a remainder. When this happens, one remainder row is placed in each group, starting from the first group, until all remainder rows have been assigned to a group, as you will see when we look at Listing 5. Listing 5 not only shows what happens when the NTILE integer_expression causes an uneven distribution of rows, but it also shows how the integer_expression value can be a local variable.

Let's review the code in Listing 5 and the output in Results 5 to determine how SQL Server handles creating groups when the number of rows can't be evenly distributed among the groups.

USE AdventureWorks2012;
GO
DECLARE @Integer_Expression int = 4;
SELECT PostalCode, StateProvinceID,
       NTILE(@Integer_Expression) OVER 
         (ORDER BY PostalCode ASC) AS NTileValue
FROM Person.Address 
WHERE StateProvinceID IN (46,23);

Listing 5: NTile Query with odd number of records

When I run Listing 5 I get the results in Results 5.

PostalCode      StateProvinceID NTileValue
--------------- --------------- --------------------
03064           46              1
03064           46              1
03106           46              1
03276           46              2
03865           46              2
83301           23              2
83402           23              3
83501           23              3
83702           23              4
83864           23              4

Results 5: Output from running Listing 5

In the code in Listing 5 I defined a local variable named @Integer_Expression and assigned that variable a value of 4. I then used this variable in the call to NTILE function to specify returning 4 groups. If you look at the output you can see that SQL Server created 4 different groups. When you divide 10 by 4 you get a remainder of 2. This means that the first two groups should have 1 more row than the last two groups. You can confirm this by review the output in Listing 5. In this output you can see that groups 1 and 2 each have 3 rows, whereas when the NTileValue is 3 and 4 there are only two rows associated with those groups.

Just like in the RANK function, you can also create NTILE ranking values within a partition by including the PARTITION BY clause in your NTILE function call. When you add the PARTITION BY clause SQL Server will start the NTILE ranking value at 1 for each new partition. To demonstrate this let me run the code in Listing 6.

USE AdventureWorks2012;
GO
DECLARE @Integer_Expression int = 3;
SELECT PostalCode, StateProvinceID,
       NTILE(@Integer_Expression) OVER 
 (PARTITION BY StateProvinceID
  ORDER BY PostalCode ASC) AS NTileValue
FROM Person.Address 
WHERE StateProvinceID IN (46,23);

Listing 6: NTile Query with odd value

When I run Listing 6 I get the output in Results 6.

PostalCode      StateProvinceID NTileValue
--------------- --------------- --------------------
83301           23              1
83402           23              1
83501           23              2
83702           23              2
83864           23              3
03064           46              1
03064           46              1
03106           46              2
03276           46              2
03865           46              3

Results 6: Output from running Listing 6

If you review the output in Results 6 you can see the effects caused by adding the PARTITION BY clause to my NTILE function call. If you look at the column values for the output column NTileValue, you can see that the ranking values started over at 1 for the rows with a StateProvinceID value of 46. This was caused by adding the “PARTITION BY StateProvinceID” clause to my NTILE function specification.

Example of using ROW_NUMBER Function

There are times when you just want to generate a row number for each row in your output, where the row number is sequentially increased by 1 for each new row in the results set. To accomplish this we can use the ROW_NUMBER function.

Below is the syntax for the ROW_NUMBER function:

ROW_NUMBER () OVER ( [ PARTIION BY <partition_expressions> ] ORDER BY <order_by_column> )

Where:

  • <partition_expressions>: Identifies one or more expressions that will be used for partitioning the data
  • <order by column>: Identifies the column name or names that will be used for ordering the output of each partition.

To demonstrate how the ROW_NUMBER function works I will run the code in Listing 7.

USE AdventureWorks2012;
GO
SELECT PostalCode, StateProvinceID,
       ROW_NUMBER() OVER 
 (ORDER BY PostalCode ASC) AS RowNumber
FROM Person.Address 
WHERE StateProvinceID IN (46,23);

Listing 7: Using ROW_NUMBER function

When I run Listing 7 I get the results in Results 7.

PostalCode      StateProvinceID RowNumber
--------------- --------------- --------------------
03064           46              1
03064           46              2
03106           46              3
03276           46              4
03865           46              5
83301           23              6
83402           23              7
83501           23              8
83702           23              9
83864           23              10

Results 7: Output from running code in Listing 7

By looking at the output in Results 7 you can see each of my rows has a unique RowNumber column value. The RowNumber column values start at 1 and increments by 1 for each row. When I called the ROW_NUMBER() function you can see that I only specify the ORDER BY clause. In my example I'm ordering by the PostalCode column. Because I have no PARTITION BY clause the ROW_NUMBER() function returns a different RowNumber value for each row.

Suppose you wanted to order the output by PostalCode, but you wanted the RowNumber to restart at 1 for each new StateProvinceID. To accomplish this I will add a PARTITION BY clause to my query as I have done in Listing 8.

USE AdventureWorks2012;
GO
SELECT PostalCode, StateProvinceID,
       ROW_NUMBER() OVER 
 (PARTITION BY StateProvinceID
  ORDER BY PostalCode ASC) AS RowNumber
FROM Person.Address 
WHERE StateProvinceID IN (46,23);

Listing 8: Query Using PARTITION BY clause with ROW_NUMBER function

When I run Listing 8 I get the results in Results 8.

PostalCode      StateProvinceID RowNumber
--------------- --------------- --------------------
83301           23              1
83402           23              2
83501           23              3
83702           23              4
83864           23              5
03064           46              1
03064           46              2
03106           46              3
03276           46              4

Results 8: Output from running Listing 8

As you can see in Results 8, by adding the PARTITION BY clause to my query the RowNumber column value restarts at 1 for each new StateProvinceID value.

Summary

There are many different ways to order your data, and some of those ways require you to assign a sequential number. In this article I showed you how the ROW_NUMBER, NTILE, RANK and DENSE_RANK functions allows you to generate a sequential value column with each row of data. Next time you need to generate a sequential number you should consider using one of these ranking functions.

Question and Answer

In this section you can review how well you have understood using the different ranking functions by answering the following questions.

Question 1:

Which ranking function allows you to specify an integer expression that identifies the number of groups you want your data split up into?

  • The RANK function
  • The DENSE_RANK function
  • The NTILE function
  • The ROW_NUMBER function

Question 2:

What does the DENSE_RANK function do differently than the RANK() function?

  • Skips values when generating ranking numbers when there are ties/duplicates.
  • Doesn't skip values then generating ranking numbers when there are ties / duplicates.

Question 3:

Which clause causes the ranking number to restart at 1 for each different groupings of data?

  • ORDER BY
  • OVER
  • PARTITION BY

Answers:

Question 1:

The correct answer is c. The NTILE function uses the integer value provided to split your data up into groups.

Question 2:

The correct answer is b. The DENSE_RANK function doesn't skip sequence number when it encounters ties or duplicate values.

Question 3:

The correct answer is c. The PARTITION clause tell SQL Server to restart the ranking number at 1 each time a new partition column value is returned.

This article is part of the parent stairway Stairway to Advanced T-SQL

Rate

5 (4)

You rated this post out of 5. Change rating

Share

Share

Rate

5 (4)

You rated this post out of 5. Change rating