SQLServerCentral Article

SQL Window Functions Series: RANK() and DENSE_RANK()

,

Welcome to the fascinating world of SQL Window functions! As we journey through the intricate labyrinths of databases and large datasets, having the right tools to aggregate, compute, and analyze data becomes indispensable. Today, we'll explore two of these tools in detail: RANK() and DENSE_RANK(). But before we dive into these functions, let’s understand what window functions are and why we should use them.

What are Window Functions?

SQL window functions, also known as OLAP (Online Analytical Processing) functions, are a subset of SQL functions that operate over a set (or "window") of rows related to the current row within the result set. Unlike aggregate functions, which return a single value for a group of rows, window functions return a single value for each row from the underlying query, based on the related set of rows.

Why would someone want to use Window Functions? Firstly, they provide deeper insights into data, allowing for calculations across sets of rows related to the current row. Additionally, they often lead to more efficient queries as compared to self-joins or subqueries. Moreover, they can provide running totals, moving averages, and other cumulative metrics without altering the granularity of the query result.

Popular Window Functions

To illustrate, here's a list of some of the most frequently used window functions in SQL:

  • RANK()
  • DENSE_RANK()
  • LAG()
  • LEAD()
  • ROW_NUMBER()
  • FIRST_VALUE()
  • LAST_VALUE()
  • NTILE()

Let's take a closer look at RANK() and DENSE_RANK()

RANK()

The RANK() function assigns a unique rank to each distinct row within a result set based on the values in one or more columns. If two (or more) rows have the same values in the specified columns, they get the same rank. However, when identical ranks are assigned, it causes the next rank(s) in the sequence to be skipped, depending on the number of duplicates.

Let's illustrate the function with an example. Imagine we have a table students with the following data:

name | score 
-----|-------
Amy  | 90 
Bob  | 85 
Cal  | 85 
Dan  | 88
Eon  | 80 
Fin  | 78

To rank these students based on their scores, we use the RANK() function. The SQL command will look like this:

SELECT name, score, RANK() OVER(ORDER BY score DESC) as ranking
FROM students;

Breaking down the SQL command:

  • SELECT name, score: Fetches the name and score columns from our table.
  • RANK(): This is the function we're focusing on. It assigns a rank based on a specific order.
  • OVER(ORDER BY score DESC): This clause determines the ordering of the rank. We want to rank from the highest to the lowest score, so we want the lowest rank value assigned to the highest score. Conversely, in ascending order, the lowest value gets the top rank. The OVER clause defines the "window" over which the ranking is computed.
name | score | ranking
-----|-------|--------
Amy  | 90    | 1
Dan  | 88    | 2
Bob  | 85    | 3
Cal  | 85    | 3
Eon  | 80    | 5
Fin  | 78    | 6

Notice how the ranking for 'Cal' is the same as 'Bob' since they have the same score. As a result, the next rank, 4, is skipped.

Let's introduce another column subject to the students table, which indicates the subject of the exam they took. This will allow us to rank students not just based on their scores, but also grouped by subjects.

name  | score | subject
------|-------|---------
Amy   | 90    | Math
Bob   | 85    | Math
Cal   | 85    | Science
Dan   | 88    | Science
Eon   | 80    | Science
Fin   | 78    | Math

With the inclusion of the subject column, we can now demonstrate how RANK() operates within categories or groups. Let's rank students within each subject based on their scores:

SELECT name, score, subject,
RANK() OVER(PARTITION BY subject ORDER BY score DESC) as ranking_within_subject
FROM students;

How This Works:

  1. PARTITION BY: The PARTITION BY clause divides the result set into partitions (or groups) based on the values in the subject column. In this case, we get two partitions: one for Math and another for Science.
  2. ORDER BY: Within each partition, the ORDER BY clause orders students by their scores in descending order.
  3. RANK(): The RANK() function then assigns a rank within each partition based on the ordering defined.
name | score | subject | ranking_within_subject
-----|-------|---------|-----------------------
Amy  | 90    | Math    | 1
Bob  | 85    | Math    | 2
Fin  | 78    | Math    | 3
Dan  | 88    | Science | 1
Cal  | 85    | Science | 2
Eon  | 80    | Science | 3

This approach of using PARTITION BY along with RANK() is exceptionally powerful when we need category-wise rankings or computations in our data.

In practice, when working with databases, the inclusion of different columns or adjusting the PARTITION BY and ORDER BY clauses can yield varied and insightful results tailored to specific analytical needs.

DENSE_RANK()

DENSE_RANK() operates similarly to RANK(). The significant difference is that DENSE_RANK() doesn't skip any ranks when there are duplicates, ensuring a continuous ranking sequence. Using the same students table but using DENSE_RANK() to assign ranking.

name | score 
-----|-------
Amy  | 90 
Bob  | 85 
Cal  | 85 
Dan  | 88
Eon  | 80 
Fin  | 78

Here is the query:

SELECT name, score, DENSE_RANK() OVER(ORDER BY score DESC) as dense_ranking
FROM students;

Breaking down the SQL command:

  • SELECT name, score: Fetches the name and score columns from our table.
  • DENSE_RANK(): This is the function we're focusing on. It assigns a rank based on a specific order.
  • OVER(ORDER BY score DESC): This clause determines the ordering of the rank. ORDER BY score DESC means we are ranking from the highest to the lowest score. The OVER clause defines the "window" over which the ranking is computed.
name | score | dense_ranking
-----|-------|--------------
Amy  | 90    | 1
Dan  | 88    | 2
Bob  | 85    | 3
Cal  | 85    | 3
Eon  | 80    | 4
Fin  | 78    | 5

With DENSE_RANK(), 'Cal' and 'Bob' still both get a ranking of 3 because of their tie. However, the next rank isn't skipped; 'Eon' gets the immediate next rank, which is 4.

Alright, let's take it up a notch, by introducing two more columns to our students table: class (indicating which class the student is in) and attendance (indicating the percentage of classes attended).

name  | score | class    | attendance
------|-------|----------|----------------
Amy   | 90    | 10A      | 95%
Bob   | 85    | 10B      | 88%
Cal   | 85    | 10A      | 90%
Dan   | 88    | 10B      | 85%
Eon   | 90    | 10A      | 93%
Fin   | 78    | 10B      | 82%

Say you want to write a query to do the following:

  • rank students based on their scores,
  • but in case of a tie, you want to use attendance as the tie-breaker.
  • Moreover, you wish to rank them within each class

We can use this code:

SELECT name, score, class, attendance,
DENSE_RANK() OVER(PARTITION BY class ORDER BY score DESC, attendance DESC) as dense_rank_within_class
FROM students;

Let's break down the command:

  • PARTITION BY: This divides the result set into partitions based on the values in the class column. We get two partitions: one for 10A and another for 10B.
  • ORDER BY: Within each partition, we first order students by their scores in descending order. If scores are identical, we use attendance as a tie-breaker, also in descending order.
  • DENSE_RANK(): This function assigns a dense rank within each partition based on the ordering defined.
name | score | class    | attendance | dense_rank_within_class
-----|-------|----------|------------|------------------------
Amy  | 90    | 10A      | 95%        | 1
Eon  | 90    | 10A      | 93%        | 2
Cal  | 85    | 10A      | 90%        | 3
Dan  | 88    | 10B      | 88%        | 1
Bob  | 85    | 10B      | 85%        | 2
Fin  | 78    | 10B      | 82%        | 3

Breaking down the result, we can see for class 10A:

  • 'Amy', with a score of 90 and 95% attendance, leads with a rank of 1.
  • 'Eon', despite scoring 90 (the same as Amy), follows with a rank of 2 due to slightly lower attendance.
  • 'Cal' is next with a rank of 3.

And for  class 10B:

  • 'Dan' takes the top rank of 1 with a score of 88 and 88% attendance.
  • 'Bob' follows with a rank of 2.
  • 'Fin' rounds it up with a rank of 3.

DENSE_RANK() ensures that the ranks are continuous without any gaps, regardless of ties. By using multiple columns in our ORDER BY clause, we can define intricate ranking criteria tailored to various situations.

Common Pitfalls and Solutions

While window functions like RANK() and DENSE_RANK() can be powerful tools in your SQL toolkit, they're not without their nuances that can trip up even seasoned professionals. Here are some common pitfalls and their solutions:

  • Forgetting the ORDER BY clause in the OVER() partition: The ORDER BY clause in the OVER() partition is crucial for determining the order of ranking. Without it, the ranking order becomes unpredictable. Always specify an ORDER BY clause within the OVER() partition to ensure consistent and expected rankings.
  • Misunderstanding the difference between RANK() and DENSE_RANK(): These two functions seem similar but handle tied ranks differently, leading to unexpected results. Always remember that RANK() skips the next rank in case of a tie, while DENSE_RANK() does not. Choose the function that suits your ranking criteria.
  • Not Using Partition BY when Needed: If you're ranking within specific categories or groups and forget the PARTITION BY clause, your ranks will be calculated over the entire result set, not within the desired partitions. Always remember to use the PARTITION BY clause in the OVER() partition to rank within specific categories or groups.

Performance Considerations for RANK() and DENSE_RANK()

Like all SQL functions, window functions have performance considerations, especially when dealing with large datasets. Here's what you need to know:

  • Large Data Sets: Running window functions on extensive datasets can be resource-intensive and slow. Consider filtering your data first with a subquery or a CTE (Common Table Expression) to reduce the dataset size before applying window functions.
  • Multiple Window Functions in a Single Query: Using multiple window functions or different PARTITION BY clauses can increase the complexity and execution time of the query. Start with analyzing the execution plan to see if there's any optimization you can do, like reusing the result of one window function for others if possible.
  • Indexes: If the columns specified in the ORDER BY clause of the OVER() partition aren't indexed, the performance can degrade. Therefore, Index the columns used in the ORDER BY clause. This can significantly improve the performance of window functions.
  • Choosing Between RANK() and DENSE_RANK(): While both functions are similar in many respects, there might be minor performance differences based on the RDBMS and the data distribution. If you're unsure which function to use and performance is a concern, test both on a sample of your data and see if one consistently outperforms the other.

Summing Up

Window functions are powerful tools in the SQL arsenal, providing deeper insights into datasets without complicating the query's structure. RANK() and DENSE_RANK() are particularly useful when you want to assign rankings to your dataset based on certain criteria. Whether you're dealing with competition scores, sales figures, or any other metric where rank matters, these functions can simplify your task.

Stay Tuned! In our next article, we'll shift our focus to LAG() and LEAD(), two window functions that allow you to look backward and forward within your result set, respectively. They're game-changers when it comes to analyzing sequential or time-series data. See you then!

Call to Action to the Reader

Your insights and experiences are invaluable! As I constantly strive to deliver accurate and user-friendly content, I highly encourage readers to:

  • Leave Comments: Did you find a section particularly enlightening? Or perhaps there's something you didn't quite understand? Leave your thoughts in the comments section.
  • Share Your Experiences: Have you used RANK() or DENSE_RANK() in unique ways? Discuss your real-world scenarios and use cases.
  • Ask Questions: If there's something you're curious about or if a certain topic piques your interest, don't hesitate to ask. Either fellow readers or we will strive to provide answers.
  • Suggest Future Topics: Our next deep-dive is into LAG() and LEAD(). If there are other functions or SQL topics you're keen on exploring, let us know!

Share

Rate

5 (4)

You rated this post out of 5. Change rating