# Returning the Top X row for each group (SQL Spackle)

By Dave Ballantyne, (first published: 2010/12/06)

In our table we have a list of runners, the time the course was completed in and the runner's age. Our task is to retrieve the fastest X runners in each age range.

```CREATE TABLE #Runners
(
Runner integer NOT NULL,
Time integer NOT NULL,
Age integer NOT NULL
)
INSERT INTO #Runners
SELECT 1 , 10 , 20 UNION ALL
SELECT 2 , 15 , 20 UNION ALL
SELECT 3 , 11 , 20 UNION ALL
SELECT 4 , 12 , 30 UNION ALL
SELECT 5 , 18 , 30 UNION ALL
SELECT 6 , 9 , 40 UNION ALL
SELECT 7 , 16 , 40 UNION ALL
SELECT 8 , 13 , 30
```

By far the simplest way is to use a combination of the ranking functions ( Specifically row_number () ) and CTE 's (Common Table Expressions). This may not always be the most efficient, but I'll come back to that point later.

Our first task is to ascertain the order that the runners completed the course in. If we were interested in only the overall, age range aside, order we could simply

```Select *
from #Runners
order by Time
```

and then by adding then TOP clause we can filter the Top X (assuming 2 here)

```Select top(2) *
from #Runners
order by Time
```

However, we do need the TOP X for each age category. As mentioned above we will be using the row_number() ranking function to help us do this. If we were to execute

```select * ,row_number() over (order by Time ) as RowN
from #Runners
order by Rown

Runner      Time        Age         RowN
----------- ----------- ----------- --------------------
6           9           40          1
1           10          20          2
3           11          20          3
4           12          30          4
8           13          30          5
2           15          20          6
7           16          40          7
5           18          30          8
```

We get effectively the same output as before but now with an added column, which is an incrementing integer in the order of time. If there were ties and we wanted each of them treated the same then we would use the DENSE_RANK function rather than ROW_NUMBER. In some systems if there is a tie, for example two runners finished in exactly the same time for first place, then there is no following position. That means no second place runner and we skip to third. We would then use the RANK function. In our example, however, we are going to ignore the possibility of ties.

An extension of the Over clause is "PARTITION BY", which will re-initialize the counter for each group specified with it. This is usable for all the Ranking functions and pre-existing aggregate functions. PARTITION BY is very similar to using a GROUP BY clause except now we are able to have a different clause for each ranking or aggregate function used. To see this is action

```select * ,row_number() over (partition by Age order by Time ) as RowN
from #Runners
order by Age,Rown
```

So now RowN is an incrementing counter ordered by time for each distinct value within the age column. This is the column that we will now need to filter upon to find our TOP X runners. In an ideal world we would be able to execute:

```select * ,row_number() over (partition by Age order by Time ) as RowN
from #Runners
where row_number() over (partition by Age order by Time ) <=2
order by Age,Rown
```

However we cannot, so the currently suggested method is to solve this by using a CTE. This will enable us to now filter on the RowN Column.

```with cteRunners
as
(
select * ,row_number() over (partition by Age order by Time ) as RowN
from #Runners
)
Select * from cteRunners
where RowN <=2
order by Age,Rown
```

Itzik Ben-Gan has proposed using a QUALIFY clause as detailed here to resolve this issue.

The caveat

Although this will work perfectly fine, performance over a large number of rows can be relatively poor as the engine will have to read every row in the table and for those rows, evaluate the row number function so that it can be filtered upon. This can be quite a waste of resources if we only require a small amount of the total number of rows. To prove the point, first we need to generate some test data,

```IF OBJECT_ID('tempdb..#RunnersBig') IS NOT NULL drop table #RunnersBig
go
Create Table #RunnersBig
(
RunnerId integer identity ,
Time integer not null,
Age integer not null
)
go
insert into #runnersbig ( Time , Age )
select top 1000000 ABS ( checksum ( newid ()))% 1000 ,
ABS ( checksum ( newid ()))% 99
from sys . columns a cross join sys . columns b cross join sys . columns c
go
create index idxrunnersbig on #runnersbig ( age , time ) include ( runnerid )
```

This will generate a table with one million rows of randomized values and then create an index upon it. Our alternative approach will be to use the CROSS APPLY Operator and a tally table is used as our 'LEFT' table.

```with cteN
as
(
select number from master .. spt_values
where type = 'p' and number between 0 and 100
)
Select *
from cteN cross apply ( Select top ( 2 ) * from #RunnersBig where #RunnersBig . Age = cteN . number order by Time ) as runners
order by cteN . number , runners . Time
```

When compared to this filtered row_number query in Profiler:

```with cteRunners
as
(
select * , row_number () over ( partition by Age order by Time ) as RowN
from #RunnersBig
)
Select * from cteRunners
where RowN <= 2
order by Age , Rown
go
```

We will find that the filtered version executes in ~463ms whereas the CROSS APPLY a mere 1 ms YMMV.

Total article views: 29013 | Views in the last 30 days: 4

Related Articles
FORUM

### Selecting a Value of the Order Within a Group

Select To Indicate Order Number Within a Group

FORUM

### Order by numbers in proper sequence

Ordering by numbers in proper sequence order

BLOG

### ORDER BY the numbers

Have you ever needed to order by a calculated column? You might have written it something like this...

FORUM

### Paging Problem SQL 2005 using row_number and order by

Paging Problem SQL 2005 using row_number and order by

SCRIPT

### Fast Table Valued Function to return all prime numbers within a range

A TVF to select prime numbers

Tags
 sql spackle t-sql