SQLServerCentral Article

Sequential Ordering on SQL 2005

,

The Problem

Once upon a time Mr. Francois Ansee, SQL guy extraordinaire, wanted to get a list of his customers.

The catch? They had to be sequentially numbered, in alphabetical order by CustomerID.

It's one of the most basic tasks a SQL guy (or gal) will face, and in prior versions of SQL Server, the choices could be

frustrating. Let's consider Francois' problem and look at some possible solutions in SQL 2000. Here's the result our

intrepid SQL guru is looking for:

One of the first solutions to present itself is also the simplest. Francois can simply add an IDENTITY column to the

Customers table to automatically generate numeric values. There are several downsides to this method:

  • The values inserted might have gaps in them. SQL Books Online specifies that this is an issue for

    "tables with frequent deletions"; however, I have also identified the gap issue as a problem for tables

    where multiple INSERT statements are running simultaneously.

  • If Francois deletes all the rows from the table, the IDENTITY column does not start counting from 1 when new rows

    are inserted.

  • If Francois wants his customer list sorted by ContactName instead of CustomerID, his IDENTITY column numbers

    will display all out of order.

  • Company policy might not allow "quick and dirty" changes to tables in production databases.

Another option for Francois is to create triggers to generate sequential numeric values and store them in the table

at insert time. Using this method he can also compensate for gaps between values. This can cause a performance hit

and can become complex. Add this to the fact that Francois does not like to work any harder than he has to, and we can

safely move on to the next option.

Francois could use a cursor and a counter variable to iterate the rows one-by-one. Francois prefers set-based solutions,

and he avoids cursors.

Our SQL guy now has three fewer solutions than he started with. He finally decides on a classic self-join with the

SQL COUNT(*) aggregate function. To generate the results above, using the Northwind sample database, Francois uses

the following query:


SELECT COUNT(*) AS [Row Number], c.CustomerID, c.CompanyName, c.ContactName FROM Customers c, Customers d WHERE d.CustomerID GROUP BY c.CustomerID, c.CompanyName, c.ContactName ORDER BY c.CustomerID

This method consistently generates proper sequence ranking for your table, it is 100% set-based, and it can easily be

modified to sort by ContactName (or any other column) instead of CustomerID. The downside to this method is that

self-joining a very large table can be resource-intensive and hurt performance.

SQL 2005 to the Rescue

The SQL Server team has introduced several new features in SQL 2005 to make short work of numbering problems like this.

In particular, to deal with this exact problem they’ve added the ROW_NUMBER function. Let’s say that Francois now

wants to list all the contacts in the SQL 2005 AdventureWorks sample database, with the rows numbered sequentially. He

could use the following query:


SELECT ROW_NUMBER() OVER (ORDER BY LastName, FirstName, MiddleName) AS 'Row Number', ContactID, LastName, FirstName, MiddleName, Suffix FROM Person.Contact

To get these results:

Let's look at the ROW_NUMBER function's format for a moment:

ROW_NUMBER() OVER ([PARTITION BY value_expression, ...] [ORDER BY column, ...])

The OVER clause is the key. It allows you to specify an ORDER BY clause to determine the order of counting. In

the example, we are specifying that the rows will be numbered sequentially in LastName, FirstName, MiddleName sort order.

The optional PARTITION BY clause specifies a partition, after which the row numbering re-starts. If we modify the query

above to include a PARTITION BY LastName, row numbering will re-start after each new LastName in the table. Our modified

query looks like this:


SELECT ROW_NUMBER() OVER (PARTITION BY LastName ORDER BY LastName, FirstName, MiddleName) AS 'Row Number', ContactID, LastName, FirstName, MIddleName, Suffix FROM Person.Contact

And here are the new results:

Notice how the PARTITION BY clause forces the row numbers re-start at 1 each time the LastName changes.

Enter the Sales Team

The members of the sales team at Adventureworks are constantly asking Francois to run queries against the data in the SQL

database. Sometimes they ask for simple queries, like a list of all customers who placed a single

order larger than $1,000. Other times they ask for more complex results, like a list of all the best sales days (in terms

of $$$) for 2001. For the latter query, our high-speed SQL guru decides to use the new SQL 2005 RANK function:


WITH TotalSalesBySalesDate (TotalSales, OrderDate) AS ( SELECT SUM(soh.SubTotal) AS TotalSales, soh.OrderDate FROM Sales.SalesOrderHeader soh WHERE soh.OrderDate >= '1/1/2001' AND soh.OrderDate GROUP BY soh.OrderDate ) SELECT RANK() OVER (ORDER BY TotalSales DESC) AS 'Rank', TotalSales, OrderDate FROM TotalSalesBySalesDate

The RANK function is similar in syntax to ROW_NUMBER, but it serves a slightly different purpose. RANK numbers rows

like ROW_NUMBER but it assigns the same row number to rows in the event of a "tie." You can see this for

yourself in the results of the query above:

Notice that days with the same total sales display with the same rank number, and the rank numbers jump after a

tied result. In this example, the rank jumps from 14 to 16 because of the two-way tie for the 14th position.

DENSE_RANK is a variation of the RANK function. Unlike RANK, the DENSE_RANK function doesn't jump after a tied

result. Here is the query above using DENSE_RANK:


WITH TotalSalesBySalesDate (TotalSales, OrderDate) AS ( SELECT SUM(soh.SubTotal) AS TotalSales, soh.OrderDate FROM Sales.SalesOrderHeader soh WHERE soh.OrderDate >= '1/1/2001' AND soh.OrderDate GROUP BY soh.OrderDate ) SELECT DENSE_RANK() OVER (ORDER BY TotalSales DESC) AS 'Rank', TotalSales, OrderDate FROM TotalSalesBySalesDate

And here is the result:

After the two-way tie for number 14, DENSE_RANK continues counting sequentially with 15.

Even More Complex Numbering

SQL Server 2005 adds another function called NTILE. This function divides your result set into a specified number

of groups (roughly) equal in size. The format to use NTILE is similar to the other ranking/numbering functions,

except that you also specify the number of "tiles" to divide your data into. For the example, we'll use

an even more complex sales report. This time the sales team wants a report that divides up the sales territories into

4 quartiles, based on total sales:


WITH TotalSalesByTerritory (TotalSales, TerritoryID, [Name], CountryRegionCode) AS ( SELECT SUM(soh.SubTotal) AS TotalSales, soh.TerritoryID, st.[Name], st.CountryRegionCode FROM Sales.SalesOrderHeader soh, Sales.SalesTerritory st WHERE soh.TerritoryID = st.TerritoryID GROUP BY soh.TerritoryID, st.[Name], st.CountryRegionCode ) SELECT NTILE(4) OVER (ORDER BY TotalSales DESC) AS 'Quartile', TotalSales, TerritoryID, [Name], CountryRegionCode FROM TotalSalesByTerritory

This query produces the following result:

If you look closely at the results, you will see that the first and second quartiles have 3 rows each, while the third

and fourth have two rows each. Since our total number of rows does not divide evenly into 4, the top quartiles are

expanded slightly to hold more rows.

Of course NTILE does not have to specify "quartiles" (1/4ths). Here is another sales example, which divvies

up the sales team into "quintiles" (1/5ths) based on total sales:


WITH TotalSalesBySalesPerson (TotalSales, SalesPersonID, LastName, FirstName, MiddleName) AS ( SELECT SUM(soh.TotalDue) AS TotalSales, soh.SalesPersonID, co.LastName, co.FirstName, co.MiddleName FROM Sales.SalesOrderHeader soh, Sales.SalesPerson sp, HumanResources.Employee em, Person.Contact co WHERE soh.SalesPersonID = sp.SalesPersonID AND sp.SalesPersonID = em.EmployeeID AND em.ContactID = co.ContactID AND soh.OrderDate >= '1/1/2004' AND soh.OrderDate GROUP BY soh.SalesPersonID, co.LastName, co.FirstName, co.MiddleName ) SELECT NTILE(5) OVER (ORDER BY TotalSales DESC) AS 'Quintile', TotalSales, SalesPersonID, LastName, FirstName, MiddleName FROM TotalSalesBySalesPerson

Conclusions

The SQL Server team has added new functions to SQL Server 2005 to make it easier to perform common row-numbering tasks,

including ROW_NUMBER, RANK, DENSE_RANK and NTILE. These functions make row numbering tasks easier than ever.

Rate

5 (5)

You rated this post out of 5. Change rating

Share

Share

Rate

5 (5)

You rated this post out of 5. Change rating