Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Sequential Ordering on SQL 2005

By Michael Coles,

The Problem

Once upon a time Mr. Francois Ansee, SQL guy extraordinaire, wanted to get a list of his customers. The catch? They had to be sequentially numbered, in alphabetical order by CustomerID.

It's one of the most basic tasks a SQL guy (or gal) will face, and in prior versions of SQL Server, the choices could be frustrating. Let's consider Francois' problem and look at some possible solutions in SQL 2000. Here's the result our intrepid SQL guru is looking for:

One of the first solutions to present itself is also the simplest. Francois can simply add an IDENTITY column to the Customers table to automatically generate numeric values. There are several downsides to this method:

  • The values inserted might have gaps in them. SQL Books Online specifies that this is an issue for "tables with frequent deletions"; however, I have also identified the gap issue as a problem for tables where multiple INSERT statements are running simultaneously.
  • If Francois deletes all the rows from the table, the IDENTITY column does not start counting from 1 when new rows are inserted.
  • If Francois wants his customer list sorted by ContactName instead of CustomerID, his IDENTITY column numbers will display all out of order.
  • Company policy might not allow "quick and dirty" changes to tables in production databases.
Another option for Francois is to create triggers to generate sequential numeric values and store them in the table at insert time. Using this method he can also compensate for gaps between values. This can cause a performance hit and can become complex. Add this to the fact that Francois does not like to work any harder than he has to, and we can safely move on to the next option.

Francois could use a cursor and a counter variable to iterate the rows one-by-one. Francois prefers set-based solutions, and he avoids cursors.

Our SQL guy now has three fewer solutions than he started with. He finally decides on a classic self-join with the SQL COUNT(*) aggregate function. To generate the results above, using the Northwind sample database, Francois uses the following query:


SELECT COUNT(*) AS [Row Number], c.CustomerID, c.CompanyName, c.ContactName FROM Customers c, Customers d WHERE d.CustomerID <= c.CustomerID GROUP BY c.CustomerID, c.CompanyName, c.ContactName ORDER BY c.CustomerID

This method consistently generates proper sequence ranking for your table, it is 100% set-based, and it can easily be modified to sort by ContactName (or any other column) instead of CustomerID. The downside to this method is that self-joining a very large table can be resource-intensive and hurt performance.

SQL 2005 to the Rescue

The SQL Server team has introduced several new features in SQL 2005 to make short work of numbering problems like this. In particular, to deal with this exact problem they’ve added the ROW_NUMBER function. Let’s say that Francois now wants to list all the contacts in the SQL 2005 AdventureWorks sample database, with the rows numbered sequentially. He could use the following query:


SELECT ROW_NUMBER() OVER (ORDER BY LastName, FirstName, MiddleName) AS 'Row Number', ContactID, LastName, FirstName, MiddleName, Suffix FROM Person.Contact

To get these results:

Let's look at the ROW_NUMBER function's format for a moment:

ROW_NUMBER() OVER ([PARTITION BY value_expression, ...] [ORDER BY column, ...])

The OVER clause is the key. It allows you to specify an ORDER BY clause to determine the order of counting. In the example, we are specifying that the rows will be numbered sequentially in LastName, FirstName, MiddleName sort order.

The optional PARTITION BY clause specifies a partition, after which the row numbering re-starts. If we modify the query above to include a PARTITION BY LastName, row numbering will re-start after each new LastName in the table. Our modified query looks like this:


SELECT ROW_NUMBER() OVER (PARTITION BY LastName ORDER BY LastName, FirstName, MiddleName) AS 'Row Number', ContactID, LastName, FirstName, MIddleName, Suffix FROM Person.Contact

And here are the new results:

Notice how the PARTITION BY clause forces the row numbers re-start at 1 each time the LastName changes.

Enter the Sales Team

The members of the sales team at Adventureworks are constantly asking Francois to run queries against the data in the SQL database. Sometimes they ask for simple queries, like a list of all customers who placed a single order larger than $1,000. Other times they ask for more complex results, like a list of all the best sales days (in terms of $$$) for 2001. For the latter query, our high-speed SQL guru decides to use the new SQL 2005 RANK function:


WITH TotalSalesBySalesDate (TotalSales, OrderDate) AS ( SELECT SUM(soh.SubTotal) AS TotalSales, soh.OrderDate FROM Sales.SalesOrderHeader soh WHERE soh.OrderDate >= '1/1/2001' AND soh.OrderDate < '1/1/2002' GROUP BY soh.OrderDate ) SELECT RANK() OVER (ORDER BY TotalSales DESC) AS 'Rank', TotalSales, OrderDate FROM TotalSalesBySalesDate

The RANK function is similar in syntax to ROW_NUMBER, but it serves a slightly different purpose. RANK numbers rows like ROW_NUMBER but it assigns the same row number to rows in the event of a "tie." You can see this for yourself in the results of the query above:

Notice that days with the same total sales display with the same rank number, and the rank numbers jump after a tied result. In this example, the rank jumps from 14 to 16 because of the two-way tie for the 14th position.

DENSE_RANK is a variation of the RANK function. Unlike RANK, the DENSE_RANK function doesn't jump after a tied result. Here is the query above using DENSE_RANK:


WITH TotalSalesBySalesDate (TotalSales, OrderDate) AS ( SELECT SUM(soh.SubTotal) AS TotalSales, soh.OrderDate FROM Sales.SalesOrderHeader soh WHERE soh.OrderDate >= '1/1/2001' AND soh.OrderDate < '1/1/2002' GROUP BY soh.OrderDate ) SELECT DENSE_RANK() OVER (ORDER BY TotalSales DESC) AS 'Rank', TotalSales, OrderDate FROM TotalSalesBySalesDate

And here is the result:

After the two-way tie for number 14, DENSE_RANK continues counting sequentially with 15.

Even More Complex Numbering

SQL Server 2005 adds another function called NTILE. This function divides your result set into a specified number of groups (roughly) equal in size. The format to use NTILE is similar to the other ranking/numbering functions, except that you also specify the number of "tiles" to divide your data into. For the example, we'll use an even more complex sales report. This time the sales team wants a report that divides up the sales territories into 4 quartiles, based on total sales:


WITH TotalSalesByTerritory (TotalSales, TerritoryID, [Name], CountryRegionCode) AS ( SELECT SUM(soh.SubTotal) AS TotalSales, soh.TerritoryID, st.[Name], st.CountryRegionCode FROM Sales.SalesOrderHeader soh, Sales.SalesTerritory st WHERE soh.TerritoryID = st.TerritoryID GROUP BY soh.TerritoryID, st.[Name], st.CountryRegionCode ) SELECT NTILE(4) OVER (ORDER BY TotalSales DESC) AS 'Quartile', TotalSales, TerritoryID, [Name], CountryRegionCode FROM TotalSalesByTerritory

This query produces the following result:

If you look closely at the results, you will see that the first and second quartiles have 3 rows each, while the third and fourth have two rows each. Since our total number of rows does not divide evenly into 4, the top quartiles are expanded slightly to hold more rows.

Of course NTILE does not have to specify "quartiles" (1/4ths). Here is another sales example, which divvies up the sales team into "quintiles" (1/5ths) based on total sales:


WITH TotalSalesBySalesPerson (TotalSales, SalesPersonID, LastName, FirstName, MiddleName) AS ( SELECT SUM(soh.TotalDue) AS TotalSales, soh.SalesPersonID, co.LastName, co.FirstName, co.MiddleName FROM Sales.SalesOrderHeader soh, Sales.SalesPerson sp, HumanResources.Employee em, Person.Contact co WHERE soh.SalesPersonID = sp.SalesPersonID AND sp.SalesPersonID = em.EmployeeID AND em.ContactID = co.ContactID AND soh.OrderDate >= '1/1/2004' AND soh.OrderDate < '1/1/2005' GROUP BY soh.SalesPersonID, co.LastName, co.FirstName, co.MiddleName ) SELECT NTILE(5) OVER (ORDER BY TotalSales DESC) AS 'Quintile', TotalSales, SalesPersonID, LastName, FirstName, MiddleName FROM TotalSalesBySalesPerson

Conclusions

The SQL Server team has added new functions to SQL Server 2005 to make it easier to perform common row-numbering tasks, including ROW_NUMBER, RANK, DENSE_RANK and NTILE. These functions make row numbering tasks easier than ever.

Total article views: 18192 | Views in the last 30 days: 13
 
Related Articles
FORUM

Order by numbers in proper sequence

Ordering by numbers in proper sequence order

SCRIPT

Format Phone Number function

Function to format phone numbers entered as text.

FORUM

Paging Problem SQL 2005 using row_number and order by

Paging Problem SQL 2005 using row_number and order by

SCRIPT

Function to get number of days in month

Function to get number of days in month

FORUM

Order by

Ordering rows by column with fullstops in between numbers

Tags
sql server 2005    
t-sql    
 
Contribute

Join the most active online SQL Server Community

SQL knowledge, delivered daily, free:

Email address:  

You make SSC a better place

As a member of SQLServerCentral, you get free access to loads of fresh content: thousands of articles and SQL scripts, a library of free eBooks, a weekly database news roundup, a great Q & A platform… And it’s our huge, buzzing community of SQL Server Professionals that makes it such a success.

Join us!

Steve Jones
Editor, SQLServerCentral.com

Already a member? Jump in:

Email address:   Password:   Remember me: Forgotten your password?
Steve Jones