The Joy of Numbers

By Michael Coles

Introduction

"You must unlearn what you have learned" - Yoda

What exactly is a numbers table, and more importantly, what can it do for you? To put it simply, a numbers table is a cheap (read: "free") tool that should be part of every SQL developer's toolkit. In its simplest form, a numbers table is a table with one INT column. That single column, which also serves as the primary key and clustered index, contains the natural (or "counting") numbers starting at 1 and going up to some specified upper limit. We'll begin by describing how to create a numbers table, and then move on to a discussion of why they are so useful. As always, I'll throw in some sample code (for SQL Server 2000 and SQL Server 2005) so you can play along at home.

Make Your Own Numbers Table!

Let me start by saying that every database should have its own numbers table. They're very useful (as we'll see later), and extremely easy to create. We'll start with the creation of the numbers table. This will work in SQL Server 2000 or SQL Server 2005:

-- Create the numbers table
CREATE TABLE dbo.Numbers (Num INT NOT NULL PRIMARY KEY CLUSTERED);
GO

Well, that was simple enough. Now comes the fun part - populating it. There are lots of ways to populate the numbers table, but we'll start with the most obvious: a simple WHILE loop that populates the table one row at a time:

DECLARE @i INT;
SELECT @i = 1;
WHILE @i <= 10000
BEGIN
    INSERT INTO dbo.Numbers(Num) VALUES (@i);
    SELECT @i = @i + 1;
END;

This simple procedural solution works on both SQL 2000 and SQL 2005. There are numerous other clever ways to create a numbers table. Here's one of my personal favorites (thanks to Jeff Moden and all others who posted it for public consumption). This method uses the Transact-SQL SELECT INTO statement and IDENTITY function to simultaneously create and populate the table, so we have to drop the previously created numbers table. This example works on SQL 2000 or SQL 2005:

-- Be sure to drop the numbers table if it exists
DROP TABLE dbo.Numbers;
GO
-- Now re-create it and fill it with sequential numbers starting at 1
SELECT TOP 10000 IDENTITY(INT,1,1) AS Num
INTO dbo.Numbers
FROM master.INFORMATION_SCHEMA.COLUMNS i1
CROSS JOIN master.INFORMATION_SCHEMA.COLUMNS i2;
GO
-- Add a primary key/clustered index to the numbers table
ALTER TABLE dbo.Numbers
ADD CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (Num);
GO

Notice that this set-based method offers a significant speed improvement over the WHILE loop version. The query relies on the fact that a CROSS JOIN of a table with itself produces the full Cartesian product, so the row count grows as the square of the source. If your master.INFORMATION_SCHEMA.COLUMNS view has over 400 rows in it (not unreasonable), the CROSS JOIN will generate over 160,000 rows; if you need more, just slap another CROSS JOIN in there. We're using the TOP keyword on the SELECT statement to limit the results to 10,000 rows for our purposes, though. Feel free to populate your numbers table with as many numbers as you need, up to the upper limit for an INT, which happens to be 2,147,483,647.
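For instance, if you wanted a million-row table, a sketch like the following would do it. The third CROSS JOIN simply guarantees enough source rows to satisfy the larger TOP clause; adjust the TOP value to taste:

-- Drop and rebuild the numbers table with 1,000,000 rows
DROP TABLE dbo.Numbers;
GO
SELECT TOP 1000000 IDENTITY(INT,1,1) AS Num
INTO dbo.Numbers
FROM master.INFORMATION_SCHEMA.COLUMNS i1
CROSS JOIN master.INFORMATION_SCHEMA.COLUMNS i2
CROSS JOIN master.INFORMATION_SCHEMA.COLUMNS i3;
GO
ALTER TABLE dbo.Numbers
ADD CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (Num);
GO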

Now that we have a numbers table it's time to put it to work solving some problems.

The Numbers Table in Action

The power of the numbers table is that it lets you convert procedural solutions to set-based solutions. Consider the ubiquitous comma-delimited string-split function. As before, we'll start with the procedural version, which basically consists of a WHILE loop. This code works on SQL 2000 and SQL 2005:

CREATE FUNCTION dbo.fnProceduralSplit(@string VARCHAR(8000))
RETURNS @ret TABLE (
    Num INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    String VARCHAR(255))
AS
BEGIN
    DECLARE @i INT, @j INT;
    SELECT @i = 1;                          -- start of the current substring
    SELECT @j = CHARINDEX(',', @string);    -- position of the next comma
    WHILE (@j > 0)
    BEGIN
        -- Grab the text between the current position and the next comma
        INSERT INTO @ret (String)
        VALUES (SUBSTRING(@string, @i, @j - @i));
        SELECT @i = @j + 1;
        SELECT @j = CHARINDEX(',', @string, @i);
    END;
    -- Grab the final substring after the last comma
    INSERT INTO @ret (String)
    VALUES (SUBSTRING(@string, @i, LEN(@string) - @i + 1));
    RETURN;
END
GO

All this code does is loop through the string passed in as a parameter looking for commas. It breaks out the strings between any commas it finds. It's probably about as efficient as you can get for procedural code. So now let's see what the equivalent function looks like in a set-based version on SQL 2000 and SQL 2005. In this version we'll let SQL Server do all the heavy lifting. We'll use an inline table-valued function for performance, and we'll return the number representing the position of each comma in the list to keep the relative ordering of strings in the list:

CREATE FUNCTION dbo.fnSetSplit (@String VARCHAR(8000))
RETURNS TABLE
AS
RETURN (
    SELECT Num,
        SUBSTRING(@String,
            CASE Num
                WHEN 1 THEN 1
                ELSE Num + 1
            END,
            CASE CHARINDEX(',', @String, Num + 1)
                WHEN 0 THEN LEN(@String) - Num + 1
                ELSE CHARINDEX(',', @String, Num + 1) - Num -
                    CASE 
                        WHEN Num > 1 THEN 1
                        ELSE 0
                    END
               END
            ) AS String
    FROM dbo.Numbers
    WHERE Num <= LEN(@String)
        AND (SUBSTRING(@String, Num, 1) = ','
            OR Num = 1)
)
GO

You can test the functionality of your new UDFs with queries like the following:

SELECT * FROM dbo.fnSetSplit ('Jack,Jill,Hill,Water');
SELECT * FROM dbo.fnSetSplit ('Alexander The Great');
SELECT * FROM dbo.fnSetSplit ('Red,Green,Blue');
SELECT * FROM dbo.fnProceduralSplit ('Army,Navy,Air Force,Marines');
SELECT * FROM dbo.fnProceduralSplit ('Washington,Jefferson,Adams');
SELECT * FROM dbo.fnProceduralSplit ('1,2,3,4,5,6,7,8');
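For example, the first fnSetSplit query above should return four rows, with the Num column holding the position of each delimiting comma (or 1 for the first item), which is what preserves the ordering of the list:

Num    String
1      Jack
5      Jill
10     Hill
15     Water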

For single queries like those above you might not notice much of a difference in performance, but across thousands of such queries in a short time period the difference can be significant. I ran several rounds of comparisons splitting 100,000 randomly generated comma-separated lists of "words"*. Here are the averages:

                        SQL 2000        SQL 2005
Procedural Function     130,783 ms      86,033 ms
Set-based Function       55,390 ms      51,423 ms**

Notes:
*The "words" were acutally randomly generated strings from 1 to 20 characters in length.
**In SQL 2005, using the CROSS APPLY operator reduced the time of the set-based test to 27,296 ms.

Of course performance may vary on your system, but the results I got show a definite improvement when using the set-based version over the procedural version.
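For the curious, here is roughly what the SQL 2005 CROSS APPLY variant mentioned in the second note looks like. This is a minimal sketch: dbo.TestStrings and its CsvList column are hypothetical names standing in for whatever table holds your comma-separated lists.

-- SQL 2005 only: apply the inline TVF to every row of a table of test strings.
-- dbo.TestStrings / CsvList are illustrative names, not part of the actual test rig.
SELECT t.CsvList, s.Num, s.String
FROM dbo.TestStrings t
CROSS APPLY dbo.fnSetSplit(t.CsvList) s;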

Here Comes the Second Example

OK, still not convinced? Then let's try another example. Here we'll eliminate duplicate side-by-side letters from a string. This is a commonly performed operation in phonetic encoding and approximate search routines, and here we'll do it in a simple set-based UDF that will work on SQL 2000 or 2005:

CREATE FUNCTION dbo.fnReduceDupes(@string VARCHAR(8000))
RETURNS VARCHAR(8000)
AS
BEGIN
    DECLARE @Result VARCHAR(8000);
    SELECT @Result = @string;
    SELECT @Result =
        STUFF(@Result, Num, 1,
            CASE SUBSTRING(@Result, Num, 1)
                WHEN SUBSTRING(@Result, Num + 1, 1) THEN '!'
                ELSE SUBSTRING(@Result, Num, 1)
            END)
    FROM dbo.Numbers
    WHERE Num <= LEN(@Result);
    SELECT @Result = REPLACE(@Result, '!', '');
    RETURN @Result;
END
GO

This UDF reduces sequences of side-by-side duplicate characters to a single character. It does this by first replacing all but the last of any run of duplicate characters with an exclamation point ('!' in this case, although any character that won't appear in the input could be used). It then uses the Transact-SQL REPLACE function to remove all exclamation points from the string. Again, this function would be a perfect candidate for a WHILE loop, but the numbers table lets us tell SQL Server to do the hard work for us. Here are some samples to try:

SELECT dbo.fnReduceDupes ('AAAABBBBBCCDDDDEEEEFFGGGGHIIIIJJKK');
SELECT dbo.fnReduceDupes('HHHHHIIIIII TTTTHHHEEEERRRRRRREEEE');
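If everything is working, these should return 'ABCDEFGHIJK' and 'HI THERE' respectively. (One caveat: the character-by-character variable assignment assumes the rows are processed in Num order; the clustered index on the numbers table makes that the typical plan, but it isn't formally guaranteed by SQL Server.)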

One Last Time

OK, so we can see the usefulness of a numbers table in creating set-based, position-dependent string functions. But all the world is not a string, and often we'll need to apply business logic that has nothing to do with string manipulation.

Let's consider the SQL 2000 Northwind database for a moment. This database has a very simple "inventory" system of sorts -- the kind you might create for a very basic web store. Basically, the dbo.Products table has a single column named UnitsInStock; this is your inventory. I say it's very basic because it lacks any lot/bin or tracking information. The items ordered and their quantities are stored in the ProductID and Quantity columns of the dbo.[Order Details] table.

The problem is this: we need to match up the items in inventory to specific customer orders. The only real complication we might face is that the total quantity of a particular product on order exceeds the total quantity of that product in inventory; i.e., we've sold or promised more than we have on hand. In a more detailed inventory tracking system (such as the SQL 2005 AdventureWorks shelf/bin inventory system), we'd face more potential issues (e.g., a customer orders more of a product than is contained in one bin, so we have to pull from multiple bins to fill the order, assign items from specific bins to specific orders, and so on).

A row-by-row cursor might seem like the obvious solution for this problem. After all, we need to assign each item we have in inventory to a customer order detail row. Also we need to keep track of the inventory quantity to make sure we don't over-commit and try to ship more product than we have on hand. Or do we?

Here I'll present a SQL 2000 Northwind solution to this problem:

SELECT Order_Items.OrderID, Product_Inventory.ProductID,
    COUNT(*) AS ItemsFromInventory
FROM
(
    SELECT ProductID, Num
    FROM dbo.Numbers n
    INNER JOIN dbo.[Products] p
    ON n.Num BETWEEN 1 AND p.UnitsInStock
) Product_Inventory
INNER JOIN
(
    SELECT OrderID, ProductID,
    (
        SELECT COALESCE(SUM(o1.Quantity),0)
        FROM dbo.[Order Details] o1
        WHERE o1.ProductID = o.ProductID
            AND o1.OrderID < o.OrderID
        ) + n.Num AS Num
    FROM dbo.Numbers n
    INNER JOIN dbo.[Order Details] o
        ON n.Num BETWEEN 1 AND o.Quantity
) Order_Items
ON Product_Inventory.ProductID = Order_Items.ProductID
AND Product_Inventory.Num = Order_Items.Num
GROUP BY Order_Items.OrderID, Product_Inventory.ProductID
ORDER BY Order_Items.OrderID, Product_Inventory.ProductID;

OK, so what just happened here?

To begin with, we joined each product we have in inventory to the numbers table using ON n.Num BETWEEN 1 AND p.UnitsInStock. What this does is individually number each unit of each product we have in inventory. By way of example, if we have two of Product ID 998 in stock and three of Product ID 999 in stock, the Product_Inventory subquery will generate the following result:

ProductID    Num
998          1
998          2
999          1
999          2
999          3

Next we assigned a number to each individual item on each order. This is the Order_Items subquery. It's more complex than the Product_Inventory subquery because we have to account for both ProductID and OrderID: the items ordered and their quantities are spread out over several rows on several different orders, but we want the count to run from 1 to n, where n is the total quantity of a single product across all orders. Functionally we're just replicating SQL 2005's ROW_NUMBER function, which does exactly this when the optional PARTITION BY clause is specified.
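To make that concrete, here is a rough SQL 2005 equivalent of the Order_Items numbering step (just the numbering, not a drop-in replacement for the whole query above):

-- SQL 2005 only: number each individual item on each order, restarting at 1
-- for every ProductID, matching the numbering the correlated subquery produces
SELECT o.OrderID, o.ProductID,
    ROW_NUMBER() OVER (PARTITION BY o.ProductID
                       ORDER BY o.OrderID, n.Num) AS Num
FROM dbo.Numbers n
INNER JOIN dbo.[Order Details] o
    ON n.Num BETWEEN 1 AND o.Quantity;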

We INNER JOIN these two subqueries on their ProductID and Num columns, which effectively assigns one item from inventory to one item on an order. We don't have to worry about over-committing: the INNER JOIN simply can't match more order items than there are numbered inventory items for a given product. After assigning each inventory item to an order detail item, we group the results. The COUNT(*) of each group gives us the total number of units of each product from inventory that were assigned to a given order.

The SQL 2005 version is similar, although the ROW_NUMBER function and CTEs will probably make the job easier. Of course, as mentioned, the AdventureWorks sample database uses a shelf/bin inventory system that's slightly more complex. I'll leave the SQL 2005 version of this query as an exercise for the reader.

Conclusions

Numbers tables are an extremely useful tool for writing efficient set-based code to replace cursors and other procedural constructs.
