The Tally (or Numbers) table is one of the most useful constructs in intermediate-level SQL. Have you ever written a T-SQL WHILE loop? How about a CURSOR? If your answer to either of those questions is yes, you’ve probably been coding some really poor-performing SQL and you may not even realize it. Tally tables are one way that you may be able to avoid using a loop or a CURSOR, instead creating a true set-based solution that is blazingly fast and reliable.
Tally Tables in SQL 2000
The ability to create a Tally table has been around for a long time, going way back to SQL Server’s roots in Sybase. One of the most common methods of creating a permanent table is:
SELECT TOP 1000000 N=IDENTITY(INT, 1, 1) INTO dbo.Numbers FROM master.dbo.syscolumns a CROSS JOIN master.dbo.syscolumns b; ALTER TABLE dbo.Numbers ADD CONSTRAINT NBR_pk PRIMARY KEY(N); SELECT TOP 5 N FROM dbo.Numbers;
Obviously, in order to run this you need access to the master.dbo.syscolumns table that is referenced. Most of the time this particular table has at least 4,000 rows in it so constructing a Tally table with 1,000,000 (or even 10,000,000) rows is no problem at all. Our Tally table will start with 1 (results below are from the final SELECT).
N 1 2 3 4 5
If you create a permanent Tally table, it’s important to have a CLUSTERED INDEX (the PRIMARY KEY) on the single column that’s in the table for performance reasons. We’ll show some examples of unique ways that you can use a Tally table shortly, but first it’s good to know how to create them. The example above works in SQL Server 2000.
Here’s another Tally table that can generate N’s between 0 and 255 (up to 2047 in later versions like 2008) and can be used in SQL Server 2000.
SELECT N=number FROM master..spt_values WHERE type = 'P';
There are those that might be asking why in the world would you use an undocumented system table like this? Well, rest assured that this table is used internally by many of SQL’s internal stored procedures, so it is not going away any time soon.
Tally Tables in SQL 2005
When SQL Server 2005 came along and introduced the Common Table Expression (CTE) and the ROW_NUMBER() window function (both the subject of earlier blogs as linked), it opened up some new ways to create what are known as in-line Tally tables. Let’s look at a couple of examples.
WITH Tally (N) AS ( SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM sys.all_columns a CROSS JOIN sys.all_columns b ) SELECT TOP 5 N FROM Tally; -- Or if you don't have access to the sys tables use an in-line -- Tally table known as a "Ben-Gan" style Tally WITH lv0 AS (SELECT 0 g UNION ALL SELECT 0) ,lv1 AS (SELECT 0 g FROM lv0 a CROSS JOIN lv0 b) -- 4 ,lv2 AS (SELECT 0 g FROM lv1 a CROSS JOIN lv1 b) -- 16 ,lv3 AS (SELECT 0 g FROM lv2 a CROSS JOIN lv2 b) -- 256 ,lv4 AS (SELECT 0 g FROM lv3 a CROSS JOIN lv3 b) -- 65,536 ,lv5 AS (SELECT 0 g FROM lv4 a CROSS JOIN lv4 b) -- 4,294,967,296 ,Tally (n) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM lv5) SELECT TOP (5) n FROM Tally ORDER BY n;
The second Tally table structure was first suggested by SQL MVP Itzik Ben-Gan.
Never mind how they work, just know that the first will generate somewhere around 16M+ rows and the other will create a little over 4 billion rows (!) if we did not apply the TOP 10. The second one is a case of “stacked” or “cascaded” CTEs. Both are extremely fast; the second one especially because it is done entirely in memory.
Tally Tables in SQL 2008
SQL Server 2008 introduced something that’s known as a Table Value Constructor (TVC), which offers a very compact way to create a Tally table tailored specifically to the precise number of rows you need without using a TOP clause. Here’s an example of that:
WITH Tally (n) AS ( -- 1000 rows SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) a(n) CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) b(n) CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) c(n) ) SELECT * FROM Tally;
That Tally table generates exactly 1,000 rows because each VALUES TVC has exactly 10 zeroes in it. We’ll now look at just a few of the remarkable ways that a Tally table can improve your SQL by helping you to write a set-based solution where otherwise you might need to resort to a loop or CURSOR.
Using a Tally Table to Expand a Row Set
Let’s start with some sample data:
CREATE TABLE #Temp ( ID INT IDENTITY PRIMARY KEY ,StartDT DATETIME ); INSERT INTO #Temp SELECT '2014-02-18 09:20' UNION ALL SELECT '2014-02-19 05:35'; SELECT ID, StartDT ,TT=DATEPART(hour, DATEADD(hour, DATEDIFF(hour, 0, StartDT), 0)) ,TD=DATEADD(day, DATEDIFF(day, 0, StartDT), 0) FROM #Temp;
The results returned are this, where DATEADD(hour, DATEDIFF(hour, 0, StartDT), 0) is a nifty way to truncate the DATETIME value to the nearest hour (or use day instead to truncate to the current day). Here are the results from the above query:
ID StartDT TT TD 1 2014-02-18 09:20:00.000 9 2014-02-18 00:00:00.000 2 2014-02-19 05:35:00.000 5 2014-02-19 00:00:00.000
Suppose what we want is 10 rows that correspond to ID=1, where each is incremented to the next hour of the day (starting at midnight). Likewise, for ID=2 we want 6 rows. We can use a small Tally table for this (remember there’s only 24 hours in a day):
WITH Tally (N) AS ( -- Tally table starting at 0 SELECT 0 UNION ALL -- Now 24 more rows SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM (VALUES(0),(0),(0),(0),(0),(0)) a(n) CROSS JOIN (VALUES(0),(0),(0),(0)) c(n) ) SELECT ID, StartDT ,TD=DATEADD(hour, N, DATEADD(day, DATEDIFF(day, 0, StartDT), 0)) FROM #Temp CROSS JOIN Tally WHERE N BETWEEN 0 AND DATEPART(hour, DATEADD(hour, DATEDIFF(hour, 0, StartDT), 0)) ORDER BY ID, TD;
Note how this Tally table has been modified to start at 0 by adding SELECT 0 UNION ALL before the second SELECT. The results of this query are:
ID StartDT TD 1 2014-02-18 19:20:00.000 2014-02-18 00:00:00.000 1 2014-02-18 19:20:00.000 2014-02-18 01:00:00.000 1 2014-02-18 19:20:00.000 2014-02-18 02:00:00.000 1 2014-02-18 19:20:00.000 2014-02-18 03:00:00.000 1 2014-02-18 19:20:00.000 2014-02-18 04:00:00.000 1 2014-02-18 19:20:00.000 2014-02-18 05:00:00.000 1 2014-02-18 19:20:00.000 2014-02-18 06:00:00.000 1 2014-02-18 19:20:00.000 2014-02-18 07:00:00.000 1 2014-02-18 19:20:00.000 2014-02-18 08:00:00.000 1 2014-02-18 19:20:00.000 2014-02-18 09:00:00.000 2 2014-02-19 15:35:00.000 2014-02-19 00:00:00.000 2 2014-02-19 15:35:00.000 2014-02-19 01:00:00.000 2 2014-02-19 15:35:00.000 2014-02-19 02:00:00.000 2 2014-02-19 15:35:00.000 2014-02-19 03:00:00.000 2 2014-02-19 15:35:00.000 2014-02-19 04:00:00.000 2 2014-02-19 15:35:00.000 2014-02-19 05:00:00.000
Wasn’t that easy and really cool?
Using a Tally Table to Remove Repeated Characters in a String
There are a lot of situations in T-SQL where you need to parse a string. Many of the built-in functions that T-SQL provides stop at the first occurrence of something. But what if you need to perform the same operation on many characters within the string and a built-in function simply won’t do it for you?
This is where a Tally table can really save you. We’ll start with one that has 8000 rows so we can parse each character out of VARCHAR(8000) length string. Follow the comments in the code to see how it works.
DECLARE @S VARCHAR(8000) = 'Aarrrgggh!'; WITH Tally (N) AS ( -- 8000 rows (max length of the VARCHAR string) SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) a(n) CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) c(n) CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d(n) CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0)) e(n) ) SELECT n, s FROM ( -- Always choose the first element SELECT n=1, s=LEFT(@S, 1) UNION ALL -- Include each successive next element as long as it’s different than the prior SELECT N, CASE WHEN SUBSTRING(@S, N-1, 1) <> SUBSTRING(@S, N, 1) THEN SUBSTRING(@S, N, 1) -- Repeated characters are assigned NULL by the CASE END FROM Tally WHERE N BETWEEN 2 AND LEN(@S) ) a -- Now we filter out the repeated elements WHERE s IS NOT NULL;
The results displayed for this are below. Note that it works just fine with strings of length 1 or 0 also.
n s 1 A 3 r 6 g 9 h 10 !
It is also case-insensitive, so it removes the lowercase a (second character) because it is a repeat of the first character (capital A). It could be made case sensitive if necessary. Now all we need to do is combine the rows that remain back into our final string using a technique that is similare to Creating a Comma-separated List by author Microsoft Certified Master (MCM) Wayne Sheffield. It just needs a slight modification to not include the comma separator and it can be applied to this case as follows.
WITH Tally (N) AS ( -- 8000 rows SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) a(n) CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) c(n) CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d(n) CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0)) e(n) ) SELECT OriginalString=@S, NewString= ( SELECT '' + s FROM ( -- Same base query as the prior example starts here SELECT n, s FROM ( -- Always choose the first element SELECT n=1, s=LEFT(@S, 1) UNION ALL -- Include each successive next element as long as it's -- different than the prior SELECT N, CASE WHEN SUBSTRING(@S, N-1, 1) <> SUBSTRING(@S, N, 1) THEN SUBSTRING(@S, N, 1) -- Repeated characters are assigned NULL by the CASE END FROM Tally WHERE N BETWEEN 2 AND LEN(@S) ) a -- Now we filter out the repeated elements WHERE s IS NOT NULL ) a ORDER BY n FOR XML PATH(''), TYPE ).value('.', 'VARCHAR(8000)' );
Now our result is:
OriginalString NewString Aarrrgggh! Argh!
That is very fast and way cool, so maybe we want to save this as a utility FUNCTION. SQL has many types of User-defined FUNCTIONs, e.g., scalar, table-valued, etc. Scalar-valued functions are known to be total performance nightmares. By far the fastest type of FUNCTION if you can write it this way is to use what’s known as an inline Table Valued Function (iTVF). We can do this for our new utility FUNCTION.
CREATE FUNCTION dbo.RemoveRepeatedChars ( @S VARCHAR(8000) ) RETURNS TABLE WITH SCHEMABINDING RETURN WITH Tally (N) AS ( -- 8000 rows SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) a(n) CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) c(n) CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d(n) CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0)) e(n) ) SELECT OriginalString=@S, NewString= ( SELECT '' + s FROM ( SELECT n, s FROM ( -- Always choose the first element SELECT n=1, s=LEFT(@S, 1) UNION ALL -- Include each successive next element as long as it's -- different than the prior SELECT N, CASE WHEN SUBSTRING(@S, N-1, 1) <> SUBSTRING(@S, N, 1) THEN SUBSTRING(@S, N, 1) -- Repeated characters are assigned NULL by the CASE END FROM Tally WHERE N BETWEEN 2 AND LEN(@S) ) a -- Now we filter out the repeated elements WHERE s IS NOT NULL ) a ORDER BY n FOR XML PATH(''), TYPE ).value('.', 'VARCHAR(8000)' );
An iTVF with SCHEMABINDING is the fastest possible FUNCTION we can write and it works for this case. We can now call our function to parse as many strings as we need (like from a column in a table).
WITH SampleData (s) AS ( SELECT 'Aarrrgggh!' UNION ALL SELECT 'Noo repeaatting chharracterss wanntted.' ) SELECT * FROM SampleData CROSS APPLY dbo.RemoveRepeatedChars(s);
CROSS APPLY is something we’ll cover in a future blog but we’ve provided some references at the end if you’d like to learn more now. The results we get from the above script are:
OriginalString NewString Aarrrgggh! Argh! Noo repeaatting chharracterss wanntted. No repeating characters wanted.
Parsing a Comma Separated List of Values
A more common problem in SQL is the parsing of a comma-separated list of values. We’ve saved the best for last, because in this case we can truly point you to a best-of-breed solution.
How many of you have created a SQL FUNCTION to split a delimited list of values? Chances are you pulled down some tired old looping code from somewhere on the Internet that works but is dog slow. You probably don’t even realize how bad it is. Now is the time to locate it and replace it as what we’re about to tell you about will simply blow the doors off of any other SQL-based delimited string splitter!
SQL MVP Jeff Moden wrote the outstanding Tally OH! An Improved SQL 8K “CSV Splitter” Function (viewed more than 40,000 times as of this writing), which contains lots of outstanding information about Tally tables (another of his articles is referenced at the end), how they can be used to split a delimited string and the performance comparison with alternate solutions. At the end of the article, you can download DelimitedSplit8K, optimized specifically for use with VARCHAR(8000) strings, so don’t try to modify it to VARCHAR(MAX). I have provided a link to that article as over the years the FUNCTION has been revised and improved a few times, so if you go there you’ll be sure to get the most current version.
It also contains information on CLR-based splitters that can be even faster and handle the case of VARCHAR(MAX).
Conclusions
Let’s do a recap of all of the things we learned today:
- The various ways to create a permanent or generate an in-line Tally table in SQL 2000, 2005 and 2008.
- The formula for using DATEDIFF/DATEADD to truncate a date to just the day or to the hour.
- How to expand a row set using a Tally table.
- How to parse a string using a Tally table.
- How to use the FOR XML PATH method to concatenate rows into a single character string.
- How to create a high-performance, schema-bound, in-line Table Valued FUNCTION in SQL.
- Where to find the fastest SQL-based delimited string splitter on the planet (DelimitedSplit8K).
We’ve also come away with two utility FUNCTIONs we can add to our SQL tool chest (see below for a third one):
- RemoveRepeatedChars
- DelimitedSplit8K
For further reading on CROSS APPLY (and its companion OUTER APPLY), here are two great articles by SQL MVP Paul White to get you started.
Further suggested reading:
- The “Numbers” or “Tally” Table: What it is and how it replaces a loop, which is also by SQL MVP Jeff Moden
- Splitting Strings Based on Patterns, which contains PatternSplitCM – another must have high-performing, utility FUNCTION for your SQL tool box that uses a Tally table to work its magic
- An even faster version of DelimitedSplit8K that works only in SQL 2012 or later in this article by Eirikur Eiriksson: Reaping the benefits of the Window functions in T-SQL
Follow me on Twitter: @DwainCSQL
Copyright © Dwain Camps 2014 All Rights Reserved