SQLServerCentral Article

The "Numbers" or "Tally" Table: What it is and how it replaces a loop

,

Editor's Note:

There is updated code in this article: Tally Oh! An Improved SQL 8K CSV Splitter Function. Please use that code for production purposes and not the code in this article. The updated code is located here: The New Splitter Functions.

Author UPDATE! (13 May 2011)

You'll find the makings of a "CSV Splitter" in this article. Please understand that it's for the explanation of how a Tally Table can be used in place of a WHILE loop and that it is, by no stretch of imagination, an optimal solution for a "CSV Splitter. For an optional solution, please refer to the following URL ( http://www.sqlservercentral.com/articles/Tally+Table/72993/ ) instead of using the code from this article for a "CSV Splitter" of your own.

Thank you for your time

Introduction

I actually started out writing an article on how to pass 1, 2, and 3 dimensional "arrays" as parameters in stored procedures. Suddenly it dawned on me that a lot of people still have no clue what a "numbers" or "Tally" table is, never mind how it actually works.

There are dozens of things we can do in SQL that require some type of iteration. "Iteration" means "counters" and "loops" to most people and recursion to others. To those well familiar in the techniques of "Set-based" programming, it means a "Numbers" or "Tally" table, instead. I like the name "Tally" table because, well, it just sounds cooler and there's no chance of anyone mistaking what I said when I say "Tally Table". Everyone immediately knows what I'm talking about and what it's used for. If they don't, they always stop and ask.

So, with that in mind, we're going to explore what a Tally table is and how it works to replace loops. We'll start out simple and build to the classic example of "splitting" a parameter. We'll throw in the added bonus of how to normalize an entire table worth of a CSVColumn... all with no cursors, no loops, no functions, and no performance problems.

Building a Tally Table

A Tally table is nothing more than a table with a single column of very well indexed sequential numbers starting at 0 or 1 (mine start at 1) and going up to some number. The largest number in the Tally table should not be just some arbitrary choice. It should be based on what you think you'll use it for. I split VARCHAR(8000)'s with mine, so it has to be at least 8000 numbers. Since I occasionally need to generate 30 years of dates, I keep most of my production Tally tables at 11,000 or more which is more than 365.25 days times 30 years.

There are many methods to build a Tally table. Let's use one of the most obvious, a loop. Why? Because it's familiar ground for a lot of folks... trust me for a minute... I have other points to make in this article...

--=============================================================================
--      Setup
--=============================================================================
    USE TempDB     --DB that everyone has where we can cause no harm
    SET NOCOUNT ON --Supress the auto-display of rowcounts for appearance/speed
DECLARE @StartTime DATETIME    --Timer to measure total duration
    SET @StartTime = GETDATE() --Start the timer
--=============================================================================
--      Create and populate a Tally table
--=============================================================================
--===== Conditionally drop and create the table/Primary Key
     IF OBJECT_ID('dbo.Tally') IS NOT NULL 
        DROP TABLE dbo.Tally
 CREATE TABLE dbo.Tally 
        (N INT, 
         CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N))
--===== Create and preset a loop counter
DECLARE @Counter INT
    SET @Counter = 1
--===== Populate the table using the loop and couner
  WHILE @Counter <= 11000
  BEGIN
         INSERT INTO dbo.Tally
                (N)
         VALUES (@Counter)
            SET @Counter = @Counter + 1
    END
--===== Display the total duration
 SELECT STR(DATEDIFF(ms,@StartTime,GETDATE())) + ' Milliseconds duration'

That will take about 750 milliseconds to generate. Like I said, there are many ways to generate a Tally table. Let me introduce you to my favorite way and then I'll explain why it's a favorite...

--=============================================================================
--      Setup
--=============================================================================
    USE TempDB     --DB that everyone has where we can cause no harm
    SET NOCOUNT ON --Supress the auto-display of rowcounts for appearance/speed
DECLARE @StartTime DATETIME    --Timer to measure total duration
    SET @StartTime = GETDATE() --Start the timer
--=============================================================================
--      Create and populate a Tally table
--=============================================================================
--===== Conditionally drop 
     IF OBJECT_ID('dbo.Tally') IS NOT NULL 
        DROP TABLE dbo.Tally
--===== Create and populate the Tally table on the fly
 SELECT TOP 11000 --equates to more than 30 years of dates
        IDENTITY(INT,1,1) AS N
   INTO dbo.Tally
   FROM Master.dbo.SysColumns sc1,
        Master.dbo.SysColumns sc2
--===== Add a Primary Key to maximize performance
  ALTER TABLE dbo.Tally
    ADD CONSTRAINT PK_Tally_N 
        PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100
--===== Let the public use it
  GRANT SELECT, REFERENCES ON dbo.Tally TO PUBLIC
--===== Display the total duration
 SELECT STR(DATEDIFF(ms,@StartTime,GETDATE())) + ' Milliseconds duration'

The first time you run it, it'll take about 220 milliseconds so, right up front, it beats the loop. Run it a second time and you'll find that it takes less time... anywhere from 73 to 186 milliseconds depending on what the system is doing.

The key here is that we've already replaced one loop using an IDENTITY and a CROSS JOIN. Cross Joins can be a real friend because they can be used in place of loops and, on a properly indexed table like a Tally table, they run nasty fast. The reason why this method IS my favorite is because of the Cross Join... it makes the code very short, very easy to remember, and very fast. We'll see another example of a Cross Join actually using the Tally table later in the article.

Don't delete the Tally table we just built. We're going to play with it to show you how it works.

Direct Loop Replacement as a COUNTER

Right after the "Hello World" problem, most programming language instructors teach how to "Loop". The problem normally manifests itself as "Produce and display a count from 1 to 10. In SQL Server, that frequently (and, unfortunately) is demonstrated as the following...

--===============================================
--      Display the count from 1 to 10
--      using a loop.
--===============================================
--===== Declare a counter
DECLARE @N INT
    SET @N = 1
--===== While the counter is less than 10...
  WHILE @N <= 10
  BEGIN
        --==== Display the count
        SELECT @N
        --==== Increment the counter
           SET @N = @N + 1
    END

This works because we do iterations using a counter. Let me repeat, we do iterations using a counter. This can be defined as "For each count from 1 to 10, display the value of the count".

Also notice, the SELECT was executed 10 times and that creates 10 different result sets. That's NOT set based programming. That's what I refer to as "RBAR" (pronounced "ree-bar" and is a "Modenism" for "Row By Agonizing Row"). Let's see how to do that with a Tally table...

--===============================================
--      Display the count from 1 to 10
--      using a Tally table.
--===============================================
 SELECT N
   FROM dbo.Tally
  WHERE N <= 10
  ORDER BY N

This works because we use existing rows to do the "iterations". Let me repeat, we do the iterations using existing rows. This can be defined as "For each row from 1 to 10, display the value of the row". The BIG difference is that we only use a single SELECT and we don't actually have to count as we go. We just limit how many rows we use and what the values of the rows are. It produces the same result as the loop does (count of 1 to 10), but it does it using a single SELECT. In other words, it produces a single result set with the entire answer. Nothing RBAR about that... that's set based programming.

Stepping Through Characters

Let's say we have a parameter (variable) that looks like this (note the starting and ending commas)...

--===== Simulate a passed parameter
DECLARE @Parameter VARCHAR(8000)
    SET @Parameter = ',Element01,Element02,Element03,Element04, Element05,'

Let's write a loop just to show the character position for each character in the order they appear. Like this...

--===== Simulate a passed parameter
DECLARE @Parameter VARCHAR(8000)
    SET @Parameter = ',Element01,Element02,Element03,Element04,Element05,'
--===== Declare a character counter
DECLARE @N INT
    SET @N = 1
--===== While the character counter is less then the length of the string
  WHILE @N <= LEN(@Parameter)
  BEGIN
        --==== Display the character counter and the character at that
            -- position
        SELECT @N, SUBSTRING(@Parameter,@N,1)
        --==== Increment the character counter
           SET @N = @N + 1
    END

Note that as the loop progresses, the character counter "steps" through the string using SUBSTRING to display each character. It starts at "1" and ends when it gets to the end of the string displaying a character at that particular position for each iteration of the loop. Note also that it creates 51 separate result sets. That means that 51 individual SELECTs were executed. Heh... that qualifies as "RBAR"!

Editor's Note: There is updated code in this article: Tally Oh! An Improved SQL 8K CSV Splitter Function. Please use that code for production purposes and not the code in this article. The updated code is located here: The New Splitter Functions.

Let's see how to do it with the Tally table...

--===== Simulate a passed parameter
DECLARE @Parameter VARCHAR(8000)
    SET @Parameter = ',Element01,Element02,Element03,Element04,Element05,'
--===== Do the same thing as the loop did... "Step" through the variable
     -- and return the character position and the character...
 SELECT N, SUBSTRING(@Parameter,N,1)
   FROM dbo.Tally
  WHERE N <= LEN(@Parameter)
  ORDER BY N

Just like counting from 1 to 10, both the loop and the Tally table count from 1 to the length of the parameter. The Tally table is a direct replacement for the loop. Look at the following graphic. Both the loop and the Tally table do exactly the same thing except the Tally table only uses 1 SELECT and returns a single result set. The rows of the Tally table act as the counter except it's set based.

"Jumping" to Certain Characters

Let's do almost the same thing, but let's just return the positions of the commas. First, one method to do it using a loop...

--===== Simulate a passed parameter
DECLARE @Parameter VARCHAR(8000)
    SET @Parameter = ',Element01,Element02,Element03,Element04,Element05,'
--===== Declare a charaction counter
DECLARE @N INT
    SET @N = 1
--===== While the character counter is less then the length of the string
  WHILE @N <= LEN(@Parameter)
  BEGIN
        --==== Display the character counter and the character at that
            -- position but only if a comma is present at that position.
            IF SUBSTRING(@Parameter,@N,1) = ','
         BEGIN
               SELECT @N, SUBSTRING(@Parameter,@N,1)
           END
        --==== Increment the character counter
           SET @N = @N + 1
    END

Did you notice how the loop is slowing down? Now, let's try the same thing with the Tally table...

--===== Simulate a passed parameter
DECLARE @Parameter VARCHAR(8000)
    SET @Parameter = ',Element01,Element02,Element03,Element04,Element05,'
--===== Do the same thing as the loop did... "Step" through the variable
     -- and return the character position and the character... but only
     -- if it's a comma in that position...
 SELECT N, SUBSTRING(@Parameter,N,1)
   FROM dbo.Tally
  WHERE N <= LEN(@Parameter)
    AND SUBSTRING(@Parameter,N,1) = ','
  ORDER BY N

Notice, that almost didn't slow down at all. It's doing the same as the loop solution. The big difference is that it's doing it all in a single SELECT. It's using the existing rows in the Tally table as a counter. In essence, it' joining to the parameter at the character level... and that makes it very, very fast especially since the values of the Tally table are cached and the counters in the loop are not.

Going all the way... do the "Split".

Ok, we now understand that a Tally table is a direct replacement for some loops... especially loops that count. We've also seen that the Tally table can single out certain characters much faster than using a loop. MUCH faster. Let's go all the way... let's split the parameter we've been using into the individual elements that are marked by the commas. Here's one way to do it with a loop except this time, I've cheated a bit to help the loop find the comma's using CHARINDEX instead of stepping through each character just to give it a fighting chance. I've also made the parameter more like what you'd get from a GUI... no leading or trailing commas...

--===== Simulate a passed parameter
DECLARE @Parameter VARCHAR(8000)
    SET @Parameter = 'Element01,Element02,Element03,Element04,Element05'
--===== Create a table to store the results in
DECLARE @Elements TABLE 
        (
        Number INT IDENTITY(1,1), --Order it appears in original string
        Value  VARCHAR(8000)      --The string value of the element   
        )
--===== Decalre a variable to remember the position of the current comma
DECLARE @N INT 
--===== Add start and end commas to the Parameter so we can handle
     -- single elements
    SET @Parameter = ','+@Parameter +','
--===== Preassign the current comma as the first character
    SET @N = 1
--===== Loop through and find each comma, then insert the string value 
     -- found between the current comma and the next comma.  @N is
     -- the position of the current comma.
  WHILE @N < LEN(@Parameter) --Don't include the last comma
  BEGIN
        --==== Do the insert using the value between the commas
        INSERT INTO @Elements 
        VALUES (SUBSTRING(@Parameter,@N+1,CHARINDEX(',',@Parameter,@N+1)-@N-1))
        --==== Find the next comma
        SELECT @N = CHARINDEX(',',@Parameter,@N+1)
   END
SELECT * FROM  @Elements

Again, all this does is find a comma and "remembers" its position. Then it uses CharIndex to find the next comma and inserts what's between the commas into a table variable. It quits looping when it runs out of commas.

Let's try the same thing with the Tally table...

--===== Simulate a passed parameter
DECLARE @Parameter VARCHAR(8000)
    SET @Parameter = 'Element01,Element02,Element03,Element04,Element05'
--===== Create a table to store the results in
DECLARE @Elements TABLE 
        (
        Number INT IDENTITY(1,1), --Order it appears in original string
        Value  VARCHAR(8000)      --The string value of the element   
        )
--===== Add start and end commas to the Parameter so we can handle
     -- single elements
    SET @Parameter = ','+@Parameter +','
--===== Join the Tally table to the string at the character level and
     -- when we find a comma, insert what's between that command and 
     -- the next comma into the Elements table
 INSERT INTO @Elements
        (Value)
 SELECT SUBSTRING(@Parameter,N+1,CHARINDEX(',',@Parameter,N+1)-N-1)
   FROM dbo.Tally
  WHERE N < LEN(@Parameter)
    AND SUBSTRING(@Parameter,N,1) = ',' --Notice how we find the comma
SELECT * FROM @Elements

The details are in the comments. What I want to point out is that they're both very fast. 5 elements just isn't enough to determine a winner here. Let's make a "monster" parameter of 796 elements (last one will be blank) and see which one wins... here's the full code I ran...

--Loop method
--===== Simulate a passed parameter
DECLARE @Parameter VARCHAR(8000)
    SET @Parameter = REPLICATE('Element01,Element02,Element03,Element04,Element05,',159)
    SET NOCOUNT ON
--===== Create a table to store the results in
DECLARE @Elements TABLE 
        (
        Number INT IDENTITY(1,1), --Order it appears in original string
        Value  VARCHAR(8000)      --The string value of the element   
        )
--===== Decalre a variable to remember the position of the current comma
DECLARE @N INT 
--===== Add start and end commas to the Parameter so we can handle
     -- single elements
    SET @Parameter = ','+@Parameter +','
--===== Preassign the current comma as the first character
    SET @N = 1
--===== Loop through and find each comma, then insert the string value 
     -- found between the current comma and the next comma.  @N is
     -- the position of the current comma.
  WHILE @N < LEN(@Parameter) --Don't include the last comma
  BEGIN
        --==== Do the insert using the value between the commas
        INSERT INTO @Elements 
        VALUES (SUBSTRING(@Parameter,@N+1,CHARINDEX(',',@Parameter,@N+1)-@N-1))
        --==== Find the next comma
        SELECT @N = CHARINDEX(',',@Parameter,@N+1)
   END
GO
--Tally table method
--===== Simulate a passed parameter
DECLARE @Parameter VARCHAR(8000)
    SET @Parameter = REPLICATE('Element01,Element02,Element03,Element04,Element05,',159)
SET NOCOUNT ON
--===== Create a table to store the results in
DECLARE @Elements TABLE 
        (
        Number INT IDENTITY(1,1), --Order it appears in original string
        Value  VARCHAR(8000)      --The string value of the element   
        )
--===== Add start and end commas to the Parameter so we can handle
     -- single elements
    SET @Parameter = ','+@Parameter +','
--===== Join the Tally table to the string at the character level and
     -- when we find a comma, insert what's between that command and 
     -- the next comma into the Elements table
 INSERT INTO @Elements
        (Value)
 SELECT SUBSTRING(@Parameter,N+1,CHARINDEX(',',@Parameter,N+1)-N-1)
   FROM dbo.Tally
  WHERE N < LEN(@Parameter)
    AND SUBSTRING(@Parameter,N,1) = ',' --Notice how we find the comma

And, here's the results according to the Profiler...

The Tally table method wins for Duration, CPU time, and RowCounts. It's looses in Reads and that bothers some folks, but it shouldn't. Part of the reason the Tally table method works so well is because parts of the Tally table get cached and those are "Logical Reads" from memory. If you think these differences are small, multiply them by 10,000 or a million and see how much difference there is. If you want performance, stop using loops and start using a Tally table.

One Final "Split" Trick with the Tally Table

You've seen it... some poor slob posts that (s)he has a table and it has a CSV column in it. "How do you join to it?", they ask. The correct answer, of course, is to normalize the table so that there is no CSV column and everyone says so on the post. Heh, but then the recommendation that follows is usually "Write a cursor to normalize the table."

The Tally table can normalize the whole table in one simple select... and remember when I said that I liked Cross Joins on the Tally table because it makes code short and fast? Check this out...

--===== Create a sample denormalized table with a CSV column
 CREATE TABLE #MyHead
        (PK INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
         CsvColumn VARCHAR(500))
 INSERT INTO #MyHead
 SELECT '1,5,3,7,8,2' UNION ALL
 SELECT '7,2,3,7,1,2,2' UNION ALL
 SELECT '4,7,5' UNION ALL
 SELECT '1' UNION ALL
 SELECT '5' UNION ALL
 SELECT '2,6' UNION ALL
 SELECT '1,2,3,4,55,6'
--===== Split or "Normalize" the whole table at once
 SELECT mh.PK,
        SUBSTRING(','+mh.CsvColumn+',',N+1,CHARINDEX(',',','+mh.CsvColumn+',',N+1)-N-1) AS Value
   FROM dbo.Tally t
  CROSS JOIN #MyHead mh
  WHERE N < LEN(','+mh.CsvColumn+',')
    AND SUBSTRING(','+mh.CsvColumn+',',N,1) = ','

No cursor, no loops, no functions, no performance problems... all done with the great loop substitute, the Tally Table.

Dozens of Other Uses

Once you've made the realization that joining to a Tally table is like creating a loop, you can do dozens of other things. Need to create a derived table with all the dates in a date range so you can outer join to it and get the SUM of 0 for dates you have no data for? That's easy with a Tally table...

--===== Presets
    USE NorthWind
DECLARE @DateStart DATETIME
DECLARE @DateEnd   DATETIME
--===== Find the min and max dates in the range of data
 SELECT @DateStart = MIN(ShippedDate),
        @DateEnd   = MAX(ShippedDate)
   FROM dbo.Orders
--===== Find the total freight for each day even if the day had 
     -- no shipments
 SELECT dates.ShippedDate,
        ISNULL(SUM(o.Freight),0) AS TotalFreight
   FROM dbo.Orders o
  RIGHT OUTER JOIN
        (--==== Create all shipped dates between start and end date
         SELECT t.N-1+@DateStart AS ShippedDate
           FROM dbo.Tally t
          WHERE t.N-1+@DateStart <= @DateEnd
        ) dates
     ON o.ShippedDate = dates.ShippedDate
  GROUP BY dates.ShippedDate

How about making a "shift" table with 3 shifts per day starting on '2008-01-01 06:00' for the next 5 years... again, easy with the Tally table...

--===== Presets
DECLARE @DateStart DATETIME
DECLARE @DateEnd   DATETIME
 SELECT @DateStart = '2008-01-01 06:00',
        @DateEnd   = DATEADD(yy,5,@DateStart)
--===== Display the shift number and date/times
 SELECT (t.N-1)%3 + 1 AS ShiftNumber,
        DATEADD(hh,8*(t.N-1),@DateStart) AS ShiftStart,
        DATEADD(hh,8*(t.N  ),@DateStart) AS ShiftEnd
   FROM dbo.Tally t
  WHERE DATEADD(hh,8*(t.N-1),@DateStart) <= @DateEnd

Notice that the "%3" is used to provide the shift number based on the value of the Tally table. We're also making dates by multiplying the Tally table value by 8 hours which generates the dates/times from the base date. Like I said, there are dozens of uses... Just do a search on "Numbers Table" or "Tally Table".

Conclusion

This article isn't meant to show you every thing you can use a Tally table for. Rather, it was an introduction as to what a Tally table is and how it actually works to replace loops in a set based fashion. In many cases, the Tally table is a direct replacement for a loop that counts. In other cases, something a bit more exotic needs to happen in the form of a Cross Join with the Tally table like when you want to split a whole column of CSV's. I think that you'll find that no matter what the use is, a Tally table will always beat the pants off of a looping solution.

This article hasn't even scratched the surface... there are dozens of uses for this simple, yet mighty tool. Many folks have posted some pretty interesting functions have to do with "Numbers" and "Tally" tables. Just do a search for "Numbers Table" or "Tally Table". You'll be amazed.

What I hope you take from this article is that now you know just exactly how a Tally table works. With that fundamental understanding, you'll be able to easily read the intent of code that uses a Tally table and easily identify "tuning/performance opportunities" in existing and new code.

Thanks for listening, folks.

--Jeff Moden


"Your lack of planning DOES constitute an emergency on my part... SO PLAN BETTER! "

"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row"

Rate

4.94 (493)

You rated this post out of 5. Change rating

Share

Share

Rate

4.94 (493)

You rated this post out of 5. Change rating