Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

The Dynamic Tally or Numbers Table

By Lynn Pettis,

Introduction

A few years ago I was introduced to the concept of the Tally, or Numbers, Table by Jeff Moden on SQLServerCentral.com. The uses of the tally table are numerous including parsing strings, eliminating loops, identifying gaps in numeric sequences, and many others. When I received my copy of the March 2009 SQL Server Magazine in the mail, I was intrigued by the article “Build the Numbers Table You Need” written by Frank Solomon.

I read the article with great interest. In the article he discusses how he developed a function that will dynamically build a tally or numbers table. He goes into great detail in his article describing how his function works and everything really made sense. The only two problems I saw with his solution were that it relied on a recursive CTE and a while loop, and it was written as a multi-statement table-valued function. These issues would mean that the function would most likely not scale very well to large numbers of values. I decided to pursue an alternative solution to his function that would perform better and be scalable to larger result sets.

The solution I developed is also a table-valued function, but I was able to write it as an in-line table-valued function instead of a multi-statement table-valued function. The function uses a modified Itzek Ben-Gan method to generate the base result sets that are used to generate the dynamic tally table.

In addition to generating a dynamic tally table with a user-defined starting and ending values and increment, this function also allows the user to generate the tally table in either ascending or descending order. The result set is returned in ascending order if the starting value is less than or equal to the ending value and the increment is positive. The result set is returned in descending order if the starting value is greater than or equal to the ending value and the increment is negative.

A Closer Look

Let us take a closer look at this new function, dbo.ufn_Tally2, and see how it works. Instead of using a recursive CTE to generate the values for the function, I chose to use a method that uses multiple CTE’s to in the function to generate the necessary result set that would be used to create the dynamic tally table, five CTE’s to be precise. Each of the CTE’s builds on the previous CTE to complete its work.

The first CTE, named BaseNum, is used to generate the initial result set that is used in the function. This CTE consists simply of ten SELECT 1 statements joined together with UNION ALL clauses to generate a result set consisting of ten rows.

The second CTE, named L1, uses the BaseNum CTE to generate a second result set with one thousand rows. It accomplishes this by crossing join BaseNum with itself two times.

The third CTE, named L2, then uses the L1 CTE in a single cross join to generate a third result set. This result set consists of one million rows. For most applications, this would most likely be sufficient, but to be prepared I took it one more level.

The fourth CTE, named L3, then uses the L2 CTE in another single cross join to create the final base result set. This result set returns one trillion rows. As we may never need this many rows, however, I thought it prudent to limit the actual number of rows returned by this CTE. In order to accomplish this, I added a TOP clause to SELECT statement in the CTE definition that included a formula that would calculate the number of values that needed to be returned based on the starting value, ending value, and increment. In mathematical terms, the formula is as follows:

(|max (Starting Value, Ending Value) – min (Starting Value, Ending Value)|/|Increment|) + 1

The fifth CTE, named Tally, then uses this final result set along with the ROW_NUMBER() window function to generate the base values used in the final SELECT statement that actually generates the dynamic tally table. The formula, in mathematical terms again, is as follows:

((N – 1) * Increment) + Starting Value

Here is the code for my function, dbo.ufn_Tally2:

USE [SandBox]
GO
 /****** Object:  UserDefinedFunction [dbo].[ufn_Tally2]    Script Date: 03/31/2009 23:01:03 ******/
 SET ANSI_NULLS ON
GO
 SET QUOTED_IDENTIFIER ON
GO
 CREATE function [dbo].[ufn_Tally2](
     @pStartValue bigint= 1,
     @pEndValue bigint= 1000000,
     @pIncrement bigint= 1
 )
 returns table
 as
 return(
     with BaseNum (
         N
     ) as (
     select 1 union all
     select 1 union all
     select 1 union all
     select 1 union all
     select 1 union all
     select 1 union all
     select 1 union all
     select 1 union all
     select 1 union all
     select 1
     ),
     L1 (
         N
     ) as (
     select
         bn1.N
     from
         BaseNum bn1
         crossjoin BaseNum bn2
     ),
     L2 (
         N
     ) as (
     select
         a1.N
     from
         L1 a1
         crossjoin L1 a2
     ),
     L3 (
         N
     ) as (
     select top ((abs(casewhen @pStartValue < @pEndValue
                          then @pEndValue
                          else @pStartValue
                      end -
                      case when @pStartValue < @pEndValue
                          then @pStartValue
                          else @pEndValue
                      end))/abs(@pIncrement)+ 1)
         a1.N
     from
         L2 a1
         crossjoin L2 a2
     ),
     Tally (
         N
     ) as (
     select
         row_number()over (orderby a1.N)
     from
         L3 a1
     )
     select
         ((N - 1) * @pIncrement) + @pStartValue as N
     from
         Tally
 );

A Quick Look at the Competition

Now let us take a quick look at Frank Solomon’s routine. I’m not going to go into detail trying to explain the logic behind his routine, as he did a very good job of that in his article. If you would like to learn more about his routine, I highly recommend getting a copy of the March 2009 edition of SQL Server Magazine and read his article.

A quick review of his code and you will see that the recursive CTE completes the initial load of the table with up to 32767 values to the @IntegersTable table variable, dependent of the value of the increment. If additional values are required, the function then loops using the existing records to add additional values to the @IntegersTable table variable until all required row groups have been completed. The function then needs to make necessary corrections to the data and delete unneeded values. All this extra work, the use of the recursive CTE, and the WHILE loop actually limits the scalability of this routine.

Here is the code for Frank Solomon’s dbo.CreateIntegersTable function:

USE [SandBox]
GO
 /****** Object:  UserDefinedFunction [dbo].[CreateIntegersTable]    Script Date: 04/02/2009 08:02:25 ******/
 SET ANSI_NULLS ON
GO
 SET QUOTED_IDENTIFIER ON
GO
 create function [dbo].[CreateIntegersTable] (
     @start_int bigint= 1,
     @step_int bigint = 1,
     @end_int  bigint
 )
 returns@FinishedIntegersTable table
 (
     ints bigint
 )
 as
 begin
     declare @IntegersTable table
         (
             ints bigint
         )
     if (@start_int < 1)
         set @end_int = (@end_int +abs(@start_int)+ 1);
     with IntegersTableFill (ints) as
     (
         select
             cast(1 as bigint)as 'ints'
         union all
             select
                 (T.ints + @step_int) as'ints'
             from
                 IntegersTableFill T
             where
                 ints <= (
                     case
                         when (@end_int <= 32767) then @end_int
                         else 32767
                     end
                 )
     )
     insert into @IntegersTable
     select
         ints
     from
         IntegersTableFill
     option
         (maxrecursion 32767);
     if (@end_int > 32767)
     begin
         declare @last_int_inserted bigint,
                 @int_row_groups int,
                 @current_row_group smallint;
         set @int_row_groups = ceiling((log((@end_int * 1.0)/65534))/log(2))+ 1;
         set @current_row_group = 1;
         while(@current_row_group <= @int_row_groups)
         begin
             select @last_int_inserted = max(ints)from @IntegersTable;
             insert into @IntegersTable
             select
                 ints + @last_int_inserted + (@step_int -1)
             from
                 @IntegersTable
             where
                 (ints + @last_int_inserted) <= @end_int;
             set @current_row_group = @current_row_group + 1;
         end
     end
     if (@start_int <> 1)
         update @IntegersTable set
             ints = (ints + (sign(@start_int)* abs(@start_int))-1);
     if (@start_int < 1)
         set @end_int =(@end_int - abs(@start_int)- 1);
     delete from
         @IntegersTable
     where
         ints < @start_int
         or ints > @end_int;
     insert into
         @FinishedIntegersTable
     select
         ints
     from
         @IntegersTable;
     return
 end;

The Test

Which of these two routines will be the most useful in a production environment? We are about to find out, but first I should briefly explain the tests I will use and the environment in which the tests were run.

The test environment is an HP server running dual Quad code Intel Xenon x64 2.8 GHz processors with 8 GB RAM. The disk subsystem is raid 5 SCSI with 5 disks. The OS is the x64 version of Windows Server 2003 R2 Enterprise Edition, and is running x64 version of SQL Server 2005 Developers Edition. This is a development system with little activity during the day.

The test code is shown below, and generates a series of result sets starting with one row and ending with ten million rows. The first test simply assigns the output to a variable declared as a BIGINT. The purpose of this is to test the function itself. The test is then run again, but inserts the values into a temporary table. I reran the tests a second time after modifying the test code, also shown below, to generate result sets starting with six rows and ending with sixty million rows.

Here is the test code I used to compare the two procedures:

create table #TestTable (N bigint);
 declare @LoopCnt int,
         @StartValue bigint,
         @EndValue bigint,
         @Increment bigint,
         @TestVal bigint;
 set @LoopCnt = 1;
 set@StartValue = 1;
 set @Increment = 1;
 while @LoopCnt <= 8
 begin
     set @EndValue = power(10,(@LoopCnt - 1));
     set statistics ioon;
     set statistics timeon;
     insert into #TestTable
     select
         ints
     from
         dbo.CreateIntegersTable(@StartValue, @Increment, @EndValue);
     set statistics timeoff;
     set statistics iooff;
     set @LoopCnt = @LoopCnt + 1;
     truncate table #TestTable;
 end
 drop table #TestTable;
go
 create table #TestTable (N bigint);
 declare @LoopCnt int,
         @StartValue bigint,
         @EndValue bigint,
         @Increment bigint,
         @TestVal bigint;
 set @LoopCnt = 1;
 set@StartValue = 1;
 set @Increment = 1;
 while @LoopCnt <= 8
 begin
     set @EndValue = power(10,(@LoopCnt - 1));
     set statistics ioon;
     set statistics timeon;
     insert into #TestTable
     select
         N
     from
         dbo.ufn_Tally2(@StartValue, @EndValue, @Increment);
     set statistics timeoff;
     set statistics iooff;
     set @LoopCnt = @LoopCnt + 1;
     truncate table #TestTable;
 end
 drop table #TestTable;
go
 declare @LoopCnt int,
         @StartValue bigint,
         @EndValue bigint,
         @Increment bigint,
         @TestVal bigint;
 set @LoopCnt = 1;
 set@StartValue = 1;
 set @Increment = 1;
 while @LoopCnt <= 8
 begin
     set @EndValue = power(10,(@LoopCnt - 1));
     set statistics ioon;
     set statistics timeon;
     select
         @TestVal = ints
     from
         dbo.CreateIntegersTable(@StartValue, @Increment, @EndValue);
     set statistics timeoff;
     set statistics iooff;
     set @LoopCnt = @LoopCnt + 1;
 end
go
 declare @LoopCnt int,
         @StartValue bigint,
         @EndValue bigint,
         @Increment bigint,
         @TestVal bigint;
 set @LoopCnt = 1;
 set @StartValue = 1;
 set @Increment = 1;
 while @LoopCnt <= 8
 begin
     set @EndValue = power(10,(@LoopCnt - 1));
     set statistics ioon;
     set statistics timeon;
     select
         @TestVal = N
     from
         dbo.ufn_Tally2(@StartValue, @EndValue, @Increment);
     set statistics timeoff;
     set statistics iooff;
     set @LoopCnt = @LoopCnt + 1;
 end
go
 create table #TestTable (N bigint);
 declare @LoopCnt int,
         @StartValue bigint,
         @EndValue bigint,
         @Increment bigint,
         @TestVal bigint;
 set @LoopCnt = 1;
 set@StartValue = 1;
 set @Increment = 1;
 while @LoopCnt <= 8
 begin
     set @EndValue = power(10,(@LoopCnt - 1))* 6;
     set statistics ioon;
     set statistics timeon;
     insert into #TestTable
     select
         ints
     from
         dbo.CreateIntegersTable(@StartValue, @Increment, @EndValue);
     set statistics timeoff;
     set statistics iooff;
     set @LoopCnt = @LoopCnt + 1;
     truncate table #TestTable;
 end
 drop table #TestTable;
go
 create table #TestTable (N bigint);
 declare @LoopCnt int,
         @StartValue bigint,
         @EndValue bigint,
         @Increment bigint,
         @TestVal bigint;
 set @LoopCnt = 1;
 set@StartValue = 1;
 set @Increment = 1;
 while @LoopCnt <= 8
 begin
     set @EndValue = power(10,(@LoopCnt - 1))* 6;
     set statistics ioon;
     set statistics timeon;
     insert into #TestTable
     select
         N
     from
         dbo.ufn_Tally2(@StartValue, @EndValue, @Increment);
     set statistics timeoff;
     set statistics iooff;
     set @LoopCnt = @LoopCnt + 1;
     truncate table #TestTable;
 end
 drop table #TestTable;
go
 declare @LoopCnt int,
         @StartValue bigint,
         @EndValue bigint,
         @Increment bigint,
         @TestVal bigint;
 set @LoopCnt = 1;
 set@StartValue = 1;
 set @Increment = 1;
 while @LoopCnt <= 8
 begin
     set @EndValue = power(10,(@LoopCnt - 1))* 6;
     set statistics ioon;
     set statistics timeon;
     select
         @TestVal = ints
     from
         dbo.CreateIntegersTable(@StartValue, @Increment, @EndValue);
     set statistics timeoff;
     set statistics iooff;
     set @LoopCnt = @LoopCnt + 1;
 end
go
 declare @LoopCnt int,
         @StartValue bigint,
         @EndValue bigint,
         @Increment bigint,
         @TestVal bigint;
 set @LoopCnt = 1;
 set @StartValue = 1;
 set @Increment = 1;
 while @LoopCnt <= 8
 begin
     set @EndValue = power(10,(@LoopCnt - 1))* 6;
     set statistics ioon;
     set statistics timeon;
     select
         @TestVal = N
     from
         dbo.ufn_Tally2(@StartValue, @EndValue, @Increment);
     set statistics timeoff;
     set statistics iooff;
     set @LoopCnt = @LoopCnt + 1;
 end
go

In the following tables, you find the results of the test runs.

Assigning to a BIGINT variable (all times in milliseconds):

Number of Rows CPU time CreateIntegersTable Elapsed time CreateIntegersTable CPU time ufn_Tally2 Elapsed time ufn_Tally2
1 0 2 0 0
10 0 1 0 0
100 16 7 0 0
1000 46 55 0 0
10000 485 512 16 5
100000 3375 3481 47 53
1000000 27890 28015 531 528
10000000 280954 281670 5281 5290

Inserting into temporary table (all times in milliseconds):

Number of Rows CPU time CreateIntegersTable Elapsed time CreateIntegersTable CPU time ufn_Tally2 Elapsed time ufn_Tally2
1 0 2 0 0
10 0 2 0 0
100 16 7 0 1
1000 62 64 16 11
10000 578 592 94 98
100000 4297 4347 968 970
1000000 36453 36775 9719 9739
10000000 365422 366977 97063 97822

Assigning to a BIGINT variable (all times in milliseconds):

Number of Rows CPU time CreateIntegersTable Elapsed time CreateIntegersTable CPU time ufn_Tally2 Elapsed time ufn_Tally2
6 0 2 0 0
60 15 5 0 0
600 32 35 0 0
6000 265 313 0 3
60000 2297 2417 47 33
600000 17344 17448 312 315
6000000 167047 169385 3157 3167
60000000 1660937 2187247 31656 31673

Inserting into temporary table (all times in milliseconds):

Number of Rows CPU time CreateIntegersTable Elapsed time CreateIntegersTable CPU time ufn_Tally2 Elapsed time ufn_Tally2
6 0 3 0 0
60 16 5 0 6
600 31 40 0 59
6000 359 365 62 59
60000 2828 2880 625 9618
600000 22297 22414 5844 5888
6000000 218938 220094 58297 58517
60000000 2177328 2186190 583250 586134

Conclusion

Comparing the results of the tests you will notice that the two routines are comparable for small results sets. Beginning at about one thousands rows, however, significant differences appear. At the upper range of the test, the difference is significant. Based on these simple tests it appears obvious that the routine using the recursive CTE and while loop is not one that should be used in a production environment.

Total article views: 9211 | Views in the last 30 days: 15
 
Related Articles
FORUM

Increment data while select

Increment data while select

FORUM

convert DateTime to BigInt

convert DateTime to BigInt

FORUM

Convert alphanumeric to BigInt

Convert alphanumeric to BigInt

FORUM

Incrementing Values

Increment Values

FORUM

convert bigint to int

How to convert bigint to int

Tags
cte    
tally table    
t-sql    
 
Contribute

Join the most active online SQL Server Community

SQL knowledge, delivered daily, free:

Email address:  

You make SSC a better place

As a member of SQLServerCentral, you get free access to loads of fresh content: thousands of articles and SQL scripts, a library of free eBooks, a weekly database news roundup, a great Q & A platform… And it’s our huge, buzzing community of SQL Server Professionals that makes it such a success.

Join us!

Steve Jones
Editor, SQLServerCentral.com

Already a member? Jump in:

Email address:   Password:   Remember me: Forgotten your password?
Steve Jones