SQLServerCentral Article

Generating a Range

,

What is the most efficient method of generating a range of values

A lead-developer friend asked me to review a T-SQL code they had inherited which generated a

number range between 2 integer values.

He believed that the T-SQL was a prime example of spaghetti code and wanted to see some alternatives.

Of course, first port of call was SQLServerCentral to see what alternatives might exist and sure enough I found two scripts.

Chris Cubley's solution was closest to the one I was given to review where as DavidT's script would probably be the one that most procedural programmers would come up with.

I paraphrased all three versions into functions and measured the performance as each of them produced the number

range 5 to 567720.

AuthorFunctionAverage execution time
Chris Cubleydbo.fnCrossJoinRange30 secs.
DavidTdbo.fnRange22 secs.
??????dbo.fnUnionRange12 secs.

Function based on DavidT's script

See Generate Number Range 2 for the original script.

CREATE FUNCTION dbo.fnRange (
@first int , --##PARAM @first The lowest value in the range. 
@last int --##PARAM @last The highest value in the range.
)  
RETURNS @values TABLE ( value int primary key ) AS  
BEGIN
DECLARE @temp int
-- If the values have been supplied transposed then swap them to the correct way around.
IF @first > @last 
BEGIN
    SET @temp = @first
    SET @first = @last
    SET @last = @temp
END
WHILE @first<=@last
BEGIN
INSERT INTO @values
VALUES(@first)
SET @first = @first+1
END
RETURN
END
GO

The above function is straight forward as it simply loops between the first and last value

inserting records into a table variable that is then returned.

This function has three main advantages

  1. It is simple.
  2. Values are inserted in order.
  3. For small number ranges it is very efficient
  4. The range is limited only by the bounds of the SQL INT datatype. This can be extended by using BIGINT

Its disadvantage is that as number ranges get progressively larger its execution time increases. The reality is that execution times become excessive once the range gets into 7 figures.

Chris Cubley's version

See Generate Number Range for the original script.

In Chris's original he used a temporary table to hold the values 0 to 9. As this is a static dataset I decided to use a view instead.

CREATE VIEW dbo.Digits AS
select 0 as value
union allselect 1 as value
union allselect 2 as value
union allselect 3 as value
union allselect 4 as value
union allselect 5 as value
union allselect 6 as value
union allselect 7 as value
union allselect 8 as value
union allselect 9 as value
GO
--##SUMMARY This function provides a table of integers between the two specified arguments.
--##REMARKS The procedure allows for the two arguments being transposed.
CREATE FUNCTION dbo.fnCrossJoinRange (
@first int , --##PARAM @first The lowest value in the range. 
@last int --##PARAM @last The highest value in the range.
)  
RETURNS @values TABLE ( value int primary key ) AS  
BEGIN
INSERT INTO @values(value)
SELECTnum = units.value +
(tens.value * 10) +
(hundreds.value * 100) +
(Thousands.value * 1000) +
(TenThousands.value * 10000) +
(CThousands.value * 100000) +
(Millions.value * 1000000) 
FROMdbo.Digits units
CROSS JOIN dbo.Digits tens
CROSS JOIN dbo.Digits hundreds
CROSS JOIN dbo.Digits Thousands
CROSS JOIN dbo.Digits TenThousands
CROSS JOIN dbo.Digits CThousands
CROSS JOIN dbo.Digits Millions
where units.value +
(tens.value * 10) +
(hundreds.value * 100) +
(Thousands.value * 1000) +
(TenThousands.value * 10000) +
(CThousands.value * 100000) +
(Millions.value * 1000000)
BETWEEN @first and @last
RETURN
END
GO

The idea behind this function is that instead of doing a row by row insert we are playing to the strengths of SQL Server and are using set based operations. The theory behind this is that we cross join our number range (0 to 9) to itself repeatedly.

  • units contain the values 0 to 9
  • tens contain the values 10 to 90 in steps of 10 and zero
  • hundreds contain the values 100 to 900 in steps of 100 and zero
  • ...etc

By performing a CROSS JOIN we are getting the sum of every possible combination of each item.

Although this function was by far the slowest its lack of performance is principally due to the sheer volume of

records it generates before applying a filter. If you genuinely need to generate a range of numbers in the millions then it will actually out-perform

the dbo.fnRange function.

Unknown author's version

The function shown below combines the technique that Chris used in his script with a filter to limit the number of records that are combined.

Like Chris's version it is effectively doing 70 SELECT statements

  • 0 to 9
  • <10 to 90 in steps of 10 and zero
  • <100 to 900 in steps of 100 and zero
  • <1000 to 9000 in steps of 1000 and zero
  • <10000 to 90000 in steps of 10000 and zero
  • <100000 to 900000 in steps of 100000 and zero
  • <1000000 to 9000000 in steps of 1000000 and zero

Where it has the advantage over Chris's version is that it does not select values beyond the end of the range. Chris's version selects ALL values then applies a WHERE clause. This version pre-selects values prior to applying a final WHERE clause

CREATE FUNCTION dbo.fnUnionRange (
@first int , --##PARAM @first The lowest value in the range. 
@last int --##PARAM @last The highest value in the range.
)  
RETURNS @values TABLE ( value int primary key ) AS  
BEGIN
INSERT INTO @values(value)
select 
units.value
 +tens.value 
 +hundreds.value 
+Thousands.value 
+TenThousands.value 
+CThousands.value 
+Millions.value 
AS list
from(
select 0 as value
union allselect 1 as value where 1 <= @last
union allselect 2 as value where 2 <= @last
union allselect 3 as value where 3 <= @last
union allselect 4 as value where 4 <= @last
union allselect 5 as value where 5 <= @last
union allselect 6 as value where 6 <= @last
union allselect 7 as value where 7 <= @last
union allselect 8 as value where 8 <= @last
union allselect 9 as value where 9 <= @last
) AS Units ,
(
select 0 as value
union allselect 10 as value where 10 <= @last
union allselect 20 as value where 20 <= @last
union allselect 30 as value where 30<= @last
union allselect 40 as value where 40 <= @last
union allselect 50 as value where 50 <= @last
union allselect 60 as value where 60 <= @last
union allselect 70 as value where 70 <= @last
union allselect 80 as value where 80 <= @last
union allselect 90 as value where 90 <= @last
) AS Tens,
(
select 0 as value
union allselect 100 as value where 100 <= @last
union allselect 200 as value where 200 <= @last
union allselect 300 as value where 300 <= @last
union allselect 400 as value where 400 <= @last
union allselect 500 as value where 500 <= @last
union allselect 600 as value where 600 <= @last
union allselect 700 as value where 700 <= @last
union allselect 800 as value where 800 <= @last
union allselect 900 as value where 900 <= @last
) AS Hundreds,
(
select 0 as value
union allselect 1000 as value where 1000 <= @last
union allselect 2000 as value where 2000 <= @last
union allselect 3000 as value where 3000 <= @last
union allselect 4000 as value where 4000 <= @last
union allselect 5000 as value where 5000 <= @last
union allselect 6000 as value where 6000 <= @last
union allselect 7000 as value where 7000 <= @last
union allselect 8000 as value where 8000 <= @last
union allselect 9000 as value where 9000 <= @last
) AS Thousands,
(
select 0 as value
union allselect 10000 as value where 10000 <= @last
union allselect 20000 as value where 20000 <= @last
union allselect 30000 as value where 30000 <= @last
union allselect 40000 as value where 40000 <= @last
union allselect 50000 as value where 50000 <= @last
union allselect 60000 as value where 60000 <= @last
union allselect 70000 as value where 70000 <= @last
union allselect 80000 as value where 80000 <= @last
union allselect 90000 as value where 90000 <= @last
) AS TenThousands,
(
select 0 as value
union allselect 100000 as value where 100000 <= @last
union allselect 200000 as value where 200000 <= @last
union allselect 300000 as value where 300000 <= @last
union allselect 400000 as value where 400000 <= @last
union allselect 500000 as value where 500000 <= @last
union allselect 600000 as value where 600000 <= @last
union allselect 700000 as value where 700000 <= @last
union allselect 800000 as value where 800000 <= @last
union allselect 900000 as value where 900000 <= @last
) AS CThousands,
(
select 0 as value
union allselect 1000000 as value where 1000000 <= @last
union allselect 2000000 as value where 2000000 <= @last
union allselect 3000000 as value where 3000000 <= @last
union allselect 4000000 as value where 4000000 <= @last
union allselect 5000000 as value where 5000000 <= @last
union allselect 6000000 as value where 6000000 <= @last
union allselect 7000000 as value where 7000000 <= @last
union allselect 8000000 as value where 8000000 <= @last
union allselect 9000000 as value where 9000000 <= @last
) AS Millions
where 
units.value
 +tens.value 
 +hundreds.value 
+Thousands.value 
+TenThousands.value 
+CThousands.value 
+Millions.value 
between @first and @last
RETURN
END
GO

Verbose though the above function is, it is streets ahead of the others in terms of performance. I decided to combine the filtering techniques in this script with the elegance of Chris's version to produce a final version.

The final version

This final version offers similar performance to the verbose version. but, to my eyes at least, gains in readability.

ALTER FUNCTION dbo.fnCrossJoinRange2 (
@first int , --##PARAM @first The lowest value in the range. 
@last int --##PARAM @last The highest value in the range.
)  
RETURNS @values TABLE ( value int primary key ) AS  
BEGIN
INSERT INTO @values(value)
SELECTnum = units.value +
(tens.value) +
(hundreds.value ) +
(Thousands.value ) +
(TenThousands.value ) +
(CThousands.value ) +
(Millions.value ) 
FROMdbo.Digits units
CROSS JOIN (SELECT value * 10 as value from dbo.Digits WHERE value * 10 <=@last) tens
CROSS JOIN (SELECT value * 100 as value  from dbo.Digits WHERE value * 100 <=@last) hundreds
CROSS JOIN (SELECT value * 1000 as value  from dbo.Digits WHERE value * 1000 <=@last) Thousands
CROSS JOIN (SELECT value * 10000 as value  from dbo.Digits WHERE value * 10000 <=@last) TenThousands
CROSS JOIN (SELECT value * 100000 as value  from dbo.Digits WHERE value * 100000 <=@last) CThousands
CROSS JOIN (SELECT value * 1000000 as value  from dbo.Digits WHERE value * 1000000 <=@last) Millions
where units.value +
(tens.value ) +
(hundreds.value) +
(Thousands.value ) +
(TenThousands.value ) +
(CThousands.value ) +
(Millions.value )
BETWEEN @first and @last
RETURN
END
GO

Conclusions

The important thing to remember with SQL is that it is optimised for set based operations. A piece of code that would be very efficient in a procedural language won't necessarily be the best way to perform a task in T-SQL. This trips up most newbies and also the ORACLE crowd who tend to be much more dependent on cursors in their code.

One insert of 10,000 records is better than 10,000 inserts of 1 record.

However, I found that for small ranges, say under 500 records, there was little difference in performance between the procedural and set based versions. Once beyond the 500 threshold the set based version won hands down.

Rate

5 (2)

You rated this post out of 5. Change rating

Share

Share

Rate

5 (2)

You rated this post out of 5. Change rating