Blog Post

Nifty queries using a Numbers helper table

,

(The complete usable code is in the very bottom of this blog post)

Some of you may have seen this “numbers table” technique in use before, but I thought I would share a few simpel examples to those of you that haven’t seen the light yet Smiley

Lets start with a classic problem: Converting a delimited string into a table.

In many old setups I have seen, it’s not uncommon to have the need for passing multiple values to a stored procedure, to look up data for multiple values in one go. Let me build my demo table, and show how such a stored procedure could look like:

CREATE TABLE MyFavoriteMovies
(
    MovieId INT IDENTITY(1,1) PRIMARY KEY,
    MovieTitle VARCHAR(100),
    NumberOfViews INT
)
GO
INSERT INTO MyFavoriteMovies (MovieTitle, NumberOfViews)
VALUES
    ('Gone with the Wind', 23),
    ('Driving Miss Daisy', 9),
    ('Martha', 16),
    ('Fætrene på Torndal', 33),
    ('Casablanca', 8),
    ('Dumb & Dumber', 6),
    ('Wargames', 24),
    ('Geniiius, the movie', NULL)
GO
CREATE PROCEDURE FetchMovieTitles
@MovieIds VARCHAR(1000)
AS
DECLARE @sqlcmd VARCHAR(MAX) = '
    SELECT * FROM MyFavoriteMovies
    WHERE MovieId IN (' + @MovieIds + ')'
EXEC (@sqlcmd)
GO

 

I created a table to hold a list of my favorite movies, and then I have a stored procedure i can execute, that takes a comma separated list of MovieId’s as parameter. It’s not pretty, I know, but nevertheless it’s a template I have seen numerous times over the time.

So why is this not ideal?

The obvious reason is, that because the stored procedure generates the sql command dynamically, and then executes it without any check whatsoever, it’s open for sql injections. On top of that security issue, it also generates a new execution plan for every different parameter that the procedure is executed with. If my query was more complex in the procedure, the dynamic sql could easily becomre nasty to work with. Think about the single quote which serves both as the escape character but also surrounds strings. An example of this could look like this:

DECLARE @sqlcmd VARCHAR(MAX) = '
    SELECT * FROM MyFavoriteMovies
    WHERE MovieId IN (' + @MovieIds + ')
    AND MovieTitle = ''Wargames'''

 

Ugly, right?

So how can we fix this you ask? Imaging we could convert the list of MovieId’s to a table, then the query could look something like this:

SELECT * FROM MyFavoriteMovies
WHERE MovieId IN (SELECT MovieId FROM #MovieIds)
AND MovieTitle = 'Wargames'

 

Now we got rid of the dynamic sql, and it’s much prettier to read. But we don’t have the #MovieIds table yet, so we need to find a way to convert our delimited string to a table.

Let’s start to build the solution up, by looking at this:

DECLARE @MovieIds VARCHAR(1000) = '1,3,5'
SELECT SUBSTRING(@MovieIds, 1, len(@MovieIds))
SELECT SUBSTRING(@MovieIds, 2, len(@MovieIds))
SELECT SUBSTRING(@MovieIds, 3, len(@MovieIds))
SELECT SUBSTRING(@MovieIds, 4, len(@MovieIds))
SELECT SUBSTRING(@MovieIds, 5, len(@MovieIds))

 

image

Here I have five select statements, which performs a substring on the variable, but with an increasing offset going from 1 to 5. Notice that the length of the @MovieIds string is exactly 5 characters. Does this get my any closer to the solution? Can you spot the pattern? The first character in the first, third and fifth result is actually the values I’m looking for. But how could I identify these? What if we instead of selecting the rest of the string in each of the substring command, could stop at the next comma?

DECLARE @MovieIds VARCHAR(1000) = '1,3,5'
SELECT SUBSTRING(@MovieIds, 1, CHARINDEX(',', @MovieIds+',', 1) - 1)
SELECT SUBSTRING(@MovieIds, 2, CHARINDEX(',', @MovieIds+',', 2) - 2)
SELECT SUBSTRING(@MovieIds, 3, CHARINDEX(',', @MovieIds+',', 3) - 3)
SELECT SUBSTRING(@MovieIds, 4, CHARINDEX(',', @MovieIds+',', 4) - 4)
SELECT SUBSTRING(@MovieIds, 5, CHARINDEX(',', @MovieIds+',', 5) - 5)

 

image

The length of the substring is now replaced by the CHARINDEX function. Once again notice the pattern with the increasing values from 1 to 5. The result now almost look like what I want. If I could filter out the second and fourth row, and merge the rest into a table – then I’m done.

Enter Numbers table

CREATE TABLE Numbers (
    n INT PRIMARY KEY
)
GO
INSERT INTO Numbers (n)
VALUES (1), (2), (3), (4), (5)
GO
SELECT n FROM Numbers
GO

 

image

Do you see a pattern that looks like the substring queries above? Now we can actually use the n column from the Numbers table instead of the hardcoded values for the increasing numbers 1 to 5. That would look like this:

 

DECLARE @MovieIds VARCHAR(1000) = '1,3,5'
SELECT    SUBSTRING(@MovieIds, n, CHARINDEX(',', @MovieIds+',', n) - n)
FROM    Numbers

Now I select from the Numbers table, and the only coulumn I select, is the substring function from before – but now with column n instead of the hardcoded values 1 to 5. The output:

image

 

It’s getting warm… we are almost there.

To filter out the rows I don’t want, I need to find something I can use in a where clause. The easiest is to simply use the very same substring line, and the find the rows with a value <> ‘’. That would look like this:

DECLARE @MovieIds VARCHAR(1000) = '1,3,5'
DECLARE @Delimiter VARCHAR(5) = ','
SELECT    SUBSTRING(@MovieIds, n, CHARINDEX(@Delimiter, @MovieIds+@Delimiter, n) - n) as MyCol
FROM    Numbers
WHERE    SUBSTRING(@MovieIds, n, CHARINDEX(@Delimiter, @MovieIds+@Delimiter, n) - n) <> ''

image

This looks very much like what we need. If we put this in a function called dbo.SplitToTable() that takes the parameters @Input and @Delimiter, we can write our original procedure like this:

CREATE FUNCTION dbo.SplitToTable(@Input VARCHAR(1000), @Delimiter VARCHAR(5))
RETURNS TABLE
AS RETURN
(
SELECT    SUBSTRING(@Input, n, CHARINDEX(@Delimiter, @Input+@Delimiter, n) - n) as val
FROM    Numbers
WHERE    SUBSTRING(@Input, n, CHARINDEX(@Delimiter, @Input+@Delimiter, n) - n) <> ''
)
GO
ALTER PROCEDURE FetchMovieTitles
@MovieIds VARCHAR(1000)
AS
SELECT * FROM MyFavoriteMovies
WHERE MovieId IN (SELECT SUBSTRING(val AS INT) FROM dbo.SplitToTable(@MovieIds, ','))
GO
EXEC FetchMovieTitles @MovieIds = '1,3,5'
GO

 

image

Now I have accomplished my goal to get rid of the dynamic sql to avoid security risks, and to be able to reuse cached execution plans.

There are a few corrections that need to be made to make this work for longer inputs. First of all we need to have more values in our numbers table. 1 to 5 only covers input string up to a length of 5. The where clause in the function can also be optimized a bit. I won’t go into the details, but the setup I often implement uses the code below:

CREATE TABLE Numbers (
    n INT PRIMARY KEY
)
--Fill data in Numbers table (this needs only to be done 1 time)
;WITH
  Pass0 as (select 1 as C union all select 1), --2 rows
  Pass1 as (select 1 as C from Pass0 as A, Pass0 as B),--4 rows
  Pass2 as (select 1 as C from Pass1 as A, Pass1 as B),--16 rows
  Pass3 as (select 1 as C from Pass2 as A, Pass2 as B),--256 rows
  Pass4 as (select 1 as C from Pass3 as A, Pass3 as B),--65.536 rows
  Pass5 as (select 1 as C from Pass4 as A, Pass4 as B),--4.294.967.296 rows
  Tally as (select row_number() over(order by C) as Number from Pass5)
INSERT Numbers (n)
SELECT Number
FROM Tally
WHERE Number < 1000000 --Fill 1 million rows (or as many as you like)
                       --This will fill approximately 13 MB
GO
CREATE FUNCTION [dbo].[SplitToTable]
(
    @Input  varchar(8000),
    @Delimiter varchar(5)
)
RETURNS TABLE
AS
RETURN
(
    SELECT
        SUBSTRING(@Input+@Delimiter, n, CHARINDEX(@Delimiter, @Input+@Delimiter, n) - n) AS Val
    FROM Numbers
    WHERE
        n <= LEN(@Input)
        AND SUBSTRING(@Delimiter + @Input, n, 1) = @Delimiter
)
GO
SELECT * FROM dbo.SplitToTable('hello|world|pipe|delimited|string', '|')
GO

 

image

 

To use the Numbers table one needs to think a bit differently about things, than the normal way of solving problems. The way I described the solution in this blog post, pretty much covers how I often attach a new problem, where my gut tells me that the numbers table might come in handy. What if I need a table with all dates the next 30 days? Simple:

SELECT DATEADD(DD, n, '2012-04-01')
FROM Numbers
WHERE n <= 30

Sometimes I end up with pretty nifty solutions to crazy problems, that otherwise was really nasty to solve. This SplitToTable function is just one of the many examples of things to use the Numbers table for. If you would like to learn more about this, just google for auxiliary numbers table, and a lot of different solutions and ideas pop up.

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating