SQLServerCentral Article

Generating Missing Dates and Numbers

,

Introduction

In the past, several times I had read about generating missing dates and numbers. But I never paid serious attention to the scripts because I never came across an assignment which needed them. But recently I had to write a few stored procedures which dealt with missing numbers and missing dates.

Most of the stuff I read in the past were suggesting a temp table which holds a series of dates. So the 'Missing Dates' query can use a NOT IN clause with the pre-populated table and retrieve the missing dates.

In this article, I am presenting a different approach to generate Missing Dates and Missing Numbers without using a temp table. No, I am not against the temp table approach. But I am trying to present another way (which looks simpler to me) of doing the same thing.

Find Missing Numbers

It is obvious that, to find missing numbers, we need to have a table or resultset which contains all the numbers. Then we can apply a NOT IN clause and identify the missing numbers.

If you are wondering, why I needed missing numbers, here is the reason. In one of my applications, there is an entity that we call coordinator. Each coordinator has a 4 digit numeric code. The code starts at 1000 and ends at 9999. (There is no chance that we will have more than 9000 coordinators at any point of time, so a 4 digit code is sufficient). When the user creates a new coordinator, the system automatically generates a new coordinator number. However, the user can still edit it. The system will allow the user to edit the coordinator number (during a new entry) as long as it does not produce a duplicate code.

With the above functionality, we needed a way to reuse un-used/missing numbers. When we create a new coordinator, we need to find the lowest unused coordinator

code. For this purpose we needed a way to identify the missing numbers.

As mentioned earlier in this article, I did not want to keep a temp table. I want to generate the numbers on the fly. There are several ways to do this. I like to use a CTE to do this.

In one of my previous articles, I had presented a way to generate a sequence of numbers by using a CTE. It was fast. But a faster code fragment was posted by Kathi Kellenberger in the discussion forum. It was taken from Itzik Ben-Gan's book "Inside Microsoft SQL Server 2005 T-SQL Querying". Here is the code fragment which generates 1 million records in the fraction of a second. [code]

    1 WITH

    2 L0 AS (SELECT 1 AS C UNION ALL SELECT 1), --2 rows

    3 L1 AS (SELECT 1 AS C FROM L0 AS A, L0 AS B),--4 rows

    4 L2 AS (SELECT 1 AS C FROM L1 AS A, L1 AS B),--16 rows

    5 L3 AS (SELECT 1 AS C FROM L2 AS A, L2 AS B),--256 rows

    6 L4 AS (SELECT 1 AS C FROM L3 AS A, L3 AS B),--65536 rows

    7 L5 AS (SELECT 1 AS C FROM L4 AS A, L4 AS B),--4294967296 rows

    8 num AS (SELECT ROW_NUMBER() OVER(ORDER BY C) AS N FROM L5)

    9 SELECT N FROM NUM WHERE N <= 1000000;

This code is VERY VERY fast and I find it to be the best candidate for generating a sequence of numbers on the fly. I wanted to re-use this code and hence I created a function that takes a Minimum and Maximum value and returns a resultset containing sequence numbers within the given range. [code]

    1 CREATE FUNCTION dbo.GetNumbers

    2 (

    3     @Start BIGINT,

    4     @End BIGINT

    5 )

    6 RETURNS @ret TABLE(Number BIGINT)

    7 AS

    8 BEGIN

    9     WITH

   10     L0 AS (SELECT 1 AS C UNION ALL SELECT 1), --2 rows

   11     L1 AS (SELECT 1 AS C FROM L0 AS A, L0 AS B),--4 rows

   12     L2 AS (SELECT 1 AS C FROM L1 AS A, L1 AS B),--16 rows

   13     L3 AS (SELECT 1 AS C FROM L2 AS A, L2 AS B),--256 rows

   14     L4 AS (SELECT 1 AS C FROM L3 AS A, L3 AS B),--65536 rows

   15     L5 AS (SELECT 1 AS C FROM L4 AS A, L4 AS B),--4294967296 rows

   16     num AS (SELECT ROW_NUMBER() OVER(ORDER BY C) AS N FROM L5)

   17 

   18     INSERT INTO @ret(Number)

   19     SELECT N FROM NUM WHERE N BETWEEN @Start AND @End

   20 RETURN

   21 END

Once we have the function to generate the sequence number, we can easily write the code to identify the missing numbers. [code]

    1 SELECT MIN(Number)

    2     FROM dbo.GetNumbers(1000, 9999)

    3     WHERE Number NOT IN (SELECT CoordinatorID FROM Coordinators)

Find Missing Dates

I guess the requirement to find missing dates must be more common than the missing number requirement I mentioned above. To find the missing dates (Dates at which no sales took place, Dates at which a given employ was absent etc) we need to have a table or resultset which contains all the dates within a given range.

The following example shows how to generate a sequence of dates using the same logic we discussed earlier. [code]

    1 /*

    2     Generate a sequence of all the dates for the month

    3     of October, 2007

    4 */

    5 SELECT

    6     CAST('2007-10-01' AS DATETIME) + Number-1

    7 FROM dbo.GetNumbers(1, 30)

    8 

    9 /*

   10     Generate a sequence of all the dates for

   11     year 2007

   12 */

   13 SELECT

   14     CAST('2007-01-01' AS DATETIME) + Number-1

   15     FROM dbo.GetNumbers(1, 365)

Once we have a sequence of dates, we can easily apply a NOT IN clause and find the missing dates based on the specific requirement.

Conclusions

This article presents an approach to generate missing numbers and dates without using a temp table. I do not intend to claim that this approach is better than the others. There is no programming approach that suites all requirements. Based on the specific application scenario a certain approach may be found suitable than the others. I guess some of you may find this approach easier to integrate into your application specific requirements.

Resources

Rate

4.64 (33)

You rated this post out of 5. Change rating

Share

Share

Rate

4.64 (33)

You rated this post out of 5. Change rating