Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Generating Missing Dates and Numbers


Generating Missing Dates and Numbers

Author
Message
jacob sebastian
jacob sebastian
SSC-Addicted
SSC-Addicted (472 reputation)SSC-Addicted (472 reputation)SSC-Addicted (472 reputation)SSC-Addicted (472 reputation)SSC-Addicted (472 reputation)SSC-Addicted (472 reputation)SSC-Addicted (472 reputation)SSC-Addicted (472 reputation)

Group: General Forum Members
Points: 472 Visits: 2523
Comments posted to this topic are about the item Generating Missing Dates and Numbers

.
corey lawson
corey lawson
SSC Veteran
SSC Veteran (204 reputation)SSC Veteran (204 reputation)SSC Veteran (204 reputation)SSC Veteran (204 reputation)SSC Veteran (204 reputation)SSC Veteran (204 reputation)SSC Veteran (204 reputation)SSC Veteran (204 reputation)

Group: General Forum Members
Points: 204 Visits: 544
Another thing that works in order to find the gaps is to "interleave" the numbers together.

Say you have a table where the ID field (works with date values too) is usually incremented, but could have gaps in it...

select t1.id, t1.id+1 as nextval
from t1 left outer join t2
on t1.id = t2.id - 1
where t2 is null

I think I got this from one of JCelko's books.

It of course doesn't help fill in the blanks in and of itself, but by providing the next number for the lowest number available to be filled, it definitely could be used in an application.

The cartesian joins to generate the list of numbers is slick, though!!!
neilmatthews
neilmatthews
SSC Journeyman
SSC Journeyman (95 reputation)SSC Journeyman (95 reputation)SSC Journeyman (95 reputation)SSC Journeyman (95 reputation)SSC Journeyman (95 reputation)SSC Journeyman (95 reputation)SSC Journeyman (95 reputation)SSC Journeyman (95 reputation)

Group: General Forum Members
Points: 95 Visits: 114
How about

SELECT TOP 1 A AS FirstMissingNumber
FROM (SELECT ROW_NUMBER() OVER (ORDER BY CoordinatorID) AS A, CoordinatorID
FROM Coordinators) T
WHERE A < CoordinatorID
brewmanz
brewmanz
SSC-Addicted
SSC-Addicted (489 reputation)SSC-Addicted (489 reputation)SSC-Addicted (489 reputation)SSC-Addicted (489 reputation)SSC-Addicted (489 reputation)SSC-Addicted (489 reputation)SSC-Addicted (489 reputation)SSC-Addicted (489 reputation)

Group: General Forum Members
Points: 489 Visits: 406
I think you'll find that
SELECT
CAST('2007-10-01' AS DATETIME) + Number-1
FROM dbo.GetNumbers(1, 30)

assumes a 30-day month, and misses 2007-10-31
Alocyte
Alocyte
SSC Rookie
SSC Rookie (33 reputation)SSC Rookie (33 reputation)SSC Rookie (33 reputation)SSC Rookie (33 reputation)SSC Rookie (33 reputation)SSC Rookie (33 reputation)SSC Rookie (33 reputation)SSC Rookie (33 reputation)

Group: General Forum Members
Points: 33 Visits: 104
Hi. With great interest I note your approach, and the approach using id = id-1.

I have this but more: we get data for (11) different types, each type has it's own incrementing by 1 id. The volume of data is around 3mill rows for all 11 types in any given day. I use a table, refNum (1-3000000), right joined with a Select max(Num) from Msgs group by ServiceCode to get a Range per ServiceCode, and using this I resolve a list of MsgType, MsgIds missing by using a left join.

Select distinct convert(varchar(10),getdate(),120) as EffectiveDate, ALLPOSS.ServiceCode, Number as BeginSeqNo, Number as EndSeqNo
from MsgsReceived right join (
Select servicecode, number
from (
Select ServiceCode, max(MsgSeqNo) LIMIT
from msgsReceived
group by ServiceCode ) T right join refNums N
on Limit >= Number
where servicecode is not null ) as ALLPOSS
on MsgsReceived.ServiceCode = ALLPOSS.ServiceCode
and MsgsReceived.MsgSeqNo = ALLPOSS.Number
where (MsgSeqNo is null )

This results in a row for each missing number, by servicecode.

Alternatively, to get the gaps in the data, I run:
Select EffectiveDate, ServiceCode, ( Select isnull(max(MsgSeqNo),0) + 1
from MsgsReceived M
where M.EffectiveDate = MsgsReceived.EffectiveDate
and M.ServiceCode = MsgsReceived.ServiceCode
and M.MsgSeqNo < MsgsReceived.MsgSeqNo
) as BeginSeqNo , MsgSeqNo - 1 as EndSeqNo
from MsgsReceived
where MsgSeqNo -1 > ( Select isnull(max(MsgSeqNo),0)
from MsgsReceived M
where M.EffectiveDate = MsgsReceived.EffectiveDate
and M.ServiceCode = MsgsReceived.ServiceCode
and M.MsgSeqNo < MsgsReceived.MsgSeqNo
)

The problem is, both of these queries, rows and blocks, kill our server.

The next thought was the queries source table's indexes:

CREATE TABLE [dbo].[MsgsReceived] (
[EffectiveDate] [datetime] NOT NULL ,
[ServiceCode] [varchar] (3) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[MsgSeqNo] [bigint] NOT NULL ,
[IsRequested] [bit] NOT NULL
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[refNums] (
[Number] [bigint] NOT NULL
) ON [PRIMARY]
GO

ALTER TABLE [dbo].[MsgsReceived] WITH NOCHECK ADD
CONSTRAINT [PK_MsgsReceived] PRIMARY KEY CLUSTERED
(
[EffectiveDate],
[ServiceCode],
[MsgSeqNo],
[IsRequested]
) ON [PRIMARY]
GO

CREATE CLUSTERED INDEX [IX_Number] ON [dbo].[refNums]([Number]) ON [PRIMARY]
GO

ALTER TABLE [dbo].[MsgsReceived] ADD
CONSTRAINT [DF_MsgsReceived_IsRequested] DEFAULT (0) FOR [IsRequested],
CONSTRAINT [IX_MsgsReceived] UNIQUE NONCLUSTERED
(
[EffectiveDate],
[ServiceCode],
[MsgSeqNo]
) WITH FILLFACTOR = 90 ON [PRIMARY]
GO

CREATE INDEX [IX_MsgsReceived_1] ON [dbo].[MsgsReceived]([EffectiveDate]) WITH FILLFACTOR = 90 ON [PRIMARY]
GO

CREATE INDEX [IX_MsgsReceived_2] ON [dbo].[MsgsReceived]([MsgSeqNo]) WITH FILLFACTOR = 90 ON [PRIMARY]
GO

CREATE INDEX [IX_MsgsReceived_3] ON [dbo].[MsgsReceived]([ServiceCode]) WITH FILLFACTOR = 90 ON [PRIMARY]
GO

CREATE INDEX [IX_MsgsReceived_4] ON [dbo].[MsgsReceived]([IsRequested]) WITH FILLFACTOR = 90 ON [PRIMARY]
GO

ALTER TABLE [dbo].[refNums] ADD
CONSTRAINT [PK_refNums] PRIMARY KEY NONCLUSTERED
(
[Number]
) ON [PRIMARY]
GO


Any suggestions on how to get the missing seqno by servicecode, using minimum IO / CPU?
Any help will be greatly appreciated.

So long, and thanks for all the fishpaste ;-)
aaron.bertrand
aaron.bertrand
Valued Member
Valued Member (61 reputation)Valued Member (61 reputation)Valued Member (61 reputation)Valued Member (61 reputation)Valued Member (61 reputation)Valued Member (61 reputation)Valued Member (61 reputation)Valued Member (61 reputation)

Group: General Forum Members
Points: 61 Visits: 695
I like it. For the date calculations, I suggest a wrapper function that takes start date and end date, or start date and range type, and calculates the number of days first, this way you don't have to know how many days are in October or whether February 2000 was a leap year...
jacob sebastian
jacob sebastian
SSC-Addicted
SSC-Addicted (472 reputation)SSC-Addicted (472 reputation)SSC-Addicted (472 reputation)SSC-Addicted (472 reputation)SSC-Addicted (472 reputation)SSC-Addicted (472 reputation)SSC-Addicted (472 reputation)SSC-Addicted (472 reputation)

Group: General Forum Members
Points: 472 Visits: 2523
agreed!

.
GSquared
GSquared
SSChampion
SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)

Group: General Forum Members
Points: 14385 Visits: 9729
Yet again, I have to say this is clever, but a Numbers table is better.

Method in the article, run against a table with 9989 rows:

----------------

SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 4 ms.

(1 row(s) affected)
Table 'Table'. Scan count 1, logical reads 39996, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#28D10FF3'. Scan count 1, logical reads 22, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 140 ms, elapsed time = 137 ms.

----------------

Numbers table method:


select min(number)
from Common.dbo.Numbers
left outer join dbo.Table
on number = id
where id is null
and number > 0



--------------

SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 1 ms.

(1 row(s) affected)
Table 'Table'. Scan count 1, logical reads 19, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Numbers'. Scan count 1, logical reads 19, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 6 ms.

-----------------

I ran each five times and results were +/- 2 ms total.

To make it more fair, I changed the method in the article to a left join, instead of a Where Not In:


SELECT MIN(Number)
FROM dbo.GetNumbers(0, 9999)
left outer join dbo.Table
on number = id
WHERE id is null



--------------

SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 1 ms.

(1 row(s) affected)
Table 'Table'. Scan count 0, logical reads 19998, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#28D10FF3'. Scan count 1, logical reads 22, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 109 ms, elapsed time = 103 ms.

----------------

Which was a measurable improvement over the Not In method, but still much slower than a Numbers table.

Test against code suggested in article: Numbers table 20 times as fast
Test against article modified to Left Join instead of Not In: Numbers table 15 times as fast

Yes, if you for some reason need to create a list of sequential numbers on the fly, this CTE is probably the fastest means of doing so. It certainly is at least A fast means of doing so. But having an actual Numbers table, with a clustered index on it, is MUCH better.

(Yes, this is the third or fourth time I've written on this exact subject. But for whatever reason, it keeps coming up.)

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Christian Buettner-167247
Christian Buettner-167247
SSCrazy
SSCrazy (3K reputation)SSCrazy (3K reputation)SSCrazy (3K reputation)SSCrazy (3K reputation)SSCrazy (3K reputation)SSCrazy (3K reputation)SSCrazy (3K reputation)SSCrazy (3K reputation)

Group: General Forum Members
Points: 2951 Visits: 3889
Hello Jacob,

Beautiful article, nice and crisp!

I have recently used a similar approach for identifying the next business date for a given date which required the recognition of weekends and holidays.
The solution was to
- select the minimum date from this dates set as you described
- which is greater than the date of the transaction
- which is not a weekend (datename <> Sunday / Saturday)
- which is not a holiday (date not in HolidayTable)

Best Regards,

Chris Büttner
Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)

Group: General Forum Members
Points: 45241 Visits: 39927
Very nicely done, Jacob.

Please... not trying to take anything away from this great article... just sharing a different method...

As a slightly easier to remember (and, a bit faster, too) approach to generating numbers, check out the following (comparison between Itzek's and a method that I and several others use)...

SET STATISTICS TIME ON
GO
DECLARE @BitBucket INT
--=============================================================================
PRINT REPLICATE('=',100)
PRINT 'Itzek''s method:'
;WITH
L0 AS (SELECT 1 AS C UNION ALL SELECT 1), --2 rows
L1 AS (SELECT 1 AS C FROM L0 AS A, L0 AS B),--4 rows
L2 AS (SELECT 1 AS C FROM L1 AS A, L1 AS B),--16 rows
L3 AS (SELECT 1 AS C FROM L2 AS A, L2 AS B),--256 rows
L4 AS (SELECT 1 AS C FROM L3 AS A, L3 AS B),--65536 rows
L5 AS (SELECT 1 AS C FROM L4 AS A, L4 AS B),--4294967296 rows
num AS (SELECT ROW_NUMBER() OVER(ORDER BY C) AS N FROM L5)
SELECT @BitBucket = N FROM NUM WHERE N <= 1000000;

--=============================================================================
PRINT REPLICATE('=',100)
PRINT 'Jeff Moden''s Method'
; WITH cTally AS
(-----------------------------------------------------------------------------
--==== High performance CTE equivalent of a Tally or Numbers table
SELECT TOP (1000000)
ROW_NUMBER() OVER (ORDER BY t1.ID) AS N
FROM Master.sys.SysColumns t1
CROSS JOIN Master.sys.SysColumns t2
)-----------------------------------------------------------------------------
SELECT @BitBucket = N FROM cTally --Do your outer join with table being checked here
PRINT REPLICATE('=',100)



True, Itzeks's will generate more than what an INT can handle, but, how often do you think you're gonna need to generate more than 121 million numbers? Wink

Here's the same thing as a programmable function...

 CREATE FUNCTION dbo.fnTally
/****************************************************************************************
Purpose:
Given a range of Integers not exceeding a count of 121 million, return the range of
numbers as a table.

Notes: Preserved as an "inline" single statement function for sheer performance.
Therefore, no error checking, etc.

Revision History:
Rev 00 - 23 Dec 2005 - Jeff Moden - Initial creation and unit test
****************************************************************************************/
--===== Declare the parameters
(
@piStartNumber INT,
@piEndNumber INT
)
RETURNS TABLE
AS
RETURN (WITH cTally AS
(--------------------------------------------------------------------------------
--==== High performance CTE equivalent of a Tally or Numbers table
SELECT TOP (@piEndNumber-@piStartNumber+1)
ROW_NUMBER() OVER (ORDER BY t1.ID) AS Number
FROM Master.sys.SysColumns t1
CROSS JOIN Master.sys.SysColumns t2
)--------------------------------------------------------------------------------
SELECT N = Number+@piStartNumber-1 FROM cTally
)



Still, a small (11k rows - 30 years of dates) permanent Tally table with a Clustered Index on N will usually beat calculated table functions once the table is cached.

Again, I say, nice article, Jacob! Do it again!

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search