Generating Missing Dates and Numbers

  • You can also find missing dates with a Numbers table more easily than is outlined in the article.

    To set up the test:

    create table MissingDates (

    Date datetime primary key)

    go

    insert into dbo.missingdates(date)

    select dateadd(second, subtime.number, subdate.date)

    from

    (select dateadd(day, number, '1/1/08') as Date

    from common.dbo.Numbers

    where number <= 100) SubDate

    cross join

    (select number

    from common.dbo.BigNumbers

    where number <= (3600 * 24) - 1) SubTime

    go

    delete from dbo.missingdates

    where date between '1/12/08' and '1/13/08'

    go

    delete from dbo.missingdates

    where date between '1/30/08' and '1/31/08'

    This time, I tried my usual Numbers table method:

    select dateadd(day, number, '1/1/08')

    from common.dbo.numbers

    left outer join dbo.missingdates

    on date >= dateadd(day, number, '1/1/08')

    and date < dateadd(day, number + 1, '1/1/08')

    where date is null

    number between 0 and 31

    ------------------

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 1 ms.

    (2 row(s) affected)

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'MissingDates'. Scan count 32, logical reads 5638, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Numbers'. Scan count 3, logical reads 52, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 688 ms, elapsed time = 687 ms.

    ------------------

    Then I tried the method recommended in the article:

    SELECT

    CAST('2008-01-01' AS DATETIME) + Number-1

    FROM dbo.GetNumbers(1, 30)

    where CAST('2008-01-01' AS DATETIME) + Number-1 not in

    (select cast(convert(varchar(100), date, 101) as datetime)

    from dbo.missingdates)

    (The cast-convert is necessary because without that, it will simply detect missing midnights, not whole missing days.)

    I killed that process after it had been running for 3 minutes. Not sure how long it would have taken in all.

    Converted it to:

    select dateadd(day, number, '1/1/08')

    from dbo.getnumbers(1,31)

    left outer join dbo.missingdates

    on date >= dateadd(day, number-1, '1/1/08')

    and date < dateadd(day, number, '1/1/08')

    where date is null

    Which is pretty much identical to the query used with the Numbers table.

    ----------------

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 1 ms.

    (2 row(s) affected)

    Table '#28D10FF3'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'MissingDates'. Scan count 31, logical reads 5418, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 673 ms, elapsed time = 366 ms.

    -----------------

    This time, the total elapsed time was shorter, but the CPU time was nearly identical. One less scan count on the MissingDates table, and the server was able to split the process across more than 1 CPU (the computer I'm testing this on is a dual-core Pentium D).

    Some advantage to the CTE, but in a loaded up server, it won't make much difference.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Weird, your CTE code took 16 seconds (twice) on my SQL Server

    that is not VERY VERY fast. Anyway, we don't have the need for it, I was just curious

    16 seconds on a SQL Server on VMWare, and 16 seconds on a physical SQL Server (8 cpu, 24GB RAM)

    WITH

    L0 AS (SELECT 1 AS C UNION ALL SELECT 1), --2 rows

    L1 AS (SELECT 1 AS C FROM L0 AS A, L0 AS B),--4 rows

    L2 AS (SELECT 1 AS C FROM L1 AS A, L1 AS B),--16 rows

    L3 AS (SELECT 1 AS C FROM L2 AS A, L2 AS B),--256 rows

    L4 AS (SELECT 1 AS C FROM L3 AS A, L3 AS B),--65536 rows

    L5 AS (SELECT 1 AS C FROM L4 AS A, L4 AS B),--4294967296 rows

    num AS (SELECT ROW_NUMBER() OVER(ORDER BY C) AS N FROM L5)

    SELECT N FROM NUM WHERE N <= 1000000;

    SQLServerNewbieMCITP: Database Administrator SQL Server 2005
  • What do folks think about adding directly to a datetime and allowing implicit conversions to take place?

    CREATE FUNCTION dbo.DateRange( @start DATETIME, @length INT )

    RETURNS @range TABLE (

    dateOf datetime PRIMARY KEY

    )

    AS

    BEGIN

    -- Populate the output table of dates using a number table

    INSERT INTO @range ( dateOf )

    SELECT

    @start + n AS dateOf

    FROM (

    SELECT (n1.n +n10.n +n100.n +n1000.n) AS n

    FROM (SELECT 0 AS n UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) AS n1

    CROSS JOIN (SELECT 0 AS n UNION SELECT 10 UNION SELECT 20 UNION SELECT 30 UNION SELECT 40 UNION SELECT 50 UNION SELECT 60 UNION SELECT 70 UNION SELECT 80 UNION SELECT 90) AS n10

    CROSS JOIN (SELECT 0 AS n UNION SELECT 100 UNION SELECT 200 UNION SELECT 300 UNION SELECT 400 UNION SELECT 500 UNION SELECT 600 UNION SELECT 700 UNION SELECT 800 UNION SELECT 900) AS n100

    CROSS JOIN (SELECT 0 AS n UNION SELECT 1000 UNION SELECT 2000 UNION SELECT 3000 UNION SELECT 4000 UNION SELECT 5000 UNION SELECT 6000 UNION SELECT 7000 UNION SELECT 8000 UNION SELECT 9000) AS n1000

    WHERE (n1.n +n10.n +n100.n +n1000.n) BETWEEN 0 AND @length - 1

    ) AS numbers

    ORDER BY

    n ASC;

    RETURN;

    END

  • I prefer this simple and elequent way to generate numbers.

    I doesn't generate unnecessary numbers and is easy to understand

    [font="Courier New"]CREATE FUNCTION dbo.GetSequence

    (

    @Start BIGINT,

    @End BIGINT,

    @Increment BIGINT

    )

    RETURNS @ret TABLE(Number BIGINT)

    AS

    BEGIN

    WITH

    seq(num)

    as

    (

    select @Start

    union all

    select num + @Increment from seq

    where num + @Increment <= @End

    )

    INSERT INTO @ret(Number)

    Select * From Seq

    END[/font]

  • Jeff/Jacob:

    first - Jacob - very nice. Good solid method for handling a common request.

    Jeff - I'll chalk this up to being the forever contrarian... But the comparison gets more interesting when you "right-size" the Itzik method. Notice what one tiny little change does (since we KNOW how many results we want....)

    SET STATISTICS TIME ON

    GO

    DECLARE @Bitbucket INT

    --=============================================================================

    PRINT REPLICATE('=',100)

    PRINT 'Itzek''s method:'

    ;WITH

    L0 AS (SELECT 1 AS C UNION ALL SELECT 1), --2 rows

    L1 AS (SELECT 1 AS C FROM L0 AS A, L0 AS B),--4 rows

    L2 AS (SELECT 1 AS C FROM L1 AS A, L1 AS B),--16 rows

    L3 AS (SELECT 1 AS C FROM L2 AS A, L2 AS B),--256 rows

    L4 AS (SELECT 1 AS C FROM L3 AS A, L3 AS B),--65536 rows

    L5 AS (SELECT 1 AS C FROM L4 AS A, L0 AS B),--131072 rows

    num AS (SELECT ROW_NUMBER() OVER(ORDER BY C) AS N FROM L5)

    SELECT @Bitbucket = N FROM NUM WHERE N <= 1000000;

    --=============================================================================

    PRINT REPLICATE('=',100)

    PRINT 'Jeff Moden''s Method'

    ; WITH cTally AS

    (-----------------------------------------------------------------------------

    --==== High performance CTE equivalent of a Tally or Numbers table

    SELECT TOP (1000000)

    ROW_NUMBER() OVER (ORDER BY t1.ID) AS N

    FROM Master.sys.SysColumns t1

    CROSS JOIN Master.sys.SysColumns t2

    )-----------------------------------------------------------------------------

    SELECT @Bitbucket = N FROM cTally --Do your outer join with table being checked here

    PRINT REPLICATE('=',100)

    Can you spot the difference (it's in bold....hehe)?

    Of course - we could debate why we're fighting so hard to reclaim 200ms or less, but - what's the fun in that?

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • @matt-2

    I think you'll find that if you want to do a "true" (well not true, but closer than it was) "right sizing" of the comparison, then you'll need to make the adjustment in bold.

    L5 AS (SELECT 1 AS C FROM L4 AS A, L2 AS B),--1048576 rows

    I think you'll find that this puts Jeff's method back on top.

    As a frequent reader but not a frequent poster, I know you and Jeff have this back and forth thing...but I felt compelled to jump in here. πŸ˜‰

  • John Beggs (2/6/2008)


    @Matt

    I think you'll find that if you want to do a "true" (well not true, but closer than it was) "right sizing" of the comparison, then you'll need to make the adjustment in bold.

    L5 AS (SELECT 1 AS C FROM L4 AS A, L2 AS B),--1048576 rows

    I think you'll find that this puts Jeff's method back on top.

    As a frequent reader but not a frequent poster, I know you and Jeff have this back and forth thing...but I felt compelled to jump in here. πŸ˜‰

    ACK - caught at my own game I see...hehe. It would help if I returned a million rows when asked for them (and not 100,000)

    You're right I dropped a zero in there. The funny part about it is if you run both version @100K rows, Itzik's does in fact win (47ms to 62ms).

    However - by the time you hit 1M to return, the ORDER BY in the Itzik method makes it less efficient (since sys.all_columns is already indexed).

    Hey - jump in any time! we like to bash each other with tests, demolition derby-style, so - the more the merrier:).

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Thanks for the retest, John πŸ™‚

    Sure... Matt and I go round'n'round... it's all in good nature and we both learn things in the process. Hopefully, everyone both understands that and gets a benefit from it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Tony McGarry (2/6/2008)


    I prefer this simple and elequent way to generate numbers.

    I doesn't generate unnecessary numbers and is easy to understand

    [font="Courier New"]CREATE FUNCTION dbo.GetSequence

    (

    @Start BIGINT,

    @End BIGINT,

    @Increment BIGINT

    )

    RETURNS @ret TABLE(Number BIGINT)

    AS

    BEGIN

    WITH

    seq(num)

    as

    (

    select @Start

    union all

    select num + @Increment from seq

    where num + @Increment <= @End

    )

    INSERT INTO @ret(Number)

    Select * From Seq

    END[/font]

    You might want to reconsider, Tony... It takes 47 seconds to generate a million numbers if you need it... and, you need to add a couple of things to it to get it to count that high...

    drop function getsequence

    GO

    CREATE FUNCTION dbo.GetSequence

    (

    @Start BIGINT,

    @End BIGINT,

    @Increment BIGINT

    )

    RETURNS @ret TABLE(Number BIGINT)

    AS

    BEGIN

    WITH

    seq(num)

    as

    (

    select @Start

    union all

    select num + @Increment from seq

    where num + @Increment <= @End

    )

    INSERT INTO @ret(Number)

    Select * From Seq

    OPTION (MAXRECURSION 0)

    RETURN

    END

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Also on Tony's function, unless you're planning on going over 2-billion (American billion), don't use BigInt. Int is good up to 2,147,483,647 per BOL. Making it BigInt just takes more RAM, adding to the probability of having to dump into onto the disk in TempDB, and adds a conversion step, in most cases. That's almost certain to be at least slightly slower.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • -- ΒΏCan anybody explain why the following sentence returns an error?Thanks

    WITH T1 AS

    ( SELECT CAST('2007-01-01' AS DATETIME) + Number - 1 as MYDATE

    FROM dbo.GetNumbers(1, datediff(dd, '31/12/2007', '01/01/2007')

    )

    )

    SELECT * from T1

    /* the GetNumbers function is the one previously defined this post */

  • Which error? The one about conversion from char to date producing an out of range value? And, even if you fix that, you'll still end up with a negative number as a feed to the second operand of the function because your larger date comes before the smaller. You might also come up with a syntax error because of a missing ";".

    Try this and see what you get...

    SET DATEFORMAT DMY

    ;WITH T1 AS

    ( SELECT CAST('2007-01-01' AS DATETIME) + Number - 1 as MYDATE

    FROM dbo.GetNumbers(1, datediff(dd, '01/01/2007','31/12/2007')+1)

    )

    SELECT * from T1

    Just an FYI... if you get into the habit of using ISO date literals and always prefacing the CTE with a ";", you'll never have such problems as you've had with this one... like this...

    ;WITH T1 AS

    ( SELECT CAST('2007-01-01' AS DATETIME) + Number - 1 as MYDATE

    FROM dbo.GetNumbers(1, datediff(dd, '20070101','20071231')+1)

    )

    SELECT * from T1

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • hello Jeff

    thanks a lot for your reply.

    (FYI, I normally don't use literals. I put literals only to make my sentence more readable fro the blog. And sorry about the inverted dates, I copied them reversed from my code)

    PLEASE, CONSIDER THAT IN ANY CASE I GET A SINTAX ERROR. EVEN YOUR LAST PIECE OF CODE RETURNS A SINTAX ERROR ("Sintax Error near Datediff".

    Please verify

    thanks

    C a r l o G r i g o l i n i

  • Then, you need to post the exact code you're getting the error with and the exact error... because the code I posted gives no such error on my machine. You might also want to post the version number of the SQL Server you're using.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • jerryhung (2/6/2008)


    Weird, your CTE code took 16 seconds (twice) on my SQL Server

    that is not VERY VERY fast. Anyway, we don't have the need for it, I was just curious

    16 seconds on a SQL Server on VMWare, and 16 seconds on a physical SQL Server (8 cpu, 24GB RAM)

    WITH

    L0 AS (SELECT 1 AS C UNION ALL SELECT 1), --2 rows

    L1 AS (SELECT 1 AS C FROM L0 AS A, L0 AS B),--4 rows

    L2 AS (SELECT 1 AS C FROM L1 AS A, L1 AS B),--16 rows

    L3 AS (SELECT 1 AS C FROM L2 AS A, L2 AS B),--256 rows

    L4 AS (SELECT 1 AS C FROM L3 AS A, L3 AS B),--65536 rows

    L5 AS (SELECT 1 AS C FROM L4 AS A, L4 AS B),--4294967296 rows

    num AS (SELECT ROW_NUMBER() OVER(ORDER BY C) AS N FROM L5)

    SELECT N FROM NUM WHERE N <= 1000000;

    Jerry,

    I believe you're probably looking at the little elapsed time meter in the lower right corner of the display. That includes the total time to execute AND display. Usually, this type of thing is consumed rather than displayed.

    To see how long it actually takes to run, try this...

    SET STATISTICS TIME ON

    GO

    WITH

    L0 AS (SELECT 1 AS C UNION ALL SELECT 1), --2 rows

    L1 AS (SELECT 1 AS C FROM L0 AS A, L0 AS B),--4 rows

    L2 AS (SELECT 1 AS C FROM L1 AS A, L1 AS B),--16 rows

    L3 AS (SELECT 1 AS C FROM L2 AS A, L2 AS B),--256 rows

    L4 AS (SELECT 1 AS C FROM L3 AS A, L3 AS B),--65536 rows

    L5 AS (SELECT 1 AS C FROM L4 AS A, L4 AS B),--4294967296 rows

    num AS (SELECT ROW_NUMBER() OVER(ORDER BY C) AS N FROM L5)

    SELECT N FROM NUM WHERE N <= 1000000;

    That'll produce message output like the following...

    [font="Courier New"]SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 1 ms.

    (1000000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 922 ms, elapsed time = 19219 ms.[/font]

    The "CPU time" is how long it takes the server to execute the code... the "elapsed time" includes the amount of time it takes to display the million rows.

    Just so you can see the difference, let's jam the results into a throw-away variable and measure the time the same way...

    SET STATISTICS TIME ON

    GO

    DECLARE @Bitbucket INT

    ;WITH

    L0 AS (SELECT 1 AS C UNION ALL SELECT 1), --2 rows

    L1 AS (SELECT 1 AS C FROM L0 AS A, L0 AS B),--4 rows

    L2 AS (SELECT 1 AS C FROM L1 AS A, L1 AS B),--16 rows

    L3 AS (SELECT 1 AS C FROM L2 AS A, L2 AS B),--256 rows

    L4 AS (SELECT 1 AS C FROM L3 AS A, L3 AS B),--65536 rows

    L5 AS (SELECT 1 AS C FROM L4 AS A, L4 AS B),--4294967296 rows

    num AS (SELECT ROW_NUMBER() OVER(ORDER BY C) AS N FROM L5)

    SELECT @Bitbucket = N FROM NUM WHERE N <= 1000000;

    That produces ...

    [font="Courier New"]SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 1 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    SQL Server parse and compile time:

    CPU time = 78 ms, elapsed time = 80 ms.

    SQL Server Execution Times:

    CPU time = 891 ms, elapsed time = 1022 ms.[/font]

    Notice that the elapsed time for the final step (the CTE) is much less because the only thing it had to display was the execution time instead of a million rows.

    So, just to summarize... the speed of execution that everyone is talking about is the "CPU Time"... and that doesn't include any of the time it takes to display results (elapsed time) because results for these types of things are normally consumed by the system rather than displayed to the user.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

Viewing 15 posts - 16 through 30 (of 46 total)

You must be logged in to reply to this topic. Login to reply