The Joy of Numbers

  • Good article. Interesting that my create table/insert while loop (to 10,000 rows) executed in 4 seconds and the set based solution executed in 17 seconds.

  • James,

    Would you mind posting both versions?  I'd like to do a little testing to see what's up... thanks.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi James,

    Not sure why that would happen. I had the exact opposite experience on a few different SQL Server instances I tested on. There could be a lot of factors involved, however. One thing that comes to mind is if your SQL Server decided it needed to AutoGrow the database while creating the table, or some other oddity. If you could post the code you ran I'd love to test it out and see if I can reproduce it.

    Thanks!

  • I wrote the function in the code below to quickly generate number tables.

    It executed this code to load a table with 1,000,000 numbers in 6.780 seconds.  When I ran it to load a table with 10,000,000 numbers, it took about 136 seconds.  100,000 rows took 0.610 seconds, and 10,000 rows took 0.063 seconds.

    declare @t datetime
    declare @n table( number int )
    set @t = getdate()
    insert into @n
    select
     number
    from
     -- Function F_TABLE_NUMBER_RANGE available on this link
     -- http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=47685
     F_TABLE_NUMBER_RANGE(1,1000000)
    select ElapsedTime = getdate()-@t
     
     
  • If you always want the numbers table to start at 1, this demonstrates both an alternative to the function and a comparison of run times from 10 rows to 10 million rows...

    --===== Setup the test environment

        SET NOCOUNT ON --Supress the auto-display of rowcounts for appearance & speed

    --===== Declare local variables

    DECLARE @DesiredRows INT        --Number of rows desired in the result object

    DECLARE @StartTime   DATETIME   --For calculation of duration times

    --================================================================================

    --===== Loop through the tests from 10 to 10 million =============================

        SET @DesiredRows = 10

      WHILE @DesiredRows <= 10000000

      BEGIN

    --================================================================================

    --      Test the SELECT/INTO Method (temp table)

    --================================================================================

    --===== If the temporary Tally (Numbers) table exists, drop it

         -- (Included just for test repeatability.  Should not have to do in real life

         -- because won't exist when new connection starts)

         IF OBJECT_ID('TempDB..#Tally','U') IS NOT NULL

            DROP TABLE #Tally

    --===== Limit the desired number of rows

        SET ROWCOUNT @DesiredRows

    --===== Start the timer and run the test

        SET @StartTime = GETDATE()

     SELECT IDENTITY(INT,1,1) AS N

       INTO #Tally

       FROM Master.dbo.SysColumns sc1 WITH (NOLOCK),

            Master.dbo.SysColumns sc2 WITH (NOLOCK)

    --===== Report the rowcount and duration in seconds

      PRINT STR(@@ROWCOUNT) + ' Rows Inserted '

          + CONVERT(CHAR(13),GETDATE()-@StartTime,114) + ' Duration SELECT/INTO Method'

    --===== Return to normal unlimited rowcounts

        SET ROWCOUNT 0

    --================================================================================

    --      Test the F_TABLE_NUMBER_RANGE Method (table variables)

    --      Function F_TABLE_NUMBER_RANGE available on this link  

    --      http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=47685  

    --================================================================================

    --===== Start the timer and run the test

        SET @StartTime = GETDATE()

    DECLARE @N TABLE(Number INT) --Not sure why this works in a loop, but it does

     INSERT INTO @N

     SELECT Number

       FROM dbo.F_TABLE_NUMBER_RANGE(1,@DesiredRows)

    --===== Report the rowcount and duration in seconds

      PRINT STR(@@ROWCOUNT) + ' Rows Inserted '

          + CONVERT(CHAR(13),GETDATE()-@StartTime,114) + ' Duration F_TABLE_NUMBER_RANGE Method'

    --===== Delete rows from the table variable so as not to build up rows between tests

    --TRUNCATE TABLE @N --Doesn't work on table variables...

    --DROP TABLE @N     --Neither does this...

    DELETE @N           --But this does.

      PRINT REPLICATE('=',78)

    --================================================================================

    --===== End of test loop =========================================================

        SET @DesiredRows = @DesiredRows * 10

        END

    ... and here's the results it produced on my humble 1.8 Ghz 1 GB Ram SQL Server 2000 SP 4 Developer's Edition desktop box at home...

            10 Rows Inserted 00:00:00:000  Duration SELECT/INTO Method

            10 Rows Inserted 00:00:00:000  Duration F_TABLE_NUMBER_RANGE Method

    ==============================================================================

           100 Rows Inserted 00:00:00:000  Duration SELECT/INTO Method

           100 Rows Inserted 00:00:00:013  Duration F_TABLE_NUMBER_RANGE Method

    ==============================================================================

          1000 Rows Inserted 00:00:00:000  Duration SELECT/INTO Method

          1000 Rows Inserted 00:00:00:017  Duration F_TABLE_NUMBER_RANGE Method

    ==============================================================================

         10000 Rows Inserted 00:00:00:033  Duration SELECT/INTO Method

         10000 Rows Inserted 00:00:00:077  Duration F_TABLE_NUMBER_RANGE Method

    ==============================================================================

        100000 Rows Inserted 00:00:00:203  Duration SELECT/INTO Method

        100000 Rows Inserted 00:00:00:750  Duration F_TABLE_NUMBER_RANGE Method

    ==============================================================================

       1000000 Rows Inserted 00:00:02:000  Duration SELECT/INTO Method

       1000000 Rows Inserted 00:00:08:093  Duration F_TABLE_NUMBER_RANGE Method

    ==============================================================================

      10000000 Rows Inserted 00:00:20:253  Duration SELECT/INTO Method

      10000000 Rows Inserted 00:01:48:123  Duration F_TABLE_NUMBER_RANGE Method

    ==============================================================================

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Oh yeah... almost forgot... if you want a nice clustered primary key on both, the test and test results are as follows:

    --===== Setup the test environment

        SET NOCOUNT ON --Supress the auto-display of rowcounts for appearance & speed

    --===== Declare local variables

    DECLARE @DesiredRows INT        --Number of rows desired in the result object

    DECLARE @MyCount     INT        --Keeps track of row counts

    DECLARE @StartTime   DATETIME   --For calculation of duration times

    --================================================================================

    --===== Loop through the tests from 10 to 10 million =============================

        SET @DesiredRows = 10

      WHILE @DesiredRows <= 10000000

      BEGIN

    --================================================================================

    --      Test the SELECT/INTO Method (temp table)

    --================================================================================

    --===== If the temporary Tally (Numbers) table exists, drop it

         -- (Included just for test repeatability.  Should not have to do in real life

         -- because won't exist when new connection starts)

         IF OBJECT_ID('TempDB..#Tally','U') IS NOT NULL

            DROP TABLE #Tally

    --===== Limit the desired number of rows

        SET ROWCOUNT @DesiredRows

    --===== Start the timer and run the test

        SET @StartTime = GETDATE()

     SELECT IDENTITY(INT,1,1) AS N

       INTO #Tally

       FROM Master.dbo.SysColumns sc1 WITH (NOLOCK),

            Master.dbo.SysColumns sc2 WITH (NOLOCK)

        SET @MyCount = @@ROWCOUNT

    --===== Add a primary key to the new table

      ALTER TABLE #Tally

        ADD PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100

    --===== Report the rowcount and duration in seconds

      PRINT STR(@MyCount) + ' Rows Inserted '

          + CONVERT(CHAR(13),GETDATE()-@StartTime,114) + ' Duration SELECT/INTO Method'

    --===== Return to normal unlimited rowcounts

        SET ROWCOUNT 0

    --================================================================================

    --      Test the F_TABLE_NUMBER_RANGE Method (table variables)

    --      Function F_TABLE_NUMBER_RANGE available on this link  

    --      http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=47685  

    --================================================================================

    --===== Start the timer and run the test

        SET @StartTime = GETDATE()

    DECLARE @N TABLE(Number INT PRIMARY KEY CLUSTERED WITH FILLFACTOR = 100)

     INSERT INTO @N

     SELECT Number

       FROM dbo.F_TABLE_NUMBER_RANGE(1,@DesiredRows)

    --===== Report the rowcount and duration in seconds

      PRINT STR(@@ROWCOUNT) + ' Rows Inserted '

          + CONVERT(CHAR(13),GETDATE()-@StartTime,114) + ' Duration F_TABLE_NUMBER_RANGE Method'

    --===== Delete rows from the table variable so as not to build up rows between tests

    --TRUNCATE TABLE @N --Doesn't work on table variables...

    --DROP TABLE @N     --Neither does this...

    DELETE @N           --But this does.

      PRINT REPLICATE('=',78)

    --================================================================================

    --===== End of test loop =========================================================

        SET @DesiredRows = @DesiredRows * 10

        END

            10 Rows Inserted 00:00:00:110  Duration SELECT/INTO Method

            10 Rows Inserted 00:00:00:000  Duration F_TABLE_NUMBER_RANGE Method

    ==============================================================================

           100 Rows Inserted 00:00:00:110  Duration SELECT/INTO Method

           100 Rows Inserted 00:00:00:000  Duration F_TABLE_NUMBER_RANGE Method

    ==============================================================================

          1000 Rows Inserted 00:00:00:107  Duration SELECT/INTO Method

          1000 Rows Inserted 00:00:00:017  Duration F_TABLE_NUMBER_RANGE Method

    ==============================================================================

         10000 Rows Inserted 00:00:00:140  Duration SELECT/INTO Method

         10000 Rows Inserted 00:00:00:093  Duration F_TABLE_NUMBER_RANGE Method

    ==============================================================================

        100000 Rows Inserted 00:00:00:500  Duration SELECT/INTO Method

        100000 Rows Inserted 00:00:00:907  Duration F_TABLE_NUMBER_RANGE Method

    ==============================================================================

       1000000 Rows Inserted 00:00:04:063  Duration SELECT/INTO Method

       1000000 Rows Inserted 00:00:09:907  Duration F_TABLE_NUMBER_RANGE Method

    ==============================================================================

      10000000 Rows Inserted 00:00:51:033  Duration SELECT/INTO Method

      10000000 Rows Inserted 00:01:59:123  Duration F_TABLE_NUMBER_RANGE Method

    ==============================================================================

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Like Jackie Chan's Uncle would say "Oh! One more thing"...

    I keep a "Tally" table of 11,000 numbers as a permanent table... good for more than 30 years of dates by days (if you need that type of fuctionality) and certainly good enough for an 8k byte VARCHAR split function (if you need that). Here's how I build it...

    --===== Create and populate the Tally table on the fly

    SELECT TOP 11000 --equates to more than 30 years of dates

    IDENTITY(INT,1,1) AS N

    INTO dbo.Tally

    FROM Master.dbo.SysColumns sc1,

    Master.dbo.SysColumns sc2

    --===== Add a Primary Key to maximize performance

    ALTER TABLE dbo.Tally

    ADD CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N)

    --===== Allow the general public to use it

    GRANT SELECT ON dbo.Tally TO PUBLIC

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff, I think your tests may have taken advantage of having master.dbo.syscolumns in cache, something that may or may not be the case.  Also, one was loading a temp table with a SELECT INTO, and the other was doing an INSERT into a declared table.

    I created a test script that clears the data cache and procedure buffers, and does a SELECT INTO for both methods.

    My testing suggests a closer match in performance.  The crosstab on master.dbo.syscolumns was faster than F_TABLE_NUMBER_RANGE in many tests, but not always, as you can see from the test below with 15,000,000 rows.

    These tests were run on my SQL Server 2005 Developer Edition on my desktop PC.

    --Removes all clean buffers from the buffer pool
    dbcc dropcleanbuffers
    --Removes all elements from the procedure cache
    dbcc freeproccache
    go
    print 'Test SysColumns crosstab'
    declare @starttime datetime
    set @starttime = getdate()
    set rowcount 15000000
    select
     number = identity(int,1,1)
    into
     #t
    from
    master.dbo.syscolumns sc1 with (nolock)
    cross join
    master.dbo.syscolumns sc2 with (nolock)
    select Elapsed_Time_Cross = convert(char(13),getdate()-@starttime,114) 
    go
    drop table #t
    go
    --Removes all clean buffers from the buffer pool
    dbcc dropcleanbuffers
    --Removes all elements from the procedure cache
    dbcc freeproccache
    go
    print 'Test F_TABLE_NUMBER_RANGE'
    declare @starttime datetime
    set @starttime = getdate()
    select
     number
    into
     #t
    from
     dbo.F_TABLE_NUMBER_RANGE(1,15000000)
    select Elapsed_Time_Function = convert(char(13),getdate()-@starttime,114)
    go
    drop table #t
     
     
    DBCC execution completed. If DBCC printed error messages, contact your system administrator.
    DBCC execution completed. If DBCC printed error messages, contact your system administrator.
    Test SysColumns crosstab
    (15000000 row(s) affected)
    Elapsed_Time_Cross 
    ------------------ 
    00:00:37:140 
    (1 row(s) affected)
    DBCC execution completed. If DBCC printed error messages, contact your system administrator.
    DBCC execution completed. If DBCC printed error messages, contact your system administrator.
    Test F_TABLE_NUMBER_RANGE
    (15000000 row(s) affected)
    Elapsed_Time_Function 
    --------------------- 
    00:00:36:940 
    (1 row(s) affected)
     
  • Cool... thanks, Michael. 

    Yeah, clearing the cashe didn't seem to matter in my testing so I didn't include it in my code.  And, I very much like the idea of using your function to populate a temp table rather than a table variable.  Thanks for running the test!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I'm developing a system for tracking how many days it takes to complete a task from when it's requested, but I have to exclude weekends and holidays. The original system was written in Access, and the contractor had a very clunky solution where he looped through a date table, touching every row (RBAR) to do this calculation. I realized that I, too, would need a date table, so I did one, and then came the problem of loading it. I originally did a loop to insert 104ish weekend dates, and then realized that I could do it with a number table! I was very happy, this was the first time that I got to use it. Here's the code:

    [font="Courier New"]CREATE TABLE [dbo].[DatesWeekendsHolidays] (

    [WHDate] [smalldatetime] NOT NULL)

    GO

    --(The table Numbers contains a single smallint field, Number.)

    declare @ThisYear char(2)

    select @ThisYear = '08'

    -- DECLARE @StartDate smalldatetime

    -- DECLARE @EndDate smalldatetime

    -- DECLARE @LoopDate smalldatetime

    --

    -- set @StartDate = cast(('1/1/' + @ThisYear) as smalldatetime)

    -- set @EndDate = cast(('12/31/' + @ThisYear) as smalldatetime)

    -- set @LoopDate = @StartDate

    --Using the Numbers table lets me insert all of the dates IN ONE INSERT STATEMENT!

    insert DatesWeekendsHolidays

    select dateadd(dw, Number, cast(('1/1/' + @ThisYear) as smalldatetime))

    from Numbers

    where Number between 1 and 365

    and (datepart(dw, dateadd(dw, Number, cast(('1/1/' + @ThisYear) as smalldatetime))) = 1

    or datepart(dw, dateadd(dw, Number, cast(('1/1/' + @ThisYear) as smalldatetime))) = 7)

    --The loop method of inserting dates: 104 inserts.

    -- while @LoopDate <= @EndDate

    -- BEGIN

    -- set @dow = datepart(dw, @LoopDate)

    --

    -- if @dow = 1 or @dow = 7

    -- insert DatesWeekendsHolidays select @LoopDate

    -- select @LoopDate = dateadd(d, 1, @LoopDate)

    -- END

    --New Year's Day

    insert DatesWeekendsHolidays

    select cast(('1/1/' + @ThisYear) as smalldatetime)

    where not exists(select WHDate

    from DatesWeekendsHolidays

    where WHDate = cast(('1/1/' + @ThisYear) as smalldatetime))

    --insert 11/11, 12/25 -- holidays with fixed dates

    --user inserts variable date holidays (Thanksgiving) via Access/VB app[/font]

    Ultimately I'll write a SQL job that will run at 00:01 on 1/1 of every year and automatically populate at least the raw holidays since it is easy enough to extract the year from getdate(). It would also send an email to the responsible parties to make sure that the variable date holidays get entered.

    The loop insert method took one second in Query Analyzer to run, the numbers insert took no measurable time. Looking at the statistics, trace, and execution plan are quite amusing, seeing the difference between the single insert statement and 104 is pretty funny.

    -----
    [font="Arial"]Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson[/font]

  • i had a need for a large number table, from 1 to 1 billion in just one table so neither of these solutions worked. i used cte to get what i wanted, i was able to do 1 million rows in under 6 seconds on a pentium d with 7200 rpm drives. where as the fastest solution proved thus far was about a minute and a half.

    1 billion in an hour and a half awesome performance, i didnt have 3 days to wait for the others to finish. i thought id share my code

    after the "with" each line you add squares the previous number. so with this example we can put in just over 4 billion rows. the where clause limits the amount returned.

    after it is all inserted we add the pk and clustered indexes

    SET NOCOUNT ON;

    DROP TABLE dbo.NUMBERS_TEST;

    WITH

    N5(i) AS ( SELECT 1 UNION SELECT 0 ), -- 2

    N4(i) AS ( SELECT 1 FROM N5 D1 CROSS JOIN N5 D2 ), --4

    N3(i) AS ( SELECT 1 FROM N4 D1 CROSS JOIN N4 D2 ), --16

    N2(i) AS ( SELECT 1 FROM N3 D1 CROSS JOIN N3 D2 ), --256

    N1(i) AS ( SELECT 1 FROM N2 D1 CROSS JOIN N2 D2 ), --65536

    N0(i) AS ( SELECT 1 FROM N1 D1 CROSS JOIN N1 D2 ) --4294967296

    SELECT i INTO NUMBERS_TEST

    FROM ( SELECT ROW_NUMBER() OVER (ORDER BY i)

    FROM N0) D(i)

    WHERE i <= 1000000000 ; --Put your limit here

    GO

    ALTER TABLE dbo.NUMBERS_TEST

    ALTER COLUMN i BIGINT NOT NULL;

    GO

    ALTER TABLE dbo.NUMBERS_TEST

    ADD CONSTRAINT PK_NUMBERS_TEST PRIMARY KEY CLUSTERED (i);

    GO

  • I've just gotta ask... what did you need the 1 Billion row numbers table for?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • it was something of a math experiment. I was doing a specialized inner join on itself to find all prime numbers from 1 to a billion. its been going for about 3 days now. hahaha

  • Sounds like a lot of fun... thanks for the feedback. Did you index the numbers table with a clustered PK by any chance?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • yeah, i added the clustered index after the insert to improve performance. the insert took just over an hour and a half but adding the clustered index took around 5 and a half hours. haha.

Viewing 15 posts - 16 through 30 (of 42 total)

You must be logged in to reply to this topic. Login to reply