Fastes way to increment an id column

  • Comments posted to this topic are about the item Fastes way to increment an id column

  • ???????????????

  • Hi Carlo

    You have been faster than me... I wanted to explain directly after it becomes published.

    Sure since SQL Server 2005 this is not needed any more because of the ROW_NUBER() function!

    The reason for the script is handling of unique identities within SQL Server 7.0/2000 in bulk operations. It is no big clue but I often get the question from some of my developers how to handle a incremental id without a loop. (In SQL Server 2000)

    For SQL Server 2005 this makes no sense! I will update the description to explain.

    Hope that explains...

    Greets

    Flo

  • -- what about IDENTITY??

    SET NOCOUNT ON

    -- Just a simple test table

    IF (OBJECT_ID('tempdb..#my_tab') IS NOT NULL)

    DROP TABLE #my_tab

    CREATE TABLE #my_tab (id INT NOT NULL IDENTITY(1000,1), any_date DATETIME)

    -- Create 100000 test values

    PRINT 'Test data'

    PRINT '-> Creating'

    DECLARE @i INT

    SET @i = 0

    WHILE (@i < 100000)

    BEGIN

    INSERT INTO #my_tab (any_date) VALUES (DATEADD(MINUTE, @i * -1, GETDATE()))

    SET @i = @i + 1

    END

    PRINT '<- done'

  • Hi Carlo

    E.g. in our business case we are not able to use IDENTITY due to problems with replication. If you can/could use IDENTITY columns in SSE 2000 you don't need it. But if you use replications there some cases which make problems with them because of the seed.

    Greets

    Flo

  • -- try to use this for replication:

    set identity_insert my_table on

  • Hi Carlo

    Thanks for feedback! There is another reason for me that I cannot use IDENTITIES, our C# OR-Mapper cannot handle them (not my design...).

    I'm no DBA, I'm a developer so I cannot explain all the problems with replication. My DBA told me and Noeld confirmed it here:

    http://www.sqlservercentral.com/Forums/Topic669595-338-1.aspx

    I will update the description of the script to ensure that everybody understands the reason of it!

    Greets

    Flo

  • I agree with Flo... if, for some reason, you cannot use IDENTITY and you're stuck with SQL Server 2000 or less which prevents the use of ROW_NUMBER and the like, then the "quirky" update Flo used is absolutely the fastest method available. That method is good for a lot of things like running totals, ranking, and some incredibly odd groupings.

    There are, however, some fairly strict rules to using it if you actually want to control what it does. Those rules and some of the methods are covered by an article (currently being rewritten) that will reappear at the following URL sometime in the next month or so.

    http://www.sqlservercentral.com/articles/Advanced+Querying/61716/

    If I could also make a wee bit of a suggestion... if you want to build a test table for something, there's simply no reason to go through the complexity of building a While Loop nor the wait while it works... not even in SQL Server 2000. To wit, the While Loop in the article takes a little over 7 seconds to build the test table. Using a little set-based programming and a little knowledge of the functions available in SQL Server, it can be done with much less complexity and it executes in about 250 milli-seconds.

    [font="Courier New"]--===== Conditionally drop the test table for test reruns

         IF OBJECT_ID('tempdb..#my_tab'IS NOT NULL

            DROP TABLE #my_tab

    --===== Build the test table and populate it with data on the fly

        SET STATISTICS TIME ON

     SELECT TOP 100000

            CAST(NULL AS INTAS ID,

            DATEADD(mi,-ABS(CHECKSUM(NEWID())%100000),GETDATE()) AS Any_Date

       INTO #My_Tab

       FROM Master.dbo.SysColumns t1 CROSS JOIN Master.dbo.SysColumns t2

        SET STATISTICS TIME OFF[/font]

    I mention this because, frequently, testing against larger numbers of rows is ensential to ensuring performance in the face of scalability. To test against a million rows, the While Loop takes a whopping big 80+ seconds on my box. Worse yet, it also practices you in RBAR thinking instead of set based thinking. The simple set based solution only takes 2.319 seconds to produce the same million row test table. Folks are much more likely to test repeatedly if they can setup test data that quickly.

    Heh... yeah... I know... Someone once told me that this was easy for me but not for them. That's my point... if they keep practicing RBAR methods instead of learning how to do the very high speed set based methods, it will never be easy for them. 😛

    Anyway... good tip, Flo. Keep 'em coming. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks Jeff!

    It's not a big deal but if I get one Dollar for every time I showed somebody to remove a cursor I would be a rich man...

    Best wishes

    Flo

  • Florian Reischl (5/17/2009)


    Thanks Jeff!

    It's not a big deal but if I get one Dollar for every time I showed somebody to remove a cursor I would be a rich man...

    Best wishes

    Flo

    Heh... OK... you owe me a Dollar. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Why? :unsure:

    I use this since about 5 years. Long time before I joined SSC 🙂

  • Florian Reischl (5/17/2009)


    Why? :unsure:

    I use this since about 5 years. Long time before I joined SSC 🙂

    Heh... ok... then 5 bucks for posting a cursor solution to a set based problem .:-P

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hey Jeff & Flo,

    Just a couple of things... 🙂

    1. This isn't a true 'quirky' update unless there's a clustered index to determine the order of the ID assignment. It is true that the current SQL Server behaviour is always to return rows from heaps in insertion order (at least with a new heap that has never had any update or delete activity on it), but still.

    2. This method runs in less than half the time (2005 only):

    SET NOCOUNT ON

    --===== Conditionally drop the test table for test reruns

    IF OBJECT_ID(N'tempdb..#my_tab', N'U') IS NOT NULL DROP TABLE #my_tab;

    IF OBJECT_ID(N'tempdb..#my_tab2', N'U') IS NOT NULL DROP TABLE #my_tab2;

    --===== Build the test table and populate it with data on the fly

    SELECT TOP 100000

    ISNULL(CAST(0 AS INT), 0) AS ID,

    DATEADD(mi,-ABS(CHECKSUM(NEWID())%100000),GETDATE()) AS Any_Date

    INTO #My_Tab

    FROM Master.dbo.SysColumns t1 CROSS JOIN Master.dbo.SysColumns t2

    DECLARE @now DATETIME;

    SET @now = GETDATE();

    -- Bulk logged *temporary* IDENTITY addition

    SELECT id = IDENTITY(INT, 1000, 1),

    any_date

    INTO #my_tab2

    FROM #my_tab WITH (TABLOCK);

    -- Fast clear of the original table

    TRUNCATE TABLE #my_tab;

    -- Metadata only change

    ALTER TABLE #my_tab2 SWITCH TO #my_tab;

    PRINT 'Duration for 100000 values: ' + CONVERT(VARCHAR(20), DATEDIFF(MILLISECOND, @now, GETDATE()));

    -- Tidy up

    DROP TABLE

    #my_tab, #my_tab2;

    Paul

  • In case you guys think I'm cheating by changing the nullability of the id column, this version is a true replacement, and is just as fast:

    SET NOCOUNT ON;

    -- Conditional drops

    IF OBJECT_ID(N'tempdb..#my_tab', N'U') IS NOT NULL DROP TABLE #my_tab;

    IF OBJECT_ID(N'tempdb..#my_tab', N'U') IS NOT NULL DROP TABLE #my_tab2;

    IF OBJECT_ID(N'tempdb..#my_tab', N'U') IS NOT NULL DROP TABLE #my_tab3;

    -- Sample data

    SELECT TOP (100000)

    CAST(NULL AS INT) AS ID,

    DATEADD(mi,-ABS(CHECKSUM(NEWID())%100000),GETDATE()) AS Any_Date

    INTO #My_Tab

    FROM master.sys.all_columns T1, master.sys.all_columns T2, master.sys.all_columns T3;

    -- Start timing

    DECLARE @now DATETIME;

    SET @now = GETDATE();

    -- Bulk logged identity add (id column is NOT NULL)

    SELECT id = IDENTITY(INT, 1000, 1),

    any_date

    INTO #my_tab2

    FROM #my_tab WITH (TABLOCK);

    -- Intermediate table for nullability change

    CREATE TABLE #my_tab3 (id INT NOT NULL, any_date DATETIME NULL);

    -- Lose the IDENTITY attribute

    ALTER TABLE #my_tab2 SWITCH TO #my_tab3;

    -- Metadata change to id column nullability

    ALTER TABLE #my_tab3 ALTER COLUMN id INT NULL;

    -- Fast clear original table

    TRUNCATE TABLE #my_tab;

    -- Metadata switch data in

    ALTER TABLE #my_tab3 SWITCH TO #my_tab;

    -- Show duration

    PRINT 'Duration for 100000 values: ' + CONVERT(VARCHAR(20), DATEDIFF(MILLISECOND, @now, GETDATE()));

    -- Tidy up

    DROP TABLE

    #my_tab, #my_tab2, #my_tab3;

  • Jeff Moden (5/17/2009)


    .. if, for some reason, you cannot use IDENTITY and you're stuck with SQL Server 2000 or less which prevents the use of ROW_NUMBER and the like, then the "quirky" update Flo used is absolutely the fastest method available.

    Heh :laugh:

Viewing 15 posts - 1 through 15 (of 26 total)

You must be logged in to reply to this topic. Login to reply