How to maintain an incrementing row id

  • This problem is so fundamental, I'm hesitant to ask it, but here goes.

    I have a table whose primary key consists of two fields. The first key field (sessionID) identifies a set of rows; the second key field (logID) is simply the row identifier within that set. I could have made logID an identity field, but there is some utility in each sessionID's set having logIDs that begin at 1.

    This table has a high volume of inserts. What is a best-practice way of determining the next logID for a set and inserting a new row, while
    1) avoiding a key violation from a competing transaction,
    2) avoiding deadlocks,
    3) and minimizing locking overhead?

    SQL Server 2005

    The transaction isolation level is Read Committed

    The code to create the table is:

    create table eventLog (sessionID int, logID int, logText varchar(1000), created datetime)
    create unique clustered index eventLog_pk on eventLog (sessionID, logID)

    The solution below was attempted but recently failed with a key violation. It was the only time the error has been encountered in millions of updates, though. I believe the code below failed because the Read Committed isolation level does not protect against phantom rows. I considered using the Serializable table hint in the subquery, but felt someone would have a better answer.

    declare
        @attempts int,
        @sessionID int,
        @logID int,
        @logText varchar(1000)

    -- in practice these would be passed in as parameters
    select @sessionID = 1, @logText = 'test'

    select @attempts = 0, @logID = isNull(max(logID),0)+1
    from eventLog
    where sessionID = @sessionID

    while @attempts < 5
    begin
        select @attempts = @attempts + 1

        insert into eventLog (sessionID, logID, logText)
        select @sessionID, @logID, @logText
        where not exists
            (select 1 from eventLog
             where sessionID = @sessionID
               and logID = @logID)

        if @@rowcount < 1
            select @logID = @logID + 1
        else
            break
    end

  • Here's something I came up with. Not sure it's bullet-proof, but maybe it will give you some ideas. Sorry that it's a bit of a kludge and not very elegant... :crazy:

    First some sample data:

    IF OBJECT_ID('dbo.eventLog') IS NOT NULL
        DROP TABLE dbo.eventLog

    CREATE TABLE dbo.eventLog
    (
        sessionID INT NOT NULL
        ,logID TIMESTAMP NOT NULL
        ,logText VARCHAR(1000) NULL
        ,created DATETIME NULL
        ,PRIMARY KEY CLUSTERED (sessionID, logID)
        ,UNIQUE (sessionID)
    )

    ;WITH cteSampleData (sessionID,logText,created)
    AS
    (
        SELECT 1001,'George Washington','2013-03-31' UNION ALL
        SELECT 1002,'John Adams','2013-02-28' UNION ALL
        SELECT 1003,'Thomas Jefferson','2013-02-15' UNION ALL
        SELECT 1004,'James Madison','2013-02-01' UNION ALL
        SELECT 1005,'James Monroe','2013-01-31' UNION ALL
        SELECT 1006,'John Q Adams','2013-01-22' UNION ALL
        SELECT 1007,'Andrew Jackson','2013-01-13'
    )
    INSERT INTO dbo.eventLog (sessionID,logText,created)
    SELECT sessionID, logText, created
    FROM cteSampleData

    SELECT * FROM dbo.eventLog

    Now of course the following insert will fail (just to set up a base case):

    ;WITH cteNewData (sessionID,logText,created)
    AS
    (
        SELECT 1001,'Ronald Reagan','2013-05-31'
    )
    INSERT INTO dbo.eventLog (sessionID,logText,created)
    SELECT sessionID, logText, created
    FROM cteNewData

    Msg 2627, Level 14, State 1, Line 1
    Violation of UNIQUE KEY constraint 'UQ__eventLog__23DB12CA731B1205'. Cannot insert duplicate key in object 'dbo.eventLog'. The duplicate key value is (1001).
    The statement has been terminated.

    Now the rest of the code below takes some explaining.

    The first block of code will do an insert. But FOR TESTING I've put in a wait state so that you can flip over to another window and insert a DIFFERENT row while the first is waiting. This simulates what might happen if the next ID has been assigned, but before the insert completes a second insert attempt tries to use the same value.

    So this code will INSERT a new row and "hold" that value even if another insert takes place in the interim. It does this by inserting only the sessionID and then coming back and doing an update, even if another row has been inserted behind it. To see how this works, before running the code, create another window with the second insert command. Then run the first script, and while it's waiting flip over and insert a record (or two or more) from the second page. After the wait state expires (I made it 15 secs for testing) you can go back to the first page and see that the first row has been updated properly. The only minor anomaly I see is that the timestamp column value may not be in order.

    WINDOW 1:

    DECLARE @NextID INT

    BEGIN
        -- get the next id
        ;WITH NextID(sessionID)
        AS
        (
            SELECT MAX(sessionID)+1
            FROM dbo.eventLog
        )
        INSERT INTO dbo.EventLog(sessionID)
        SELECT sessionID
        FROM NextID AS n

        SELECT TOP(1) @NextID = SessionID
        FROM dbo.EventLog
        WHERE logText IS NULL
        ORDER BY logText

        --for testing
        SELECT @NextID AS NextID

        /* Put in a wait state for testing only */
        WAITFOR DELAY '000:00:15' -- 15 sec delay for TESTING ONLY!

        --WHILE WAITING, GO TO SEPARATE WINDOW AND RUN AN INSERT--

        UPDATE dbo.eventLog
        SET logText = 'Ronald Reagan'
            ,created = GETDATE()
        WHERE sessionID = @NextID
    END

    SELECT * FROM dbo.eventLog

    WINDOW 2:

    /* Simulate a second simultaneous insert */
    /* while waiting for the first insert */
    DECLARE @NewID INT

    SELECT @NewID = MAX(sessionID)+1
    FROM dbo.eventLog

    ;WITH cteNewData (sessionID,logText,created)
    AS
    (
        SELECT @NewID,'Bill Clinton','2013-01-31'
    )
    INSERT INTO dbo.eventLog (sessionID,logText,created)
    SELECT nd.sessionID, nd.logText, nd.created
    FROM cteNewData AS nd

    SELECT * FROM dbo.eventLog

  • Somebody's probably going to shoot me for suggesting it, but (utilizing Steve's wonderful DDL and sample data) perhaps something like this:

    CREATE TABLE #eventLog
    (
        sessionID INT NOT NULL
        ,logID TIMESTAMP NOT NULL
        ,logText VARCHAR(1000) NULL
        ,created DATETIME NULL
        ,PRIMARY KEY CLUSTERED (sessionID, logID)
        ,UNIQUE (sessionID)
    )

    ;WITH cteSampleData (sessionID,logText,created)
    AS
    (
        SELECT 1001,'George Washington','2013-03-31' UNION ALL
        SELECT 1002,'John Adams','2013-02-28' UNION ALL
        SELECT 1003,'Thomas Jefferson','2013-02-15' UNION ALL
        SELECT 1004,'James Madison','2013-02-01' UNION ALL
        SELECT 1005,'James Monroe','2013-01-31' UNION ALL
        SELECT 1006,'John Q Adams','2013-01-22' UNION ALL
        SELECT 1007,'Andrew Jackson','2013-01-13'
    )
    INSERT INTO #eventLog (sessionID,logText,created)
    SELECT sessionID, logText, created
    FROM cteSampleData

    SELECT * FROM #eventLog;

    WITH LastSessionID AS
    (
        SELECT LastSessionID=MAX(sessionID) FROM #eventLog WITH(UPDLOCK)
    )
    INSERT INTO #eventLog (sessionID,logText,created)
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) + LastSessionID
        ,logText,created
    FROM (
        -- Your newest inserts go here
        SELECT 'Babe Ruth', '2013-01-20'
        UNION ALL SELECT 'Ty Cobb', '2013-01-21'
    ) a (logText,created)
    CROSS APPLY LastSessionID b

    SELECT * FROM #eventLog
    GO

    DROP TABLE #eventLog

    Table name changed to a temp table to protect the innocent.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
    Since random numbers are too important to be left to chance, let's generate some!
    Learn to understand recursive CTEs by example.

  • Steven - BTW.

    If you do it the way you suggested, which is to break it into multiple SQL statements, you should consider 2 things:

    1. Wrap the SQL statements in a TRANSACTION.

    2. Add the UPDLOCK hint on the SELECT MAX (as I did in my example)

    Then in your 2-windows test case, you don't need to introduce the WAIT. You can simply execute the first few statements just prior to the INSERT, switch windows, and execute the other SQL (which will hold until you go back to the prior window and execute the final statements up to COMMIT TRANSACTION).

    UPDLOCK holds its lock on the table until the transaction completes. In my case, since it wasn't in a transaction, the lock is freed once the single statement completes.
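
    To make that concrete, the combination might look something like this (just a sketch against the dbo.eventLog table from my earlier post, not production code; logID is the TIMESTAMP column, so it fills itself in):

    BEGIN TRANSACTION

    DECLARE @NextID INT

    SELECT @NextID = MAX(sessionID) + 1
    FROM dbo.eventLog WITH (UPDLOCK) -- U lock now held until COMMIT

    -- in the 2-windows test, switch windows here; the other session blocks

    INSERT INTO dbo.eventLog (sessionID, logText, created)
    VALUES (@NextID, 'Ronald Reagan', GETDATE())

    COMMIT TRANSACTION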



  • dwain.c (7/12/2013)


    Steven - BTW.

    If you do it the way you suggested, which is to break it into multiple SQL statements, you should consider 2 things:

    1. Wrap the SQL statements in a TRANSACTION.

    2. Add the UPDLOCK hint on the SELECT MAX (as I did in my example)

    Then in your 2-windows test case, you don't need to introduce the WAIT. You can simply execute the first few statements just prior to the INSERT, switch windows, and execute the other SQL (which will hold until you go back to the prior window and execute the final statements up to COMMIT TRANSACTION).

    UPDLOCK holds its lock on the table until the transaction completes. In my case, since it wasn't in a transaction, the lock is freed once the single statement completes.

    Yeah, in an actual case I'd have added a TRAN block with UPDLOCK and I did that at first. But I wanted instant gratification watching the WINDOW 2 rows get inserted then flip back to WINDOW 1 and see the first row get updated after the delay. I know that it was a contrived simulation.

    If this represents a high-volume situation, I'm not sure what effect the UPDLOCK might have on concurrency. And I'm not sure my method would handle more than two simultaneous insert attempts, or what would happen if a duplicate PK still somehow caused an error. An early version had a TRY/CATCH block that would keep trying until a valid PK sessionID could be inserted, but it was late and I was really guessing at what the OP's actual requirement was. Seems to me that some sort of additional identity column would be a much easier and more reliable fix.

     

  • Thank you both for your responses. It seems the answer will involve explicit transactions and explicit locks. I was hoping to avoid that. I know there is a proper time to use explicit transactions, but I have also seen a lot of problems introduced (deadlocks and performance issues). Similarly with optimizer hints... if I have to resort to them, I suspect there is some other design problem.

    With locking hints, I've spent time studying the documentation and reading blogs, forums, etc., and I admit I lack confidence that I (and many others) understand completely what the effect of each hint is. Which again leads me to avoid them when I can.

    The great examples you provided use UPDLOCK. It's not clear to me this would solve the problem, because UPDLOCK doesn't prevent other transactions from reading the rows to calculate the next ID (which could result in two transactions trying to use the same ID). You may be right, but it seems to me you would need XLOCK with PAGLOCK to prevent other transactions from reading the rows until you can insert the desired row. But now it seems I'm really trying too hard.

    And see this for example of how documented behavior isn't always complete:

    http://support.microsoft.com/kb/324417

    I plan on re-writing the table to use an Identity column, and avoid all the complexity.
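
    Roughly what I have in mind (an untested sketch; if the per-session logID starting at 1 is still needed for display, it can be derived at read time with ROW_NUMBER instead of being maintained on write):

    create table eventLog (
        eventID int identity(1,1) not null primary key,
        sessionID int not null,
        logText varchar(1000) null,
        created datetime null
    )

    -- logID per session, starting at 1, computed on read
    select sessionID,
        row_number() over (partition by sessionID order by eventID) as logID,
        logText, created
    from eventLog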

    Thanks again, and please comment on the lock discussion if you have some clarification!

  • Update: I tried several different combinations of locks, including (updlock), (xlock, paglock), and (holdlock). The only lock that prevented a second transaction from potentially inserting a key violation was (tablock). Please let me know if anyone sees something I'm missing.

    ( BTW, how do you create those windows of code in the forum?)

    To create the table:

    CREATE TABLE [dbo].[EventLog](
        [sessionID] [int] NOT NULL,
        [logID] [int] NOT NULL,
        [logText] [varchar](50) NULL,
        [created] [datetime] NULL,
        CONSTRAINT [PK_EventLog] PRIMARY KEY CLUSTERED
        (
            [sessionID] ASC,
            [logID] ASC
        )
    )

    -- in window 1 -----------------------------------
    declare
        @nextLogID int,
        @sessionID int,
        @logText varchar(50)

    truncate table eventLog

    select @sessionID = 1, @logText = 'this is a note #1'

    begin tran

    select @nextLogID = isNull(max(logID),0)+1
    from eventLog with (xlock, tablock) -- modify the lock hints to try variations
    where sessionID = @sessionID

    waitfor delay '00:00:05'

    insert into eventLog (sessionID, logID, logText)
    values (@sessionID, @nextLogID, @logText)

    commit tran

    select * from eventLog

    -- in window 2, run while window 1 is still waiting --------------
    declare
        @nextLogID int,
        @sessionID int,
        @logText varchar(50)

    select @sessionID = 1, @logText = 'this is a note #2'

    begin tran

    select @nextLogID = isNull(max(logID),0)+1
    from eventLog with (xlock, tablock)
    where sessionID = @sessionID

    insert into eventLog (sessionID, logID, logText)
    values (@sessionID, @nextLogID, @logText)

    commit tran

    select * from eventLog

  • rice.tx (7/14/2013)


    Thank you both for your responses. It seems the answer will involve explicit transactions and explicit locks. I was hoping to avoid that. I know there is a proper time to use explicit transactions, but I have also seen a lot of problems introduced (deadlocks and performance issues). Similarly with optimizer hints... if I have to resort to them, I suspect there is some other design problem.

    ...

    The great examples you provided use UPDLOCK. It's not clear to me this would solve the problem, because UPDLOCK doesn't prevent other transactions from reading the rows to calculate the next ID (which could result in two transactions trying to use the same ID). You may be right, but it seems to me you would need XLOCK with PAGLOCK to prevent other transactions from reading the rows until you can insert the desired row. But now it seems I'm really trying too hard.

    ...

    If you want 100% certainty that there will not be a key violation, you must use an IDENTITY column (but presumably you know that and have some reason why you think you cannot).

    While there may be edge cases where the locks don't work, I can say my suggestion to use UPDLOCK was more than theoretical. We've used it quite successfully in a very similar case, and from what I can tell from the documentation (and from actually trying it with two SSMS sessions), it does exactly what it was designed to do. And the system that uses this is a relatively high-volume one.

    I still recommend combining the SELECT MAX and the INSERT into a single statement, since that avoids latency between queries which might contribute to deadlocks.
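
    Combined, it might look something like this (a sketch against your original eventLog table; the HOLDLOCK is my addition so that the key range stays locked even when the session has no rows yet, and you should treat this as a starting point rather than something I've tested at your volume):

    insert into eventLog (sessionID, logID, logText, created)
    select @sessionID, isNull(max(logID),0)+1, @logText, getdate()
    from eventLog with (updlock, holdlock)
    where sessionID = @sessionID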

    And the only reason I've seen where a TRANSACTION might cause unwarranted deadlocks is when you've got some inefficient query running within the TRANSACTION (in which case, try to improve that query), or when you're not following the best practice of minimizing the amount of work done within the TRANSACTION.

    For a somewhat contrived example, I've seen TRANSACTIONs that perform SELECTs within their scope that have nothing to do with the data being updated. Those should be removed from the TRANSACTION to minimize the overall processing time.

    Good luck!


