Reusing Identities

  • Comments posted to this topic are about the content posted at http://www.sqlservercentral.com/columnists/dPriyankara/reuseidentities.asp

  • I don't think your second method is correct. You are looking for the lowest number in the first gap, correct? In that case you'll either have to add an ORDER BY clause to your TOP query, or use "SELECT Min(intNumber)" instead of the TOP clause. The point is that the only way to guarantee that results are ordered - and thus to guarantee that the TOP 1 clause delivers the lowest gap - is to add an order by clause.

    Of course, in practice, due to the clustered primary key on the Numbers table you might well witness the correct result every time you try it without the ORDER BY, but there's no certainty. This behaviour might change with differences in the clustered index on the other table, differences in paralellism, or changes in the underlying db engine in future SQL Server versions.

    I would either rewrite your method as:

    SELECT TOP 1 intNumber

    FROM OrderHeader

    RIGHT OUTER JOIN tb_Numbers

    ON tb_Numbers.intNumber = OrderHeader.intID

    WHERE intID IS NULL AND intNumber < = (SELECT MAX(intID) FROM OrderHeader)

    ORDER BY tb_Numbers.intNumber

    - or -

    SELECT Min(intNumber)

    FROM OrderHeader

    RIGHT OUTER JOIN tb_Numbers

    ON tb_Numbers.intNumber = OrderHeader.intID

    WHERE intID IS NULL AND intNumber < = (SELECT MAX(intID) FROM OrderHeader)

    It would be interesting to see if the differences in performance with these (IMHO) correct methods are still as large, and which of these two alternative methods would be the fastest.

    Jeremy

  • Hi Jeremy,

    I agree with you if the OrderHeader table has no index created with IDENTITY column. But as a general rule, we always create clustered or nonclustered unique index.

    If no indexes available, Of course you have to use one of your statements. Then there will be slight performance problems but still the method is the winner.

    Dinesh Priyankara

    mcp mcse mcsd mcdba

  • First: With a Clustered PK the data is *always* returned in the natual table order, so order by is not necessary. Second: why do we need to get the min occurence, all we really care about is a missing ident right? so just give me the first ident and I will be happy.


    Michael R. Schmidt
    Developer

  • Hi Mike,

    Actually what we want is, reuse removed identities. So, whatever the method, whether TOP 1 or MIN, you get desired result. But if you consider about performance, TOP 1 is better.

    [font=Verdana]Dinesh Priyankara[font=Verdana]

    mcp mcse mcsd mcdba

  • In a high volume environment, wouldnt you run into issues with two users trying to use (re-use) the same key?

    Andy

    http://www.sqlservercentral.com/columnists/awarren/

  • Hi Andy,

    Why can’t we run this insertion process within a transaction?

    Dinesh Priyankara

    Edited by - Dinesh on 03/06/2003 9:17:45 PM

  • You can of course, and that is the right answer. Downside will be (or may be anyway!) blocking.

    Andy

    http://www.sqlservercentral.com/columnists/awarren/

  • Dinesh wrote:

    As per the output, there are 20086 logical reads and it has taken 200 ms for the first method. But in second method there are only 19 logical reads and the execution time is less considerable.

    That’s why I selected to continue in my way. But there may be a side that I have not seen but you can see. So, try on this and see whether how this T-SQL solution suit for you.

    Actually, I think that you are not considering two things:

    1. The overhead of creating a table of 30000 rows is much greater than the difference between the two queries. If you prefer to store this table permanently, you have to consider that it's size will be added to each backup and that you need to adjust the number of rows as more data will be added in the main table.

    2. The example from Books Online doesn't assume that the seed and the interval are 1, instead it determines them and this is the main cause of the many logical reads. Try:

    SELECT MIN(IDENTITYCOL) + 1 as NextIdentityValue

    FROM OrderHeader t1

    WHERE IDENTITYCOL BETWEEN 1 AND 32766

         AND NOT EXISTS (SELECT * FROM OrderHeader t2

            WHERE t2.IDENTITYCOL = t1.IDENTITYCOL + 1)

    You will get only about 120 logical reads (instead of about 20000) and the execution time is much better.

    Razvan Socol<O>

    </O>

  • I am replying after a looooooooooooong time

    On the first method the author is using a not so optimized query.  He is using different methods in each queries (sub query vs outer join and Min vs top 1).  More than that,  Should we declare a variable, get the result into the variable and select the variable?

    I tried with Left outer join and the results are really better.  (I tried with both statistics time and statistics IO ON)

    Select top 1 t1.IDENTITYCOL + IDENT_INCR('OrderHeader') As NextIdentityValue

    from OrderHeader t1 Left Outer join OrderHeader t2

    ON T1.IDENTITYCOL+IDENT_INCR('OrderHeader')=t2.IDENTITYCOL

    Where t2.IDENTITYCOL is null

    This is the result:

    SQL Server Execution Times:

       CPU time = 0 ms,  elapsed time = 0 ms.

    SQL Server Execution Times:

       CPU time = 0 ms,  elapsed time = 0 ms.

    SQL Server parse and compile time:

       CPU time = 0 ms, elapsed time = 2 ms.

    2210

    Table 'OrderHeader'. Scan count 2210, logical reads 4450, physical reads 0, read-ahead reads 0.

    SQL Server Execution Times:

       CPU time = 26 ms,  elapsed time = 26 ms.

    SQL Server parse and compile time:

       CPU time = 0 ms, elapsed time = 0 ms.

    Later I introduced a variable to get the Indentity increment first the the results are even better

    Declare @Increment int

    Select @Increment= IDENT_INCR('OrderHeader')

    Select top 1 t1.IDENTITYCOL + @Increment As NextIdentityValue

    from OrderHeader t1 Left Outer join OrderHeader t2

    ON T1.IDENTITYCOL + @Increment = t2.IDENTITYCOL

    Where t2.IDENTITYCOL is null

    The results are slightly better:

    SQL Server Execution Times:

       CPU time = 0 ms,  elapsed time = 0 ms.

    SQL Server Execution Times:

       CPU time = 0 ms,  elapsed time = 0 ms.

    SQL Server parse and compile time:

       CPU time = 0 ms, elapsed time = 2 ms.

    SQL Server Execution Times:

       CPU time = 0 ms,  elapsed time = 0 ms.

    2210

    Table 'OrderHeader'. Scan count 2136, logical reads 4439, physical reads 0, read-ahead reads 0.

    SQL Server Execution Times:

       CPU time = 7 ms,  elapsed time = 7 ms.

    SQL Server parse and compile time:

       CPU time = 0 ms, elapsed time = 0 ms.

    Finally,  I changed the min into top 1.  the results are almost to my previous results:

    SELECT top 1 IDENTITYCOL + IDENT_INCR('OrderHeader') As NextIdentityValue

    FROM OrderHeader t1

    WHERE IDENTITYCOL BETWEEN IDENT_SEED('OrderHeader') AND 32766

        AND NOT EXISTS (SELECT * FROM OrderHeader t2

            WHERE t2.IDENTITYCOL = t1.IDENTITYCOL + IDENT_INCR('OrderHeader'))

    SQL Server Execution Times:

       CPU time = 0 ms,  elapsed time = 0 ms.

    SQL Server Execution Times:

       CPU time = 0 ms,  elapsed time = 0 ms.

    SQL Server parse and compile time:

       CPU time = 3 ms, elapsed time = 3 ms.

    2210

    Table 'OrderHeader'. Scan count 2210, logical reads 4440, physical reads 0, read-ahead reads 0.

    SQL Server Execution Times:

       CPU time = 15 ms,  elapsed time = 26 ms.

    SQL Server parse and compile time:

       CPU time = 0 ms, elapsed time = 0 ms.

     

     

    Cheers,
    Prithiviraj Kulasingham

    http://preethiviraj.blogspot.com/

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply