Using IDENTITY as a key column

  • Hugo Kornelis (4/20/2010)


    SQL Server will not add this index on its own, so SQL Server has no efficient way to retrieve rows based on a known value for PersonID. We can give SQL Server such an efficient way, but that would require additional DDL.

    That depends on the number of rows in the table.

    I thought we covered this point already 😛

  • Sorry Hugo, but I have to add my voice to the nay-sayers.

    For the first part of the question your answer is based on the fact that by default SQL Server will not allow duplicate IDs, but explicitly a duplicate ID can be created.

    However, for the second part of the question you're arguing things the other way around. By default SQL Server will not add an index, but there is an explicit way to provide an efficient query plan.

    So for the first part you're assuming explicit behaviour but for the second you're assuming implicit behaviour.

    By the way though, I agree with the basic premise of the question. There's many myths around identity fields (can never have duplicates, can never have gaps, ...). My "favourite" that we get tripped on far too often is when developers (especially vendors) don't think about what happens when the maximum field value is reached.

  • Glenn Dorling (4/20/2010)


    Sorry Hugo, but I have to add my voice to the nay-sayers.

    For the first part of the question your answer is based on the fact that by default SQL Server will not allow duplicate IDs, but explicitly a duplicate ID can be created.

    However, for the second part of the question you're arguing things the other way around. By default SQL Server will not add an index, but there is an explicit way to provide an efficient query plan.

    So for the first part you're assuming explicit behaviour but for the second you're assuming implicit behaviour.

    By the way though, I agree with the basic premise of the question. There's many myths around identity fields (can never have duplicates, can never have gaps, ...). My "favourite" that we get tripped on far too often is when developers (especially vendors) don't think about what happens when the maximum field value is reached.

    If multiple people say the same, then there has to some truth in it.

    When I submitted the question, I thought that the words "duplicate values for PersonID might still occur" (not "will occur" or so) would imply that this part of the question is whether it is possible or not, whereas the words "SQL Server does not have an efficient way" (not "might be given an efficient way" or so) would imply that that part of the question is about access based on the table posted.

    Apparently, this was not clear to all. My apologies to all who got confused because of this, and I hope you still learned something from this question.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Glenn Dorling (4/20/2010)


    So for the first part you're assuming explicit behaviour but for the second you're assuming implicit behaviour.

    Glenn, I think that's a stretch, and an artificial distinction to make. The first part was entirely clear and accurate, in my personal view.

  • I'd just like to make clear that although I was (I think) the first person to comment on the answer to the second part, I was reflecting negatively on myself, and not the question (which was fine)!

    If you apply common sense rather than obsessive pedantry* then it's pretty obvious what's meant, isn't it?

    * that's another negative comment against myself, not an attack on others 🙂

  • Toreador (4/21/2010)


    ...although I was (I think) the first person to comment on the answer to the second part...

    Nope! 😀

  • Speaking (typing really) as someone who was initially annoyed that they got the question wrong, I'd like to now say that I'm happy with the overall result.

    I think it's been demonstrated more than a few times (every time?) that the original question is best as a conversation-starter. 🙂

    Thanks first to Hugo and then to the rest!

  • pjdiller (4/21/2010)


    I think it's been demonstrated more than a few times (every time?) that the original question is best as a conversation-starter. 🙂

    Thanks first to Hugo and then to the rest!

    Agree {we need a smiley update, no "thumbs-up" :angry: } Even for question where I get the answer right, going through the conversations make me realize certain intricacies involved that I didn't think through. Thanks to all for those conversations.

  • Hugo Kornelis (4/20/2010)


    If SQL Server had an auto-tune feature that automatically adds indexes it beliefs will benefit performance, than you would be right. But no current version of SQL Server has such a feature (and based on recommendations I sometimes see from the DTA, that might as well be for the best).

    Sometimes SQL Server automatically creates temporary indexes if the optimizer finds out that it will improve query performance.

    Ref: Nonclustered Index Spool Showplan Operator http://msdn.microsoft.com/en-us/library/ms189611.aspx

    Of course, the index spool operation will not happen in the statement 'SELECT * FROM Persons WHERE PersonID = @PersonID', but it may happen in some statement with many JOINs. And of course, it would be less effective than explicit index creation on PersonID and using that index 🙂

  • vk-kirov (4/22/2010)


    Hugo Kornelis (4/20/2010)


    If SQL Server had an auto-tune feature that automatically adds indexes it beliefs will benefit performance, than you would be right. But no current version of SQL Server has such a feature (and based on recommendations I sometimes see from the DTA, that might as well be for the best).

    Sometimes SQL Server automatically creates temporary indexes if the optimizer finds out that it will improve query performance.

    Ref: Nonclustered Index Spool Showplan Operator http://msdn.microsoft.com/en-us/library/ms189611.aspx

    Of course, the index spool operation will not happen in the statement 'SELECT * FROM Persons WHERE PersonID = @PersonID', but it may happen in some statement with many JOINs. And of course, it would be less effective than explicit index creation on PersonID and using that index 🙂

    Thanks, vk-kirov. I was not aware of this operator.

    From the limited information I could find on it, it will be used if the same data is used at least twice in a query, the second needs only a subset of the rows needed for the first (otherwise a normal spool would be used), and the subset is based on a predicate that is sargable and selective enough to earn back the cost of creating the index.

    Do you think you can post a repro script that actually uses this operator? Have you ever seen it used in your code?


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hugo Kornelis (4/22/2010)


    Thanks, vk-kirov. I was not aware of this operator.

    From the limited information I could find on it, it will be used if the same data is used at least twice in a query, the second needs only a subset of the rows needed for the first (otherwise a normal spool would be used), and the subset is based on a predicate that is sargable and selective enough to earn back the cost of creating the index.

    Do you think you can post a repro script that actually uses this operator? Have you ever seen it used in your code?

    You can generate an Index Spool easily with an unindexed Tally table:

    SELECT N

    INTO #Tally

    FROM dbo.Numbers (50000);

    GO

    SELECT COUNT(*)

    FROM #Tally T1

    JOIN #Tally T2

    ON T2.N < T1.N;

    That uses an in-line TVF I keep handy to generate a temporary tally heap - you can use any method that generates a similar object.

    Index spools are not all that rare, but the estimated cost of spooling the rows to a worktable, creating the index, and dropping it afterward is quite high, so the optimiser will often choose a different approach, purely on cost grounds (usually a hash join - the example above avoids that by joining on an inequality only).

    There are two types of index spool - eager and lazy. The above example generates an eager spool - one that consumes all rows from its input before it can be used to seek with.

    A lazy index spool is a bit more like a regular table spool - except it never truncates its worktable (a normal spool only ever caches one result - it truncates its worktable on every rebind).

    A lazy index spool accumulates results on every rebind (when the correlated parameter changes). Personally, I'd love a way to be able to hint the use of lazy index spools, but that's another story.

    Interestingly, you cannot trust the reported numbers for rebinds and rewinds for an index spool - the internals report a rewind to anything other than the last value seen as a rebind rather than the rewind it actually is. Sad, but true.

  • Hugo Kornelis (4/22/2010)


    Do you think you can post a repro script that actually uses this operator? Have you ever seen it used in your code?

    Yes, sometimes I see this operator when I profile and optimize SQL queries. When I wrote that post, I was optimizing a long-running stored procedure, and there even were both lazy and eager index spools in one query. I tried to make a simplified example (the original query contained 3 tables, 1 view, and 2 subqueries), but all what I got was either hash join or table spool 🙂

    Thanks to Paul for posting a simple example!

  • I am really surprised,

    I thought I was a newbie but now i have to call myself lower than newbie,

    Can you pls give me an instance/how the identity column can have duplicate.

    Thank you

  • Pooja-462303 (4/29/2010)


    Can you pls give me an instance/how the identity column can have duplicate.

    Sure!

    Using SET IDENTITY_INSERT ON:

    CREATE TABLE Demo1

    (Id int NOT NULL IDENTITY,

    Other varchar(20)

    );

    INSERT INTO Demo1 (Other)

    VALUES ('First');

    SET IDENTITY_INSERT Demo1 ON;

    INSERT INTO Demo1 (Id, Other)

    VALUES (1, 'Another first?');

    SELECT * FROM Demo1;

    DROP TABLE Demo1;

    Using DBCC CHECKIDENT with RESEED:

    CREATE TABLE Demo2

    (Id int NOT NULL IDENTITY,

    Other varchar(20)

    );

    INSERT INTO Demo2 (Other)

    VALUES ('First');

    DBCC CHECKIDENT (Demo2, RESEED, 0);

    INSERT INTO Demo2 (Other)

    VALUES ('Another first?');

    SELECT * FROM Demo2;

    DROP TABLE Demo2;


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • An iffy question IMHO. You need to make certain assumptions to come to the right/wrong answer.

    Are the points being returned? 🙂

    "Keep Trying"

Viewing 15 posts - 46 through 60 (of 66 total)

You must be logged in to reply to this topic. Login to reply