Using IDENTITY as a key column

Hugo Kornelis SSC Guru Points: 64780
April 19, 2010 at 8:10 pm

#220625

Comments posted to this topic are about the item Using IDENTITY as a key column
Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

Paul White SSC Guru Points: 150467 · Answer 1

First correct answer! Yay for me 😉

I liked this question - especially the first part about IDENTITY allowing duplicates, which I am sure will surprise many people.

I am less sure about the wording in the second part of the question, "SQL Server does not have an efficient way to retrieve rows based on a known value for PersonID." because that rather depends on the number of rows in the table.

I don't think it would have been giving too much away to say something like "SQL Server cannot use an index to locate rows based on a known value for PersonID".

Paul

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

SQLRNNR SSC Guru Points: 281334 · Answer 2

Nice question Hugo.

I am with Paul on this one - the wording on the second part is a little iffy. It can still be answered correctly. I just had to think about it for a second.

So far, the answer rate for this question is higher than your previous questions too (50%).

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw
Learn Extended Events

Deepak Sharma-2311 SSC Eights! Points: 993 · Answer 3

I agree with the first statement, but the second statement, "SQL Server does not have an efficient way to retrieve rows based on a known value for PersonID.", confuses me.

Deepak Kumar Sharma

SQLRNNR SSC Guru Points: 281334 · Answer 4

Deepak Sharma-752112 (4/19/2010)
I agree with the first statement, but the second statement, "SQL Server does not have an efficient way to retrieve rows based on a known value for PersonID.", confuses me.

It is in reference to the default of SQL Server creating an index on that column. For Primary key columns, an index is auto-generated - but that is not true of an identity column, unless the identity column is a part of a Primary key.

Paul White SSC Guru Points: 150467 · Answer 5

CirquedeSQLeil (4/19/2010)
...but that is not true of an identity column, unless the identity column is a part of a Primary key.

Or a UNIQUE constraint, as the QotD explanation says 😉

SQLRNNR SSC Guru Points: 281334 · Answer 6

Paul White NZ (4/20/2010)
CirquedeSQLeil (4/19/2010)
...but that is not true of an identity column, unless the identity column is a part of a Primary key.
Or a UNIQUE constraint, as the QotD explanation says 😉

Thanks for tidying that up :-D

Paul White SSC Guru Points: 150467 · Answer 7

CirquedeSQLeil (4/20/2010)
Thanks for tidying that up :-D

You can always rely on me 😉

Jessica_ST SSC Eights! Points: 892 · Answer 8

Hang on a minute ...

"Even though the IDENTITY property is intended to be used for key columns, SQL Server will not automatically generate a constraint to enforce the uniqueness, so duplicate values (caused by the methods above) will not be banned from the table. SQL Server will also not automatically create an index to speed up access based on the IDENTITY value (though it will do so when you add the PRIMARY KEY or UNIQUE constraint to enforce uniqueness of the IDENTITY column, as it does for each PRIMARY KEY or UNIQUE constraint). "

The question asked whether SQL Server had an efficient way to query records, not whether it would use that method automatically!

Assumption is the mother of all F***ups

tommyh SSCertifiable Points: 6302 · Answer 9

Even though I actually got this right, the wording of the question isn't that good. And that could be part of the reason why the correct/incorrect ratio is out of whack (81% wrong at the time).

The heading said "Using IDENTITY as a key column". Key part there being "key". That, combined with the fact that the create table statement was incomplete ("-- other columns);"), got me wondering whether the author had a missing "primary key (PersonID)" at the end (or something similar), or whether it was implied that there would be code to create a key. Which of course would have inverted the answer.

Though this could be the result of a lack of English skills on my part.

Though I must say I liked the question, since I got it right 😀

dimitri.decoene-1027745 SSC Veteran Points: 284 · Answer 10

Although I knew it was possible to reseed and to explicitly add identity values to a table, I never considered that this made it possible to add non-unique values for the identity column within one table.

Moreover, a part of the Microsoft online help was a bit misleading on the subject:

This is because the IDENTITY property is guaranteed to be unique only for the table on which it is used.

How should I interpret "identity property" in this sentence?

Hugo Kornelis SSC Guru Points: 64780 · Answer 11

Thanks for the feedback, to all involved in this discussion so far.

I appreciate all the kind words and compliments. And I'll try to address the questions and criticism in the rest of this post.

Paul White NZ (4/19/2010)
I am less sure about the wording in the second part of the question, "SQL Server does not have an efficient way to retrieve rows based on a known value for PersonID." because that rather depends on the number of rows in the table.
I don't think it would have been giving too much away to say something like "SQL Server cannot use an index to locate rows based on a known value for PersonID".

I wanted to disagree with this at first, but since several other people have posted similar comments, I'll have to accept that, obviously, this was not clear enough.

However, I don't understand the argument that this depends on the number of rows in the table. Can you explain for what number of rows a SELECT based on WHERE PersonID = @PersonID will be faster without an index on PersonID, and why?

ST_John (4/20/2010)
Hang on a minute ...
"Even though the IDENTITY property is intended to be used for key columns, SQL Server will not automatically generate a constraint to enforce the uniqueness, so duplicate values (caused by the methods above) will not be banned from the table. SQL Server will also not automatically create an index to speed up access based on the IDENTITY value (though it will do so when you add the PRIMARY KEY or UNIQUE constraint to enforce uniqueness of the IDENTITY column, as it does for each PRIMARY KEY or UNIQUE constraint). "
The question asked whether SQL Server had an efficient way to query records, not whether it would use that method automatically!

Everything you write is true, but I don't really see the point you are trying to make. Since the CREATE TABLE statement as given will not create any indexes, there is no efficient access path to query records based on known values for PersonID; only an index will enable those. As long as it's impossible to query efficiently, the question of whether the efficient way will indeed be used is immaterial.
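
A minimal sketch of the situation described above, assuming a hypothetical table dbo.PersonsIdx with a made-up Name column (the question itself only shows PersonID and "other columns"). The constraint name PK_PersonsIdx is an assumption as well. It lists the table's entries in sys.indexes before and after a PRIMARY KEY constraint is added:

```sql
-- Hypothetical demo table; only the PersonID column comes from the question.
CREATE TABLE dbo.PersonsIdx
   (PersonID int         NOT NULL IDENTITY,
    Name     varchar(50) NOT NULL);

-- With no constraints declared, the table is a heap:
-- this should return a single row with type_desc = 'HEAP'.
SELECT name, type_desc, is_primary_key
FROM   sys.indexes
WHERE  object_id = OBJECT_ID(N'dbo.PersonsIdx');

-- Adding a PRIMARY KEY constraint makes SQL Server create a unique index
-- (clustered by default when the table has no clustered index yet).
ALTER TABLE dbo.PersonsIdx
  ADD CONSTRAINT PK_PersonsIdx PRIMARY KEY (PersonID);

-- The same query should now list the index that backs the constraint.
SELECT name, type_desc, is_primary_key
FROM   sys.indexes
WHERE  object_id = OBJECT_ID(N'dbo.PersonsIdx');

-- Clean up the demo table.
DROP TABLE dbo.PersonsIdx;
```

While the table is a heap, a lookup such as WHERE PersonID = @PersonID has to scan it; once the constraint exists, the optimizer can seek on the index that backs it.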

tommyh (4/20/2010)
The heading said "Using IDENTITY as a key column". Key part there being "key". That, combined with the fact that the create table statement was incomplete ("-- other columns);"), got me wondering whether the author had a missing "primary key (PersonID)" at the end (or something similar), or whether it was implied that there would be code to create a key. Which of course would have inverted the answer.

The comment said "-- (other columns)", not "-- (other columns and constraints)". Assuming that I meant to include constraints when I wrote columns is quite a stretch.

The reason I chose the title "Using IDENTITY as a key column" is because this is exactly the pattern I do sometimes see - tables with an IDENTITY attribute, with queries that use this column as if it were the key, but no constraints to enforce that key. I guess I could also have chosen the title "Using IDENTITY as a key column without enforcing its uniqueness", but I'm afraid that would have given the question away. Anyway, I'm sorry if the chosen title confused you. My intention was to confuse people with believable but incorrect answer options, and nothing else 😀

dimitri.decoene-1027745 (4/20/2010)
Moreover, a part of the Microsoft online help was a bit misleading on the subject:
This is because the IDENTITY property is guaranteed to be unique only for the table on which it is used.
How should I interpret "identity property" in this sentence?

To answer the actual question first - you can interpret it exactly as you did; it refers to the property that is assigned by adding "IDENTITY" to a column in a CREATE or ALTER TABLE statement.

I had to google for the page where you took the quote from to see it in its context. I found it at http://msdn.microsoft.com/en-us/library/ms191131.aspx; if you found it somewhere else please give me a link and I'll review it.

I would not go as far as to say that the text there is incorrect, but it is incomplete and might cause confusion. What the IDENTITY property does give you is a "limited" uniqueness guarantee. Values generated by the IDENTITY property will be unique, as long as you never tamper with them. The two ways of tampering I know of are SET IDENTITY_INSERT and DBCC CHECKIDENT ... RESEED (there might be more, so please don't take this as an authoritative, complete list). As long as you never tamper with the IDENTITY values in any way, you can rely on the uniqueness of generated IDENTITY values. At least, as long as you limit your scope to a single table; as soon as multiple tables are involved, IDENTITY values will no longer be unique - and that last sentence is what the quoted fragment from the online help is trying to tell you.
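
The two kinds of tampering mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the table name dbo.PersonsDup, the Name column, and the sample rows are hypothetical, and the table deliberately has no PRIMARY KEY or UNIQUE constraint.

```sql
-- Hypothetical demo table; only the PersonID column comes from the question.
CREATE TABLE dbo.PersonsDup
   (PersonID int         NOT NULL IDENTITY,
    Name     varchar(50) NOT NULL);

INSERT INTO dbo.PersonsDup (Name) VALUES ('Ann');   -- generated PersonID 1
INSERT INTO dbo.PersonsDup (Name) VALUES ('Bob');   -- generated PersonID 2

-- Tampering 1: supply an explicit value that is already in use.
SET IDENTITY_INSERT dbo.PersonsDup ON;
INSERT INTO dbo.PersonsDup (PersonID, Name) VALUES (1, 'Carl');
SET IDENTITY_INSERT dbo.PersonsDup OFF;

-- Tampering 2: move the current identity value back to 1.
DBCC CHECKIDENT ('dbo.PersonsDup', RESEED, 1);
INSERT INTO dbo.PersonsDup (Name) VALUES ('Dora');  -- should get PersonID 2 again

-- With no constraint to stop them, duplicate PersonID values should show up here.
SELECT   PersonID, COUNT(*) AS NumRows
FROM     dbo.PersonsDup
GROUP BY PersonID
HAVING   COUNT(*) > 1;

-- Clean up the demo table.
DROP TABLE dbo.PersonsDup;
```

With a PRIMARY KEY or UNIQUE constraint on PersonID, both tampering inserts would be rejected with a constraint violation instead of silently creating duplicates.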

I hope this clarifies the confusion.

dimitri.decoene-1027745 SSC Veteran Points: 284 · Answer 12

Thank you for the reply Hugo, that article was indeed the source of the quote.

Abrar Ahmad_ SSCarpal Tunnel Points: 4239 · Answer 13

CirquedeSQLeil (4/19/2010)
It is in reference to the default of SQL Server creating an index on that column. For Primary key columns, an index is auto-generated - but that is not true of an identity column, unless the identity column is a part of a Primary key.

It was a new thing to learn, but at the cost of 2 valuable points. 😎

bad luck... ! & an average question... !

Thanks

Paul White SSC Guru Points: 150467 · Answer 14

Hugo Kornelis (4/20/2010)
I wanted to disagree with this at first, but since several other people have posted similar comments, I'll have to accept that, obviously, this was not clear enough.

Clear enough, possibly...I did after all understand what you were getting at...I just felt the wording could have been improved, and suggested an alternative.

However, I don't understand the argument that this depends on the number of rows in the table. Can you explain for what number of rows a SELECT based on WHERE PersonID = @PersonID will be faster without an index on PersonID, and why?

My difficulty is with the phrase "SQL Server does not have an efficient way to retrieve rows..." - I never said it could be faster without an index (though it could).

Including the trivial case where the table has one row, if the rows in the table fit on a single page, SQL Server has a perfectly 'efficient' way to retrieve the rows. That is why I said it depends on the number of rows.

I suppose I could argue that if the table were filled with very many rows having the same value for PersonID, an IAM-ordered scan of the heap might be more efficient than a partial scan of a very fragmented index, but that was not my original point...;-)

Paul