sqlenthu 89358 wrote:
There's different school of thoughts over whether the identity column should be a candidate of clustered index or not (mostly in favour of having it as clustered index).
What's your generic thought of the same @jeff ?
I'll repeat the only generic truth there is in SQL Server... "It Depends".
I first have to agree with a lot of people, including our own Scott Pletcher, that's it's a bit crazy to just up and add a numeric "ID" column to EVERY table.
A good example of a table that doesn't need one is a "State" table. You only need the abbreviation and the state name. Technically, I wouldn't do that either because ISO 3166-2 lists 57 "subdivisions" of the U.S.A. as 50 States, 1 District, and 6 "Outlying areas". The abbreviation for those 6 outlying areas is also the "country code" for the "subdivisions". And, the abbreviations are all unique which qualifies them as a non-numeric code/"Natural Key" that can and should be used as both the PK and the Clustered Index.
It would also be a bit ridiculous to add a numeric "ID" to a table that held the NPA (Area Code) and NXX (Exchange) for telephone numbers in the North American Numbering Plan for Canada, the U.S.A, DC, and the "territories" (outlying areas according to ISO) for telephone numbers.
There are a whole lot of places where it just doesn't make sense to add a numeric (or other) "Identifier" to a table because of some very well though out natural keys.
On the other hand, things like Employee and Customer tables are good places for numeric identifiers because the natural keys would be just to wide to include FirstName, LastName, MiddleName, and something else to break the ties. Even SSNs wouldn't make a good key because they're not actually unique in some places where some pretty nasty mistakes were made. And, just to say it out loud, they should never be "in clear text". That means they need to be encrypted with a "salt", etc, and that's going to turn them into a pretty wide (a 32 byte encryption is the minimum suggested by MS and I don't care for minimums in such cases and would use 64 bytes).
As for when to also make such a numeric identifier the Clustered Index Key, that seriously depends and there are a whole lot of opinions on the subject such as to put it on the columns most used for queries. That can actually be an extraordinarily good idea or, just as extraordinarily, a bad idea typically depending on the nature of most of the queries (you can't rely on just one or a couple just because they're the most used although I've frequently used a combination of date and a "Identity" column) including the number of columns to be returned for any query, the relative size of each row, and a whole bunch more. The "sort order" of the most queries may come into play but a lot of people over-play that without considering page splits and the massive blocking that may occur during INSERTs and UPDATEs, especially when "ExpAnsive" updates (which includes expansion of certain previous fixed with datatypes such as NULL dates that are later populated by an UPDATE) come into play.
Even how "proper" you want your "normalization" to be and the subject of "alternate keys" can come into play but we'll leave those out of this conversation.
And then there are "numeric" Identifiers such as Random GUIDs. I've proven that fragmentation isn't really a problem with those by learning HOW to maintain them correctly when they're the Clustered Index BUT... should they be used as the Clustered Index? The answer is frequently "Yes" (despite what some claim) but, just as frequently, the answer is "No" for the same reason that other numeric "Identifiers" might not be.
On top of all of that, you have to remember the purpose of non-Clustered Indexes and other things like why adding INCLUDE columns was both a miracle breakthrough and the horror they can cause as well as the fact that the keys for the Clustered Index are added to every non-clustered index and whether or not to explicitly name them to make the non-clustered index unique or not.
The bottom line is there is no panacea for when numeric "Identifiers" that are being used as the PK should also be the Clustered Index or not. "It Depends" and "Must Look Eye".
As to my "generic" thought on it, I've found that, more often than not, a numeric "Identifier" is needed to guarantee that a row is unique without have a Clustered Index that is tens or sometimes hundreds of bytes wide but I'll never call such a trend a "general recommendation" because I do agree with and follow the bottom line I previously stated..."It Depends" and "Must Look Eye".