Still amazes me - yet it doesn't surprise me anymore - how this "surrogate vs natural key debate" goes on and on.
It neither surprises nor amazes me - there are so many people talking nonsense about it that plenty of other people end up confused. Then you get the relational "fundamentalists" versus the relational "purists" versus the ISO SQL Standard worshippers versus the real-SQL-in-(pick the DBMS of your choice) brigade, each chipping in with their take on the issue. Given that these groups can't even agree on a story for something as simple as NULLs, it's to be expected that they won't agree on something more complex like surrogate keys.
Please let me start by stating my position on the issue, which is: "Use Natural keys whenever possible, use Surrogate keys whenever necessary."
That's something that should be dinned into every relational database developer and administrator and architect and designer before they are allowed to practise the DB trade - provided "possible" is interpreted as meaning "both possible and reasonable". It has strong support in all the camps I mentioned above (even a relational "fundamentalist" like Fabian Pascal takes pretty much that position, which is what makes it easy for a pragmatist like me to agree with him).
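As a sketch of that rule of thumb (the table and column names here are hypothetical, not from either post), a stable, well-known natural key can serve directly as the primary key, while a surrogate steps in where no candidate key is stable - with the natural candidate still enforced by a UNIQUE constraint:

```sql
-- Natural key: ISO country codes are stable, short and familiar,
-- so there is no need to invent a surrogate here.
CREATE TABLE CountryCurrency (
    CountryCode char(2) NOT NULL,   -- ISO 3166 alpha-2 code
    CurrencyCode char(3) NOT NULL,  -- ISO 4217 code
    CONSTRAINT PK_CountryCurrency PRIMARY KEY (CountryCode)
);

-- Surrogate key: the only natural candidate (email) can change over time,
-- so a surrogate is necessary - but the candidate key is still enforced.
CREATE TABLE Customer (
    CustomerID int IDENTITY(1,1) NOT NULL,
    Email nvarchar(254) NOT NULL,
    CONSTRAINT PK_Customer PRIMARY KEY (CustomerID),
    CONSTRAINT UQ_Customer_Email UNIQUE (Email)
);
```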
Having said that, I think I've figured out why this "debate" gets particularly intense in the SQL Server universe - it has to do with clustered indexes on identity columns. Please allow me to explain.
Let me quote Microsoft's recommendations in "Clustered Index Design Guidelines", where it reads:
With few exceptions, every table should have a clustered index defined on the column, or columns, that offer the following:
- Can be used for frequently used queries.
- Provide a high degree of uniqueness.
- Can be used in range queries.
. . .
Generally, you should define the clustered index key with as few columns as possible. Consider columns that have one or more of the following attributes:
- Are unique or contain many distinct values
- Are accessed sequentially
- Used frequently to sort the data retrieved from a table
- Defined as IDENTITY because the column is guaranteed to be unique within the table
I personally find these guidelines sound, but I also think some DBAs construe them as "you must have a clustered index on an identity column", missing a couple of key phrases in Microsoft's documentation such as "with few exceptions" and "columns that have one or more of the following attributes".
My interpretation of the guidelines is: a clustered index is helpful if the chosen column is used in certain ways and is already defined in certain ways - which may (or may not) include the fact that it carries the IDENTITY property.
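To illustrate that reading (with a hypothetical Orders table), note that in SQL Server the primary key and the clustered index are separate decisions: the surrogate key can remain a nonclustered primary key while the clustered index follows the dominant access pattern, here assumed to be range scans by date:

```sql
-- The identity column stays the primary key, but nonclustered;
-- the clustered index is placed where the workload benefits most.
CREATE TABLE Orders (
    OrderID int IDENTITY(1,1) NOT NULL,
    OrderDate datetime2 NOT NULL,
    CustomerID int NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY NONCLUSTERED (OrderID)
);

-- Cluster on the column used in range queries; OrderID is appended
-- only to make the clustering key unique.
CREATE CLUSTERED INDEX CIX_Orders_OrderDate
    ON Orders (OrderDate, OrderID);
```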
The guidelines aren't really sound. For example, "can be used in range queries" really means "make things which are used close together live close together in the index", which, in an OLTP system, may have nothing to do with range queries. For instance, if I have a small number of interactive users and a system where menus are constructed from text in a database, it's a good idea to cluster on language - if someone's using French menus he or she will probably continue to do so, and having the same text in 104 languages in cache just because it's been pulled in in French isn't useful; there's no range query involved there. The same text example blows away the "high degree of uniqueness" guideline (if it said "sets of columns" which had that property it would be sound, but it says "columns", not "sets of columns"). And the statement about identity columns is just plain false: duplicates can and do occur if uniqueness is not enforced by a constraint.
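That last point is easy to demonstrate. IDENTITY only generates values; it does not enforce uniqueness, so without a PRIMARY KEY or UNIQUE constraint a duplicate can be inserted via SET IDENTITY_INSERT (a throwaway sketch, table name hypothetical):

```sql
-- No PRIMARY KEY or UNIQUE constraint on ID: IDENTITY alone won't stop duplicates.
CREATE TABLE Demo (ID int IDENTITY(1,1), Val varchar(10));

INSERT INTO Demo (Val) VALUES ('first');        -- generated ID

SET IDENTITY_INSERT Demo ON;
INSERT INTO Demo (ID, Val) VALUES (1, 'dup');   -- explicit duplicate ID is accepted
SET IDENTITY_INSERT Demo OFF;

-- Reseeding can also make the generator replay values:
-- DBCC CHECKIDENT ('Demo', RESEED, 0);
```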
Hope this brings fresh air to the so-called debate - I know it's almost impossible to end it :-D
I think it's about as likely to end as our longest thread.