Surogate Keys are not always the answer but are freqently used

  • I read an article today on this site today about Surrogate Keys.

    I attended my first relational database theory & design training in Princeton NJ.

    in 1988 and I'm familiar with the E. F. Codd & C.F Date Theories.

    I was very fortunate to have attended the class & my instructor was extremely intelligent and I will never forget her for I learned a lot from her. She was very interesting and as a result I developed a strong interest in Data Modeling & Database Design.

    But when Data Modeling I took what learned from the training and customized to make it work.

    I learned that something may look good on paper when implement it is often flawed and needs rework for it is not a solution to the problem.

    Despite the criticism of their theory, I learned a lot and I adapted their theory to deliver practical functional Database Applications.

    I attended the Oracle Data Modeling Course in 1997 and it was not so interesting because I had been performing Data Modeling on the Job for so long.

    It many cases it is appropriate to use a surrogate key when you do not have a candidate key or the key would induce performance problems etc.

    It has been my experience that it depends on the situation.

    I use Surrogate keys a lot but there are pros & cons.

    The decision should be based on the Business requirements, situation and a number of other factors.

    Listed below are some articles on why you should consider using a surrogate or an natural key.:-)

    http://www.deeptraining.com/litwin/dbdesign/FundamentalsOfRelationalDatabaseDesign.aspx

    http://www.agiledata.org/essays/keys.html

    http://www.dbdebunk.com/page/page/626995.htm

    http://en.wikipedia.org/wiki/Surrogate_key

    🙂

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Gr8 !!! can you please post the summary of your learnings about surogate keys.

  • Welsh Corgi (3/22/2010)


    It many cases it is appropriate to use a surrogate key when you do not have a candidate key or the key would induce performance problems etc.

    Being one of the "etc" the situation where doing dimensional modeling you want to take advantage of Oracle's star-transformation feature.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • Surrogate keys are always the answer if you are dealing with any successful Data Warehouse implementation. For instance take the case of any SCD transformation.

    Raunak J

  • Welsh Corgi (3/22/2010)


    Listed below are some articles on why you should consider using a surrogate or an natural key.:-)

    http://www.deeptraining.com/litwin/dbdesign/FundamentalsOfRelationalDatabaseDesign.aspx

    http://www.agiledata.org/essays/keys.html

    http://www.dbdebunk.com/page/page/626995.htm

    http://www.dbdebunk.com/page/page/626995.htm

    http://en.wikipedia.org/wiki/Surrogate_key

    🙂

    The first URL quoted soesn't mention surrogate keys at all, the second returns an error (sometimes 404, but sometimes 500 so there may be something there sometimes), the last (wikipedia) is remarkably unenlightening (needs a rewrite: if I remember and have the time and the energy I may either edit it or put some suggestions into the talk page); that leaves the other URL (of which there are, for some reason, two copies in the list) as the only useful one (two?). I don't always agree with FP, but here I have to: there's only one sensible reaon for using a surrogate key, and that is that using the natural key is too messy (because it has too many colums, is too long) and makes things which are actually simple look complicated. There are of course two nonsensical reasons, which are often quoted as sensible reasons: (a) some of the columns in the natural key change quite often and (b) some relations have no natural key; (a) and (b) are nonsensical for the simple reason that if either occurs this is a clear indication that the entity modelling has been done incorrectly - the solution isn't to introduce spurious surrogate keys, but to redo the model so that every relation has a stable natural key (that of course doesn't mean that you have to use the natural key for an entity as the foreign key when referencing the entity: it may be more convenient to use a surrogate key to reduce apparent complexity).

    edit: spelling

    Tom

  • Tom, I agree with everything you wrote except your suggestion that keys should be stable:

    redo the model so that every relation has a stable natural key

    Stability is usually a desirable property for a key and is certainly an important consideration but it isn't an absolute requirement. Sometimes it may be quite reasonable to implement keys that are unstable by design. One example is a login name attribute. The user may be allowed to change his login name at any time. The login name must remain unique though, so it's perfectly sensible to make it a key in the database.

  • David Portas (10/13/2010)


    Stability is usually a desirable property for a key and is certainly an important consideration but it isn't an absolute requirement. Sometimes it may be quite reasonable to implement keys that are unstable by design. One example is a login name attribute. The user may be allowed to change his login name at any time. The login name must remain unique though, so it's perfectly sensible to make it a key in the database.

    David, I would agree with you in one way and disagree in another. I would say the login name is stable. Yes, it *could* be altered daily. Noone would though. Now, password, on the other hand, is an 'unstable' key. In theory it should be being altered by every user every 30 days. Not quite as unstable as, say, a LastUpdate field, but still unstable.

    So, while the ability to be unstable is there, I would say a login-name would be a stable field.

    To the other part of your statement (before I get shot for symantics about 'login-name'), Unique fields can be controlled very easily with a unique, non-clustered, non-keyed index and get out of the way of the real ones.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • I doubt that a password would be a key. Most systems that I know of don't require passwords to be unique.

    In SQL Server a unique index on non-nullable columns is just a key by another name. Unique indexes and uniqueness constraints are simply two different ways to create keys. I would not recommend using a unique index instead of a uniqueness constraint. Constraints, not indexes, are the conventional, SQL standard and most obvious way to implement keys. As far as I know unique indexes alone have no particular advantages over uniqueness constraints. Was there a reason why you mentioned unique indexes specifically? What do you mean by a "non-keyed" index?

  • David Portas (10/13/2010)


    I doubt that a password would be a key. Most systems that I know of don't require passwords to be unique.

    You're right, it was a bad example of the point regarding uniqueness.

    In SQL Server a unique index on non-nullable columns is just a key by another name. Unique indexes and uniqueness constraints are simply two different ways to create keys. I would not recommend using a unique index instead of a uniqueness constraint. Constraints, not indexes, are the conventional, SQL standard and most obvious way to implement keys.

    Same difference. From: http://msdn.microsoft.com/en-us/library/ms177420.aspx

    The Database Engine automatically creates a UNIQUE index to enforce the uniqueness requirement of the UNIQUE constraint. Therefore, if an attempt to insert a duplicate row is made, the Database Engine returns an error message that states the UNIQUE constraint has been violated and does not add the row to the table. Unless a clustered index is explicitly specified, a unique, nonclustered index is created by default to enforce the UNIQUE constraint.

    What do you mean by a "non-keyed" index?

    I meant non primary keyed and typed too fast and didn't re-read what I wrote to translate internal mental lingo into what would be definitive instead of vague. Yes, Unique columns/indexes/constraints can be used as 'alternate keys'. They're just rarely used as them, because usually the need for the constraint means that means the key can be changed, thus possibly destroying the key associations.

    Thus, while a unique can be an alternate key, an alternate key is always unique. The difference is not inconsequential, thus I separate the concept. The fact that some data is usually mobile in a 'natural key', say, like the company's name, is why I like surrogate keys, like identity fields. This means that if a user edits the company's name (which is unique, at least by state), we still haven't lost our unique locator that everything else hooks to it with.

    Of course, if you have the business' FedTaxID, or a SSN, that never changes, this is a proper natural key. Stability is incredibly important to a key, if not a primary factor in deciding to use it.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Craig Farrell (10/13/2010)


    In SQL Server a unique index on non-nullable columns is just a key by another name. Unique indexes and uniqueness constraints are simply two different ways to create keys. I would not recommend using a unique index instead of a uniqueness constraint. Constraints, not indexes, are the conventional, SQL standard and most obvious way to implement keys.

    Same difference.

    No it's not the same for the reasons I already stated: Constraints are standard SQL, unique indexes are not; Constraints are understood and used by more people; Some software tools will recognise keys declared as constraints but will not recognise those created using the CREATE INDEX syntax. In my view those are all very good reasons to use uniquness constraints and avoid creating unique indexes directly.

    Thus, while a unique can be an alternate key, an alternate key is always unique. The difference is not inconsequential, thus I separate the concept. The fact that some data is usually mobile in a 'natural key', say, like the company's name, is why I like surrogate keys, like identity fields. This means that if a user edits the company's name (which is unique, at least by state), we still haven't lost our unique locator that everything else hooks to it with.

    That's more or less what I thought you meant, but it's not technically correct. Two things make a key (aka "candidate key") and they are 1) uniqueness and 2) irreducibility. Those are the two fundamental qualifications for a key. Stability is usually desirable as well but the fact that a key changes does not make it any less of a key than some other key that doesn't change. Of course that's just terminology I suppose, the effect is exactly the same whether you choose to call it a key or not - the same considerations apply. But I think it's worth saying because a lot of people do seem to get very confused about what keys are and why they are important.

    I'm avoiding the term "alternate key" because primary and alternate keys are really the same thing. All are candidate keys.

  • David Portas (10/13/2010)


    No it's not the same for the reasons I already stated: Constraints are standard SQL, unique indexes are not; Constraints are understood and used by more people; Some software tools will recognise keys declared as constraints but will not recognise those created using the CREATE INDEX syntax. In my view those are all very good reasons to use uniquness constraints and avoid creating unique indexes directly.

    Here I feel we must disagree. Since finding a free copy of the 97 standard (or 99) has been dubious at best, I can't double check your comment, so I'll just assume it's accurate. Seems to make sense. However, I don't work in the SQL standard. I work in MS SQL Server and T-SQL. Because of that, and because an index can give me performance where a constraint will not, but still takes up the same system space and engine mechanics as a declared index under the hood, I would prefer the index.

    You make a good point though about 3rd party tools. I rarely use them so it rarely comes up as a difficulty for me, and that may be part of the difference.

    That's more or less what I thought you meant, but it's not technically correct. Two things make a key (aka "candidate key") and they are 1) uniqueness and 2) irreducibility.

    ...and is also known as the minimal superkey. I'm familiar with the theory. Practice, however, dictates other usage...

    Those are the two fundamental qualifications for a key. Stability is usually desirable as well but the fact that a key changes does not make it any less of a key than some other key that doesn't change. Of course that's just terminology I suppose, the effect is exactly the same whether you choose to call it a key or not - the same considerations apply. But I think it's worth saying because a lot of people do seem to get very confused about what keys are and why they are important.

    I'm avoiding the term "alternate key" because primary and alternate keys are really the same thing. All are candidate keys.

    You are absolutely correct according to the dictionary, and theoretical design. I grant that the terminology is accurate, and I went and double checked just to make sure I remembered it correctly. 🙂

    This, perhaps, is where we are disconnecting in our discussion, because theoretical design is like the ISO standard. Nice to have, but not always practical. Identification of candidate keys are useless in practical design other then as the first step to identifying useful primary and alternate keys, or perhaps to assist in search mechanic selectivity to acquire more speed and better index usage.

    To me, a useful key is something that if I found it yesterday with that value, I'll find it tomorrow with the same value, and I can use it to connect to data throughout the design's structure in place of the existing 'Primary Key' if I so chose. While this may not fall properly under the definition of generic db design standards (that I can't seem to get my hands on), it does fall under the definition of what myself and a number of DBAs I've worked with understand them to be in practical usage.

    [EDIT:] Aaaaaand I finally look at the top of the screen and realize I'm in the "Relational Theory" Forum... I'll shaddup now. [/EDIT]


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • It is usually preferable to use a natural key as opposed to a surrogate key in a relational Model. It depends on the situation

    In a Dimensional Model you do not use the natural key a Surrogate key is used.

    I can provide additional clarification if necessary.

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Craig Farrell (10/13/2010)


    I work in MS SQL Server and T-SQL. Because of that, and because an index can give me performance where a constraint will not, but still takes up the same system space and engine mechanics as a declared index under the hood, I would prefer the index.

    I don't understand your claim that an index alone provides better performance than a uniqueness constraint. Can you back that up with an example? I'm pretty sure you'll find they are always exactly the same, in which case the constraint still has all the advantages I mentioned before.

    To me, a useful key is something that if I found it yesterday with that value, I'll find it tomorrow with the same value, and I can use it to connect to data throughout the design's structure in place of the existing 'Primary Key' if I so chose. While this may not fall properly under the definition of generic db design standards (that I can't seem to get my hands on), it does fall under the definition of what myself and a number of DBAs I've worked with understand them to be in practical usage.

    However, database users would disagree with you. End users don't use surrogate keys to identify data, they use natural keys (actually I much prefer the term "business keys" but it means exactly the same thing). Therefore identifying and enforcing the business keys is normally essential for the users to make sense of the data and to get correct results from it. That includes keys that change. If keys are being used by the business process as identifiers then they are least as important as other keys that don't change - perhaps more so. Keys therefore need to be enforced and properly identified in the database if you want to ensure correct results from it.

    Furthermore, the distinction you are trying to make between changing and unchanging keys is ultimately impossible to determine. If all the key values on a row change then there is no sound basis for saying that any of them "changed" - it is simply a different row because rows are identified by their keys. It is usually impractical to prevent data from being changed at some point during its lifetime and often you may not even know if it has been. Therefore the defining property of a key that really matters is uniqueness, not immutability.

  • Craig Farrell (10/13/2010)


    [EDIT:] Aaaaaand I finally look at the top of the screen and realize I'm in the "Relational Theory" Forum... I'll shaddup now. [/EDIT]

    Yeah - I learned a few years ago not to get in a disagreement with David about relational theory unless I was absolutely certain of my ground. And even then it turned out I was wrong :blush:

    Just to throw in another tack on the surrogate Vs natural key debate, there is a certain class of relationship that can't be modelled declaratively using surrogates, only with natural keys. I've sometimes wondered how the "surrogate and surrogates only" brigade handle them.

  • Still amazes me - yet it doesn't surprise me anymore - how this "surrogate Vs natural key debate" goes on and on.

    Please let me start by stating my position on the issue which is: "Use Natural keys whenever possible, use Surrogate keys whenever necessary(*)"

    (*) Like in FACT-DIM relationships in a dimensional model.

    Having said that I think I've figured out why this "debate" gets particularly intense in the SQL Server universe - it has to be with clustered indexes on identity columns, please allow me to explain further.

    Let me quote Microsoft recommendations on "Clustered Index Design Guidelines" where it reads...

    With few exceptions, every table should have a clustered index defined on the column, or columns, that offer the following:

    - Can be used for frequently used queries.

    - Provide a high degree of uniqueness.

    - Can be used in range queries.

    . . .

    Generally, you should define the clustered index key with as few columns as possible. Consider columns that have one or more of the following attributes:

    - Are unique or contain many distinct values

    - Are accessed sequentially

    - Used frequently to sort the data retrieved from a table

    - Defined as IDENTITY because the column is guaranteed to be unique within the table

    I personally find these guidelines sound but I also think some DBA construct them as "you must have a clustered index on an identity column" missing a couple of key words in Microsoft's documentation like "with few exceptions" and "columns that have one or more of the following attributes".

    My interpretation of the guidelines goes like: a clustered index is helpfull if chosen column is used in certain ways and is already defined in certain ways - which may (or may not) include the fact of already being of the identity datatype.

    Hope this brings fresh air to the so called debate - I know it's almost impossible to end it 😀

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.

Viewing 15 posts - 1 through 15 (of 38 total)

You must be logged in to reply to this topic. Login to reply