Do You Really Need Surrogate Keys? Part 1 – Concepts and Methods

  • I'd love to enforce the uniqueness, but that it isn't unique and can change shows there's an issue with using it. There may be that case where it's valid to use, I've just not seen it yet.

  • Robert Mahon-475361 (10/26/2010)


    EXCELLENT examples actually. Would you prefer the terrorist can't book until he comes up with a SSN that hasn't been used before, or let him book, but let it be flagged so the scenario you suggest can happen?

    I have said most of what I want to say in my previous post, so I'll keep it short.

    I would prefer the database to enforce whatever the business expects. If I see an error in the expectations, and I can convince the business - so much the better! But that is part of what I do in the analysis phase.

    Once it comes to the logical and physical DB design, I have to stick to what the business expects. If I don't then there will be a mismatch between what how the business works, and how the database supports - or rather, fails to support - those procedures.

    My example was actually not good at all. The airport security business routinely deals with forged IDs; they will never fall for using SSN to identify travelers. I don't know what they do use -probably a combination of attributes, but I have no practical experience there-, but it will be much better than an SSN. But in a company where the people do believe the myth of unique SSns and I can't convince them otherwise, I will enforce the uniqueness the business expects. My job is to support the business, not to force my ways on them (that's another job, one I aspire far less then what I do now).

    (but again, it can change, and keys should never change)

    I do agree with the best practice advice to avoid volatile primary keys, but I will not go as far as to say that a primary key should never change. That would rule out every possible key, including surrogates (or have you never had to merge data from two databases after a company merger?)


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Aye. Never had problems merging DB's. That's why GUID's are so excellent and why I refer people to Quazibubble's posts.

    http://www.sqlteam.com/Forums/topic.asp?TOPIC_ID=6136&whichpage=3

    He does a far better job explaining this stuff than I ever could.

  • Robert Mahon-475361 (10/26/2010)


    Aye. Never had problems merging DB's. That's why GUID's are so excellent and why I refer people to Quazibubble's posts.

    http://www.sqlteam.com/Forums/topic.asp?TOPIC_ID=6136&whichpage=3

    He does a far better job explaining this stuff than I ever could.

    Quazibubble, is that you?

    Random Technical Stuff[/url]

  • No, though I did enjoy that series of posts he made. When I ask people to read that stuff, I do wonder where he is now though. Pity he couldn't turn the vitriol down a twee notch or two, I'd have like to see more posts.

  • Ryan C. Price (10/25/2010)

    Agreed.

    Our Database - about 200 tables - is about 80% Surrogate/ 20% Natural.

    The below is an example where we too prefer the Natural.

    In my experience, the only time natural keys can be relied on to be 'constant', is when their definitions are 'cast in stone' by a standards body. E.g. 3-letter country codes and 3-letter currency codes. Other than that, I think the benefits of natural keys are generally out-weighed by benefits of surrogates.

  • denis.hosdil (10/26/2010)


    Ryan C. Price (10/25/2010)

    Agreed.

    Our Database - about 200 tables - is about 80% Surrogate/ 20% Natural.

    The below is an example where we too prefer the Natural.

    In my experience, the only time natural keys can be relied on to be 'constant', is when their definitions are 'cast in stone' by a standards body. E.g. 3-letter country codes and 3-letter currency codes. Other than that, I think the benefits of natural keys are generally out-weighed by benefits of surrogates.

    I encourage you to read the following Wikipedia article. The standards didn't change, but if you stuck with 3-letter country codes then you'd find it fairly hard to move to the numeric country codes, should your organization decide that these are a better fit.

    You see? Even ISO country code standards bodies eventually found it easier to use a natural key.

    Random Technical Stuff[/url]

  • ta.bu.shi.da.yu (10/26/2010)


    Sigh. I'm well aware of what a relation is thanks.

    I suspect it may be you who is not really understanding that a relational database is so called because of the mathematical concept of a relation.

    I'm well aware of the foundational aspects of Codd's relational model, if not the finer points, so I personally don't really see any value with corresponding on this point any more!

    You can punt, dodge, and roll all you wish, but the fact remains that normalization affects the representation of data across the entire database. Your original statement that "You normalize an individual table, no normal forms work on multiple tables simultaneously" was flatly incorrect.

    It seems to be a rather common misunderstanding, actually, especially among DBAs without a background in computer science. Perhaps I should write an article to clear things up. Could we get a show of hands of anyone else who believes this?

  • ta.bu.shi.da.yu (10/26/2010)


    I encourage you to read the following Wikipedia article. The standards didn't change, but if you stuck with 3-letter country codes then you'd find it fairly hard to move to the numeric country codes, should your organization decide that these are a better fit.

    You see? Even ISO country code standards bodies eventually found it easier to use a natural key.

    You disagreed with someone advocating a natural key...but saying they should use a natural key?

    If you meant to say they found it easier to use a surrogate key, you should be aware that ISO-3166-1 country codes are, by definition, natural, not synthetic.

  • ebay-1087924 (10/26/2010)


    It seems to be a rather common misunderstanding, actually, especially among DBAs without a background in computer science. Perhaps I should write an article to clear things up. Could we get a show of hands of anyone else who believes this?

    OK. I think 1nf through 3nf deals with decomposing a single table.

    Please provide an example where normalization operates on more than one table simultaneuosly.

  • Hugo Kornelis (10/26/2010)


    Once it comes to the logical and physical DB design, I have to stick to what the business expects. If I don't then there will be a mismatch between what how the business works, and how the database supports - or rather, fails to support - those procedures.

    Hugo, I have to applaud what appears to a be a deep understanding of data modeling. As the posts here illustrate, far too many developers believe the purpose of modeling is to make our lives easier, or to create a schema that is stable, maintainable, or even simply elegant.

    While those are appropriate tertiary concerns, the overarching goal of modelling is to represent the business environment accurately and fully. Your SSN example was spot-on (though perhaps not relevant to the airline industry). A surrogate key might ensure the database continues to function prima facie in the face of natural key modifications -- but will it still appropriately model the business needs? If the natural key of a table changes, that will usually involve business logic changes that cannot be captured simply by adding a new index and moving on.

  • ebay-1087924 (10/26/2010)


    ta.bu.shi.da.yu (10/26/2010)


    Sigh. I'm well aware of what a relation is thanks.

    I suspect it may be you who is not really understanding that a relational database is so called because of the mathematical concept of a relation.

    I'm well aware of the foundational aspects of Codd's relational model, if not the finer points, so I personally don't really see any value with corresponding on this point any more!

    You can punt, dodge, and roll all you wish, but the fact remains that normalization affects the representation of data across the entire database. Your original statement that "You normalize an individual table, no normal forms work on multiple tables simultaneously" was flatly incorrect.

    It seems to be a rather common misunderstanding, actually, especially among DBAs without a background in computer science. Perhaps I should write an article to clear things up. Could we get a show of hands of anyone else who believes this?

    I'd say that you are both correct. The normalization process deals with one table at a time. But you have to do this for all tables to get a normalized data model. Normalizing half your tables and leaving the rest unnormalized makes about as much sense as following the instructions to change the oil in your car to the letter, but stopping halfway through. You never diverted from the instructions, and yet you'll still blow up your engine, because there is no oil in it.

    The statement ta.bu.shi.da.yu made and quoted by you ("You normalize an individual table, no normal forms work on multiple tables simultaneously") is not incorrect. Just check the short descriptions of the normal forms in this Wikipedia article. But you are absolutley correct that the normal form of a given table is highly irrelevant, what counts is the normal form of the entire schema (which I guess -without mathematic proof- to be the lowest of all table's normal forms)


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hugo Kornelis (10/26/2010)


    I'd say that you are both correct.

    I think you're right Hugo.

  • ebay-1087924 (10/25/2010)


    Robert Frasca (10/25/2010)


    I cited an example earlier where the underlying natural key changes..If you had a surrogate key you wouldn't have to do anything except possibly modify a non-clustered index. It would be completely transparent to any apps and the rest of the data model. It doesn't get any more flexible or scalable.

    I understand the perils of a non-stable PK. (I've mentioned them myself in my prior postings). You're simply misusing the buzzword "scalable". A surrogate key makes (in most, not all cases) your design less brittle, and more maintainable. "Scalable" indicates your design more gracefully handles larger workloads. That has nothing to do with surrogate keys, which can increase your design's ability to adapt to changes in the workload requirements, but do not increase its ability to handle more work in general.

    Actually, your definition is only one definition of scalable. The ability to expand the scope of your applications or add new applications without having to re-architect your data entities also constitutes scalability. If I'm working within the framework of a set of surrogate keys I am not constrained by the limitations inherent in using so called "natural" keys. For example, I have merged billing systems for two completely different services from two different companies into my existing billing system with the ability to plug-in billing capabilities for additional future services without changing the relational architecture of the database. I added a couple of new entities but I didn't have to change the existing relational structure. I couldn't have done that if I'd used just the natural keys because the billable events and even the reportable hierarchies for each customer base were different.

    "Beliefs" get in the way of learning.

  • denis.hosdil (10/26/2010)


    OK. I think 1nf through 3nf deals with decomposing a single table.

    No it doesn't. The rules are typically illustrated by decomposition of a single table, simply because that's easier for beginners to understand.

    Normalization per se expresses how a fact is represented in the database. As a quickie 2NF example, consider this sample set of facts:

    a) People entities are defined by SSN.

    b) People entities have a name, sex, and age attribute

    c) People entities have multiple email addresses

    Now consider the following tables

    [font="Courier New"]Table People: SSN, Name, Age

    Table Emails: SSN, Name, Email Address

    Table PeopleAttributes: Name, Age, Sex[/font]

    Now, normalize that schema. You'll find that placing it in 2NF requires operating on multiple tables at once, i.e.:"

    [font="Courier New"]Table People: SSN, Name, Age, Sex

    Table Emails: SSN, Email Address

    Table PeopleAttributes: (no longer exists)[/font]

    This example seems artificial and trivial, because 2NF is such an easy rule to understand. However, when you begin attempting to normalize beyond 3NF, you'll find cases abound where you must simultaneously consider multiple tables.

    What's ironic is that many of the developers who fail to fully grasp normalization do manage to understand that denormalization means facts (in the form of data columns) are inappropriately expressed across multiple tables.

Viewing 15 posts - 106 through 120 (of 178 total)

You must be logged in to reply to this topic. Login to reply