• MMartin1 (2/4/2016)


    The problem with assigning an auto-incrementing id as the primary key is that it does nothing. Imagine a data entry typist accidentally leaning on the keyboard for a while, causing the same data to submitted multiple times. This surrogate key will keep spawning new values while the same data is entered over and over again.

    The alternative here would be a natural key or a intelligent key? These may be subject to repeat or be accidentally duplicated as well more so. Ex// you get a customer in the 90210 area code last named Barkley. Key becomes like Bark90210. You get another customer in the same area code last named Barker --> Bark90210. Then your managers will really be barking at you.

    As for natural keys ... like a social security number. I would be cautious of using sensitive data like this as a key.

    I think operational keys, like surrogate keys, need not have intelligence or be built with strings. They join better to their related tables as integer indexes. Incrementing numbers are usually associated with a time of entry as well and can be a good indicator of sequence. They need not be a clustered index but they help keep the table size manageable and joins fast in a relational schema.

    My thoughts.

    No, the alternative is to start by looking at how the business identifies instances of the object.

    The way I have been taught to do data modeling is to start by not thinking about IT implementation at all. And in fact, a data model can be useful even when the business then decides not to automate anything.

    Even without databases and computers, people have always had a need to identigy individual instances of an object type. Sometimes that is done by a truly natural (as in: exists as a part of the object itself) key - e.g. in chemistry, the number of atoms in a molecule can be used to identify an element. Much more often, a human-designed and human-assigned key is used, such as the element name in chemistry. Or a street name and house number for an address. Or even a customer number that is assigned when you place your first order. A lot of people like to argue on which of those are natural and which are artificial; I dislike that so I do not use those terms; instead I use the term "business key" for anything the business uses to identify instances, and "surrogate key" for additional keys created (and hopefully contained) in the database to serve as a surrogate for a business key.

    During the modeling phase, I am only interested in business keys. So I visit a warehouse when a truck is unloaded. I see someone using a handscanner to scan all incoming goods, so I look at what he is scanning - bam! there's my business key. I go to the helpdesk and hear that employees pick up the phone and their first question is "what is your customer number" - bam! another business key found. (And then I listen a while longer to see what other candidate keys are used for customers who do not have their customer number).

    Surrogate keys get added at implementation. In my conceptual and logical designs, I have no issue with a table that has a primary key that consists of three long character columns and is referenced by a dozen other tables; in my SQL Server implementation I will add a surrogate key (probably IDENTITY) to that table *while still retaining the uniqueness on the business key* and use that column to implement the foreign keys.

    You also mention "smart keys", such as Bark90210. I do not really like them, as they technically violate 1NF - combining two seperate values into a single attribute. But if the business actually uses that as the business key (as opposed to it being thrust upon them against their will by an IT team), then I will go with it. I always think that IT should try to minimize the changes the force upon the business.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/