Primary Keys

  • What is a good way to determine which combinations of fields should make up a primary key?

  • This completely depends on your data. Sometimes it's obvious that a single column is unique and determines the record. Other times, there is no obvious combination to make up a primary key, in which case a surrogate is often used, although not ideal.

    Converting oxygen into carbon dioxide, since 1955.
  • This is a very old school (and old) summary of the rules of normalization, with examples. The core concept is, a column or columns has to define what makes a row unique. That's a candidate key. It's possible given a set of columns, there's more than one candidate key. However, you're going to pick one key to make the primary key. You can still also define the other candidate key (or keys) as unique indexes. In fact, ALWAYS do this. Unique indexes really help the optimizer, so if you have them, use  them. Identification of these is something you would do with the business. They're going to know the information that makes up the database best.

    Now, this doesn't get into the whole artificial key versus natural key debate. You have to have natural keys, the column or columns that make things unique. Then, if you want to (and I almost always do), for performance and management reasons, add artificial keys (identity, sequence, GUID). However, this is still after we've identified and define the natural key.

    If you really want to get into this, Louis Davidson has a great book on database design on Amazon you should pick up.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Most databases benefits to be normalized to the 3NF. Take a look of sites describing different sql normalization rules.

  • I try to avoid using data columns as Primary Keys, and use artificial columns instead. Data can change, even when the design time business rules say it won't. Modifying a referenced Primary Key column is much more painful than modifying and unreferenced column.

    Secondly, I try to avoid multiple column primary keys, especially when they're clustered, which they usually are. I've seen DBs with up to 8 columns in primary keys/clustered indexes, and they perform like a dog.

  • The textbook definition of a relational key is a subset of columns (NOT fields! Try attributes, if you want a near-synonym) which is unique within each row (NOT record) of a table (NOT a file). A table might have many keys; these are called “candidate keys”. If a subset of columns of a key is also unique, then that key is a “super key”. If the key has more than one column, then it is a “compound key”. A surrogate key is generated and used by the system and never exposed to the users; it does not replace a real key. People who never read Codd screw up this concept all the time. There are other types of keys, but let’s skip that for now.

    Things that are used as physical locators like GUID, UUID or IDENTITY are never keys. The GUID and UUID are “Global” or “Universal” things that are not attributes in the logical data model by definition. The IDENTITY in SQL Server is a left-over from Sybase that is a table property, not a column at all. It counts physical insertion attempts and is not a logical concept. It is a UNIX record number in disguise!

    Primary Key is a more complicated concept than you first think. When Dr. Codd introduced it, all databases were based on existing fie systems, which grew out of magnetic tapes and punch cards. And so was our mindset. Obviously, you can do one and only one sorted ordering on a file. A sequential file has to be sorted to be usable; random access on a magnetic tape is a bitch. So the first SQL engines needed to have a way to mark this access method in code.

    Now, on to heuristics for designing a table.

    1. Look for an industry standard. These are often required by law. Think about a VIN on your automobile, an ISBN on your books, etc.

    2. Look for validation rules. Since keys are modeled with a nominal scale, it should have regular expression. If it is a neat, clean encoding like ZIP codes, then the expression is easy (zip_code LIKE ‘[0-9][0-9][0-9][0-9][0-9]’). If it badly designed, then the expression can a mess, like British Postal Codes. Compound keys may require validation on the components and also the whole of such a key.

    3. Look for verification sources. Call the DMV or AutoMax for the VIN on that car you bought.

    4. If you honestly cannot find a source, then carefully design your keys. I have written several articles on how to do this. This is also in my books.

    Please post DDL and follow ANSI/ISO standards when asking for help. 

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply