RE: Enforcing Data Quality while using Surrogate Keys

September 8, 2009 at 10:43 am

This article makes it sound like you have to choose between a natural key with a unique index or a surrogate key with a unique index. Every time I use a surrogate primary key I create a unique key on the columns that comprise the natural key. Surrogate and natural keys each have their role and work best when used for the purpose they were intended. Natural keys are the basis of relational database theory, and if you do not have them I do not know why you would use a relational database. Surrogate keys can be used to simplify joins and to allow the natural key to change without having to cascade the change to all the dependent foreign keys.
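To make the pattern concrete, here is a minimal sketch of a surrogate primary key combined with a unique key on the natural key. It assumes SQLite through Python's sqlite3 module, and the table and column names (company, department, company_name, location) are made up for illustration; the same idea applies in any RDBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Surrogate primary key plus a unique key on the (assumed) natural key.
conn.execute("""
    CREATE TABLE company (
        company_id   INTEGER PRIMARY KEY,   -- surrogate key
        company_name TEXT NOT NULL,
        location     TEXT NOT NULL,
        UNIQUE (company_name, location)     -- natural key
    )
""")
# Child table references the parent through the surrogate key only.
conn.execute("""
    CREATE TABLE department (
        department_id   INTEGER PRIMARY KEY,
        company_id      INTEGER NOT NULL REFERENCES company(company_id),
        department_name TEXT NOT NULL,
        UNIQUE (company_id, department_name)
    )
""")

conn.execute("INSERT INTO company (company_name, location) VALUES ('ACME', 'CINCINNATI')")

# A second row with the same natural key is rejected by the unique key.
try:
    conn.execute("INSERT INTO company (company_name, location) VALUES ('ACME', 'CINCINNATI')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM company").fetchone()[0]

# The natural key can change without touching the child's foreign key.
company_id = conn.execute(
    "SELECT company_id FROM company WHERE company_name = ? AND location = ?",
    ("ACME", "CINCINNATI"),
).fetchone()[0]
conn.execute(
    "INSERT INTO department (company_id, department_name) VALUES (?, 'Payroll')",
    (company_id,),
)
conn.execute("UPDATE company SET location = 'DAYTON' WHERE company_id = ?", (company_id,))
child_company_id = conn.execute("SELECT company_id FROM department").fetchone()[0]
```

The duplicate insert fails at the database level no matter which application path issued it, and the update to location needs no cascade because the child row only stores company_id.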

I have to agree that surrogate keys have no role during logical modeling. All entities should have a unique identifier, and at the physical level you can add surrogate keys for all tables that have children and multiple-column unique identifiers. I have seen "designers" who immediately switch to surrogate keys in the logical model as soon as the unique identifier goes beyond 2 or 3 attributes, without ever identifying the full natural key. It always leads to duplicates and application failure when more than one row is returned. Yes, I agree, your application is perfect and doesn't allow duplicates, but users are devious and will always find that odd navigation path that will allow them to insert duplicates!

It is also difficult to find the surrogate key for a given row if you do not know, and are not enforcing uniqueness on, the natural key. How often do you run SELECT * FROM blahblah WHERE surrogate_key_id = 2468? Usually your first query is SELECT surrogate_key_id FROM blahblah WHERE COMPANY_NAME = 'ACME' AND LOCATION = 'CINCINNATI' to get the surrogate key to use in subsequent queries. Company name and location are the natural key in BLAHBLAH and should be indexed. Using my method they are indexed as a result of being in a unique key, and the surrogate key is also indexed as the primary key.
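The lookup described above can be sketched as follows. This assumes SQLite through Python's sqlite3 module; the index name blahblah_natural_key is hypothetical, and the exact wording of the query plan varies between SQLite versions, so the sketch only looks for the index name in it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE blahblah (
    surrogate_key_id INTEGER PRIMARY KEY,   -- surrogate key, indexed as the primary key
    company_name     TEXT NOT NULL,
    location         TEXT NOT NULL
);
-- Unique key on the natural key; the index name is a made-up example.
CREATE UNIQUE INDEX blahblah_natural_key ON blahblah (company_name, location);
INSERT INTO blahblah (surrogate_key_id, company_name, location)
VALUES (2468, 'ACME', 'CINCINNATI');
""")

# First query: find the surrogate key from the natural key.
lookup = "SELECT surrogate_key_id FROM blahblah WHERE company_name = ? AND location = ?"
params = ("ACME", "CINCINNATI")
surrogate_key_id = conn.execute(lookup, params).fetchone()[0]

# Ask SQLite how it would run the lookup; the detail text is the 4th column.
plan = " ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + lookup, params))

# Subsequent queries use the surrogate key.
row = conn.execute(
    "SELECT * FROM blahblah WHERE surrogate_key_id = ?", (surrogate_key_id,)
).fetchone()
```

Because the natural key is unique, the first query can return at most one row, and the plan should mention blahblah_natural_key rather than a full table scan.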

As a practical matter, carrying all the unique identifier columns down through all generations can lead to a natural key of tens of columns. Here is an extreme example using what I believe are natural keys for a person and a department. I propose that the natural key of a person is first, middle and last name, suffix, date and time of birth, location of birth, and the father and mother of the person (who have the same natural identifier). In this case the father and mother natural keys each comprise 8 columns, which is 16 columns for two parents; adding the other 6 columns for the person makes 22 columns to identify one employee. I propose that the natural key for a department is company name, company location, date incorporated, incorporation location (the natural key for the company), plus department name, department location and date department created. That is 7 columns. Now associate an employee with a department and you have 29 columns in the natural key for the employee department intersection table. You are looking at 33 predicates in your where clause to join these 4 tables (22 for the person, 7 for the department and 4 for the company).

On the other hand, if you add a surrogate key to the person table (employee and parents) you drop the number of columns in the employee table to 2 surrogate columns for the parents and the 6 others, for a total of 8. Now if the company table has a surrogate key the department table drops to 4 columns, and if department has a surrogate key the intersection table now has 2 columns. To join the same 4 tables there are only 3 predicates in the join clause.
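Here is a rough sketch of that surrogate-key version of the four tables, again assuming SQLite through Python's sqlite3 module. All table names, column names and sample values are invented for illustration; only the column counts follow the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE person (
    person_id      INTEGER PRIMARY KEY,   -- surrogate key
    first_name     TEXT NOT NULL,
    middle_name    TEXT NOT NULL,
    last_name      TEXT NOT NULL,
    suffix         TEXT NOT NULL,
    birth_datetime TEXT NOT NULL,
    birth_location TEXT NOT NULL,
    father_id      INTEGER REFERENCES person(person_id),
    mother_id      INTEGER REFERENCES person(person_id),
    -- 8-column natural key; SQLite treats NULL parents as distinct values here
    UNIQUE (first_name, middle_name, last_name, suffix,
            birth_datetime, birth_location, father_id, mother_id)
);
CREATE TABLE company (
    company_id             INTEGER PRIMARY KEY,   -- surrogate key
    company_name           TEXT NOT NULL,
    company_location       TEXT NOT NULL,
    date_incorporated      TEXT NOT NULL,
    incorporation_location TEXT NOT NULL,
    UNIQUE (company_name, company_location, date_incorporated, incorporation_location)
);
CREATE TABLE department (
    department_id       INTEGER PRIMARY KEY,   -- surrogate key
    company_id          INTEGER NOT NULL REFERENCES company(company_id),
    department_name     TEXT NOT NULL,
    department_location TEXT NOT NULL,
    date_created        TEXT NOT NULL,
    -- natural key is now 4 columns instead of 7
    UNIQUE (company_id, department_name, department_location, date_created)
);
CREATE TABLE employee_department (
    person_id     INTEGER NOT NULL REFERENCES person(person_id),
    department_id INTEGER NOT NULL REFERENCES department(department_id),
    PRIMARY KEY (person_id, department_id)   -- 2-column intersection key
);
INSERT INTO person (first_name, middle_name, last_name, suffix, birth_datetime, birth_location)
VALUES ('Pat', 'Q', 'Example', '', '1970-01-01 00:00', 'CINCINNATI');
INSERT INTO company (company_name, company_location, date_incorporated, incorporation_location)
VALUES ('ACME', 'CINCINNATI', '1950-01-01', 'OHIO');
INSERT INTO department (company_id, department_name, department_location, date_created)
VALUES (1, 'Payroll', 'CINCINNATI', '1960-01-01');
INSERT INTO employee_department (person_id, department_id) VALUES (1, 1);
""")

# Joining the 4 tables takes exactly 3 single-column predicates.
rows = conn.execute("""
    SELECT p.last_name, d.department_name, c.company_name
    FROM employee_department ed
    JOIN person     p ON p.person_id     = ed.person_id
    JOIN department d ON d.department_id = ed.department_id
    JOIN company    c ON c.company_id    = d.company_id
""").fetchall()
```

Each table still carries a unique key on its natural key, so the surrogate keys shorten the joins without giving up duplicate protection.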

I have a question because I am not a Business Rules person. What is this Unification Business Rule? I do not understand the purpose. I searched the web and didn't get any hits in the first few pages. Can anyone give me some references so I can understand?