What is purpose of surrogate key?

  • I have seen may sites refering use of surrogate key. The link http://www.dbmsmag.com/9805d05.html might be useful. Can someone add simple use of surrogate key with examples?

    Thanks - JL

    Regards - JL

  • I would define it as a single field that has no meaning, is system generated, and exists mainly for maintaining relationships. For example, in the US a person would probably be uniquely identified by:

    SSN

    LastName

    FirstName

    MiddleName

    DateOfBirth

    That's a big primary key and if you are building a relationship between a Person and a Family then you would need to include all of those columns in the family table and when joining the families and persons tables you would need to include all those columns in the join. In order to avoid this you would create a unique constraint on the 5 columns that uniquely identify a person, but then add a PersonId column that is system generated (I use identity). Now in the families table I store 1 column, PersonId, and I only have to join on that single column. It is simpler to write queries and the database will perform faster because my keys are smaller and each row in the families table is narrower so I can fit more rows per page reducing my IO's.

    Others may have more to add or better explanations.

  • Just to add to Jack's (very good) explanation:-

    If SSN, LastName, FirstName, MiddleName, DateOfBirth are the primary key, what happens if LastName changes through marriage or mistyping in the first place... you have to change that in every related table as well.

    Using a surrogate key of PersonId means nothing has to change in any of the related records.

    Don't use a surrogate key just for the sake of it.

    Take an order processing system, where OrderNo is the natural primary key. There's no point in using a surrogate key here if the OrderNo is guaranteed to be unique, will never change, and is fairly short already.

  • In this particular example SSN can be treated as unique number so there is no need to join on all 5 columns. If you want to transfer this data to DW, you can have PK in Person dimension table.

    Can I use surrogate key to speed up dimension data loading in DW. In your above example if I create surrogate key in staging then I will have all the SK created already. Then during data load I will have to simply insert data in dimension table as bulk inert. This will save the DW load for generating PK and surrogate keys will act as PK in DW.

    Regards - JL

  • jluniya (9/5/2008)


    In this particular example SSN can be treated as unique number so there is no need to join on all 5 columns. If you want to transfer this data to DW, you can have PK in Person dimension table.

    SSN is not guaranteed to be unique and can be re-used, so while you will likely be okay with this it is not a recommended practice to use SSN as the unique key.

  • But I think you will have to include all the columns in join with source to create surrogate key.

    So in relational DB PK will not define uniqueness of the data; UK should be defined for this purpose. But in case of DW, the dimension PK defines uniqueness of the record which is nothing but the surrogate key.

    According to Ralph Kimball...

    "Actually, a surrogate key in a data warehouse is more than just a substitute for a natural key. In a data warehouse, a surrogate key is a necessary generalization of the natural production key and is one of the basic elements of data warehouse design. Letýs be very clear: Every join between dimension tables and fact tables in a data warehouse environment should be based on surrogate keys, not natural keys. It is up to the data extract logic to systematically look up and replace every incoming natural key with a data warehouse surrogate key each time either a dimension record or a fact record is brought into the data warehouse environment"

    Regards - JL

  • Maybe it's just me, but I was under the impression that SSN is not unique across the planet... that's why you need to have another column such as CountryId to complete the PK.

  • The UK doesn't use SSN. There is an equivalent but it's rarely used to identify someone and and individual may have more than one.

    Also, if you wish to hold historical versions of your conformed dimension record, this would be an attribute, not a primary or composite key value.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply