Using surrogate keys for

  • I'm working on my first data warehouse project and was wondering what is the industry norm when it comes to surrogate keys on fact tables. I know the Kimball group doesn't explicitly endorse it but one of their design tips mentions advantages (although another one mentions why it might be a bad idea to have one) so I was wondering people's opinion on this and how many are using them.

  • Surrogate keys can be very useful for some ETL operations, especially for updates and deletes on the fact table. They can also be useful to support a partitioning or archiving strategy or for references from other tables if you have them. My advice would be to create a surrogate key if and when you find a reason to, not because you think it's the "norm".

  • Thanks, David for your thoughts and explaining reasons why you use them. I do see the advantage of having them for partitioning purposes although you can theoretically partition by using the date key as well. I'm not too sure how useful they are for ETL unless doing auditing as by their nature, surrogate keys won't exist on source systems, but then again I'm not too experienced on data warehouses which is why I'm asking for people's opinions.

  • emiranda 59653 (4/2/2014)


    I'm not too sure how useful they are for ETL unless doing auditing as by their nature, surrogate keys won't exist on source systems, but then again I'm not too experienced on data warehouses which is why I'm asking for people's opinions.

    I always do my surrogate key processing in a staging area. By the time the data is prepared the keys are already populated - with the already-existing key values for rows to be updated/deleted and with newly generated key values for new rows. That way the final step of actually refreshing the data is kept as small and efficient as possible. If you do need to do updates and deletes against your table then having the surrogate key avoids another set of more complex joins to perform those updates. If your fact table is insert-only then the surrogate key probably won't help you.

  • +1 for surrogate keys. the keep the fact table small (in terms of bytes, not columns) and provide you with a level of 'protection' from changes in the source systems. E.g. if you acquire a new company and their system has customer references which are alpha-numberic and yours are numeric. If you did't use surrogate keys to link to the customer dimension you would have to update all your fact table DDLs.

    Keys into dimension tables should have no business meaning

  • +1 for surrogates. Keep the business keys out of fact tables. Resolve via Lookups during ETL

  • +1 for surrogate keys into the fact table.

    Very useful if you update facts.

    Even if you only insert rows, it can be useful to have a surrogate key, for example to keep track of what was inserted in the latest run, so you can configure a Process Add in Analysis Services.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Thanks for the replies. As a follow-up, how about bridge tables? Normally if you would model multi-valued dimensions with a bridge table, you will have a group key and a key to the outrigger table/dimension. The group key unlike a regular dimension surrogate key is obviously not unique.

    Would it make sense using the same arguments for using surrogate keys in fact tables (ETL, auditing) to create surrogate key for bridge tables or creating one overkill and just unnecessary overhead?

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply