SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Using surrogate keys for


Using surrogate keys for

Poll
Do you use surrogate keys (integer primary key) for fact tables?

100% - 9 votes Yes
100% 9 votes
0% - 0 votes No
0% 0 votes
Member votes: 9, Guest votes: 0. You don't have permission to vote in this poll
Author
Message
emiranda 59653
emiranda 59653
SSC Veteran
SSC Veteran (242 reputation)SSC Veteran (242 reputation)SSC Veteran (242 reputation)SSC Veteran (242 reputation)SSC Veteran (242 reputation)SSC Veteran (242 reputation)SSC Veteran (242 reputation)SSC Veteran (242 reputation)

Group: General Forum Members
Points: 242 Visits: 800
I'm working on my first data warehouse project and was wondering what is the industry norm when it comes to surrogate keys on fact tables. I know the Kimball group doesn't explicitly endorse it but one of their design tips mentions advantages (although another one mentions why it might be a bad idea to have one) so I was wondering people's opinion on this and how many are using them.
sqlvogel
sqlvogel
SSCrazy
SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)

Group: General Forum Members
Points: 2056 Visits: 3706
Surrogate keys can be very useful for some ETL operations, especially for updates and deletes on the fact table. They can also be useful to support a partitioning or archiving strategy or for references from other tables if you have them. My advice would be to create a surrogate key if and when you find a reason to, not because you think it's the "norm".
emiranda 59653
emiranda 59653
SSC Veteran
SSC Veteran (242 reputation)SSC Veteran (242 reputation)SSC Veteran (242 reputation)SSC Veteran (242 reputation)SSC Veteran (242 reputation)SSC Veteran (242 reputation)SSC Veteran (242 reputation)SSC Veteran (242 reputation)

Group: General Forum Members
Points: 242 Visits: 800
Thanks, David for your thoughts and explaining reasons why you use them. I do see the advantage of having them for partitioning purposes although you can theoretically partition by using the date key as well. I'm not too sure how useful they are for ETL unless doing auditing as by their nature, surrogate keys won't exist on source systems, but then again I'm not too experienced on data warehouses which is why I'm asking for people's opinions.
sqlvogel
sqlvogel
SSCrazy
SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)

Group: General Forum Members
Points: 2056 Visits: 3706
emiranda 59653 (4/2/2014)
I'm not too sure how useful they are for ETL unless doing auditing as by their nature, surrogate keys won't exist on source systems, but then again I'm not too experienced on data warehouses which is why I'm asking for people's opinions.


I always do my surrogate key processing in a staging area. By the time the data is prepared the keys are already populated - with the already-existing key values for rows to be updated/deleted and with newly generated key values for new rows. That way the final step of actually refreshing the data is kept as small and efficient as possible. If you do need to do updates and deletes against your table then having the surrogate key avoids another set of more complex joins to perform those updates. If your fact table is insert-only then the surrogate key probably won't help you.
aaron.reese
aaron.reese
SSCrazy
SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)

Group: General Forum Members
Points: 2339 Visits: 907
+1 for surrogate keys. the keep the fact table small (in terms of bytes, not columns) and provide you with a level of 'protection' from changes in the source systems. E.g. if you acquire a new company and their system has customer references which are alpha-numberic and yours are numeric. If you did't use surrogate keys to link to the customer dimension you would have to update all your fact table DDLs.

Keys into dimension tables should have no business meaning
gbritton1
gbritton1
Ten Centuries
Ten Centuries (1.4K reputation)Ten Centuries (1.4K reputation)Ten Centuries (1.4K reputation)Ten Centuries (1.4K reputation)Ten Centuries (1.4K reputation)Ten Centuries (1.4K reputation)Ten Centuries (1.4K reputation)Ten Centuries (1.4K reputation)

Group: General Forum Members
Points: 1371 Visits: 879
+1 for surrogates. Keep the business keys out of fact tables. Resolve via Lookups during ETL
Koen Verbeeck
Koen Verbeeck
SSC Guru
SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)

Group: General Forum Members
Points: 60713 Visits: 13297
+1 for surrogate keys into the fact table.
Very useful if you update facts.

Even if you only insert rows, it can be useful to have a surrogate key, for example to keep track of what was inserted in the latest run, so you can configure a Process Add in Analysis Services.


How to post forum questions.
Need an answer? No, you need a question.
What’s the deal with Excel & SSIS?
My blog at SQLKover.

MCSE Business Intelligence - Microsoft Data Platform MVP
emiranda 59653
emiranda 59653
SSC Veteran
SSC Veteran (242 reputation)SSC Veteran (242 reputation)SSC Veteran (242 reputation)SSC Veteran (242 reputation)SSC Veteran (242 reputation)SSC Veteran (242 reputation)SSC Veteran (242 reputation)SSC Veteran (242 reputation)

Group: General Forum Members
Points: 242 Visits: 800
Thanks for the replies. As a follow-up, how about bridge tables? Normally if you would model multi-valued dimensions with a bridge table, you will have a group key and a key to the outrigger table/dimension. The group key unlike a regular dimension surrogate key is obviously not unique.

Would it make sense using the same arguments for using surrogate keys in fact tables (ETL, auditing) to create surrogate key for bridge tables or creating one overkill and just unnecessary overhead?
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search