Generating Surrogate-Keys for Type 1 and Type 2 dimensions using SSIS

Question

Post reply

Generating Surrogate-Keys for Type 1 and Type 2 dimensions using SSIS

Harsh Bhaiya

SSC Journeyman

Points: 99
More actions
December 27, 2006 at 12:44 pm

#170799

Comments posted here are about the content posted at http://www.sqlservercentral.com/columnists/hbhaiya/2783.asp

Viewing 5 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

Clarold Britton Valued Member Points: 59 More actions · Answer 1

Clarold Britton

Valued Member

Points: 59

January 10, 2007 at 7:28 am

#682369

Why don't you use GUIDs for the surrogate-keys?

DavidSimpson Hall of Fame Points: 3562 More actions · Answer 2

I think this was a good and thorough article on SCD's.

One thing of note if you get the highest key value and manually set surrogates versus using identity columns then you cannot load a dimension in parallel.

Also be wary of using the checksum to determine record changes, it is possible, however unlikely, that two different records could produce the same checksum. Here is a quote from BOL:

"CHECKSUM satisfies the properties of a hash function: CHECKSUM applied over any two lists of expressions returns the same value if the corresponding elements of the two lists have the same type and are equal when compared using the equals (=) operator. For the purpose of this definition, NULL values of a given type are considered to compare as equal. If one of the values in the expression list changes, the checksum of the list also usually changes. However, there is a small chance that the checksum will not change."

GUID's may be great in that you don't need to worry about uniqueness, but there are a couple of reasons not to use GUID's as surrogates. One is that they are 4 times larger then an integer. May not sound like a big deal but when you are dealing with potentially hundreds of millions of records with an expectation of sub-second response time on reports, every little bit helps. Also you may want to create clustered indexes on the surrogate in the fact tables. Since GUID's are not sequential, at least in not in SQL2K, then you will quickly fragment your fact tables.

David

VincentRainardi SSCrazy Points: 2905 More actions · Answer 3

Thank you for writing this article, Harsh. It's a good article. I learned about the version number from your article; I've used effective and expiry date and current version column but I have come across the version number, so thanks for the valuable input.

I tend to agree with David that using identity is better than GUID or max+1. As David correctly say if we use max+1 we could have problem when loading in parallel. I've seen an application (not DW though, it was an OLTP) used max+1 to generate ID and we experienced contention and locking problems. Last year when building a DW in a project somebody suggested to use GUID and we decided not too because of the same reason as David mentioned, i.e. performance. Not on the dimension tables, but performance of the fact table as fact tables have many SKs columns. I used big int sometimes for customer and product dimensions but all other dimensions normal int usually sufficient.

Looking forward to your next article.

Kind regards,

Vincent

ranmel Old Hand Points: 364 More actions · Answer 4

can I download the code in your article .

both the database and the ssis package.

thanks in advance

Ran

Generating Surrogate-Keys for Type 1 and Type 2 dimensions using SSIS

Cookies on SQLServerCentral