Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Generating Surrogate-Keys for Type 1 and Type 2 dimensions using SSIS


Generating Surrogate-Keys for Type 1 and Type 2 dimensions using SSIS

Author
Message
Harsh Bhaiya
Harsh Bhaiya
Forum Newbie
Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)

Group: General Forum Members
Points: 3 Visits: 47
Comments posted here are about the content posted at http://www.sqlservercentral.com/columnists/hbhaiya/2783.asp
Clarold Britton
Clarold Britton
Forum Newbie
Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)

Group: General Forum Members
Points: 5 Visits: 36
Why don't you use GUIDs for the surrogate-keys?
DavidSimpson
DavidSimpson
Ten Centuries
Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)

Group: General Forum Members
Points: 1076 Visits: 1074

I think this was a good and thorough article on SCD's.

One thing of note if you get the highest key value and manually set surrogates versus using identity columns then you cannot load a dimension in parallel.

Also be wary of using the checksum to determine record changes, it is possible, however unlikely, that two different records could produce the same checksum. Here is a quote from BOL:

"CHECKSUM satisfies the properties of a hash function: CHECKSUM applied over any two lists of expressions returns the same value if the corresponding elements of the two lists have the same type and are equal when compared using the equals (=) operator. For the purpose of this definition, NULL values of a given type are considered to compare as equal. If one of the values in the expression list changes, the checksum of the list also usually changes. However, there is a small chance that the checksum will not change."

GUID's may be great in that you don't need to worry about uniqueness, but there are a couple of reasons not to use GUID's as surrogates. One is that they are 4 times larger then an integer. May not sound like a big deal but when you are dealing with potentially hundreds of millions of records with an expectation of sub-second response time on reports, every little bit helps. Also you may want to create clustered indexes on the surrogate in the fact tables. Since GUID's are not sequential, at least in not in SQL2K, then you will quickly fragment your fact tables.

David





VincentRainardi
VincentRainardi
SSC Veteran
SSC Veteran (227 reputation)SSC Veteran (227 reputation)SSC Veteran (227 reputation)SSC Veteran (227 reputation)SSC Veteran (227 reputation)SSC Veteran (227 reputation)SSC Veteran (227 reputation)SSC Veteran (227 reputation)

Group: General Forum Members
Points: 227 Visits: 191

Thank you for writing this article, Harsh. It's a good article. I learned about the version number from your article; I've used effective and expiry date and current version column but I have come across the version number, so thanks for the valuable input.

I tend to agree with David that using identity is better than GUID or max+1. As David correctly say if we use max+1 we could have problem when loading in parallel. I've seen an application (not DW though, it was an OLTP) used max+1 to generate ID and we experienced contention and locking problems. Last year when building a DW in a project somebody suggested to use GUID and we decided not too because of the same reason as David mentioned, i.e. performance. Not on the dimension tables, but performance of the fact table as fact tables have many SKs columns. I used big int sometimes for customer and product dimensions but all other dimensions normal int usually sufficient.

Looking forward to your next article.

Kind regards,
Vincent


ranmel
ranmel
Grasshopper
Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)

Group: General Forum Members
Points: 24 Visits: 182

can I download the code in your article .

both the database and the ssis package.

thanks in advance

Ran





Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search