SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Surrogat Key be Created in DW or Stage?


Surrogat Key be Created in DW or Stage?

Author
Message
akirajt
akirajt
SSC-Enthusiastic
SSC-Enthusiastic (146 reputation)SSC-Enthusiastic (146 reputation)SSC-Enthusiastic (146 reputation)SSC-Enthusiastic (146 reputation)SSC-Enthusiastic (146 reputation)SSC-Enthusiastic (146 reputation)SSC-Enthusiastic (146 reputation)SSC-Enthusiastic (146 reputation)

Group: General Forum Members
Points: 146 Visits: 191
Should the surrogate key, for instance customerKey for dimension customer, be created in the stage process and then the all data should be loaded into the dimCustomer or should the customerkey be created automatically in the dimCustomer when you are loading the data to the dimCustomer table?

What is your reflection and experience?
herladygeekedness
herladygeekedness
Ten Centuries
Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)

Group: General Forum Members
Points: 1330 Visits: 813
akirajt (7/7/2012)
Should the surrogate key, for instance customerKey for dimension customer, be created in the stage process and then the all data should be loaded into the dimCustomer or should the customerkey be created automatically in the dimCustomer when you are loading the data to the dimCustomer table?

What is your reflection and experience?


We use the Kimball Component and it creates our surrogate key and I output that data flow to Staging. If all checks out with our validation, I replace live table with Staging.
PaulB-TheOneAndOnly
PaulB-TheOneAndOnly
SSChampion
SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)

Group: General Forum Members
Points: 14043 Visits: 4639
herladygeekedness (7/12/2012)
akirajt (7/7/2012)
Should the surrogate key, for instance customerKey for dimension customer, be created in the stage process and then the all data should be loaded into the dimCustomer or should the customerkey be created automatically in the dimCustomer when you are loading the data to the dimCustomer table?

What is your reflection and experience?


We use the Kimball Component and it creates our surrogate key and I output that data flow to Staging. If all checks out with our validation, I replace live table with Staging.


I agree the approach works fine in an environment where the full DIM table gets replaced every time for the Staging one but, in a standard ETL process where DIM table gets incrementally loaded with the daily content of Staging I would create the surrogate key at the time new rows get inserted into DIM table - not at the staging phase, I would keep staging table as close - in terms of schema - as the OLTP source table is.

Hope this helps.

_____________________________________
Pablo (Paul) Berzukov

Author of Understanding Database Administration available at Amazon and other bookstores.

Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
jake.mayher
jake.mayher
SSC Rookie
SSC Rookie (28 reputation)SSC Rookie (28 reputation)SSC Rookie (28 reputation)SSC Rookie (28 reputation)SSC Rookie (28 reputation)SSC Rookie (28 reputation)SSC Rookie (28 reputation)SSC Rookie (28 reputation)

Group: General Forum Members
Points: 28 Visits: 91
I agree with Pablo, but it also depends on how you are handling slowly changing dimensions, if at all.

I personally only use the natural key(s) in the staging and SK's in the final table.

Jake
herladygeekedness
herladygeekedness
Ten Centuries
Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)

Group: General Forum Members
Points: 1330 Visits: 813
We do incremental, SCD1, we simply don't insert, we replace existing dimension table with the newly staged table. Partition switches occur very fast, inserts may or may not be swift.

Kimball takes existing dimension and preserves the keys as it determines changes from source system. Kimball outputs show changed, new, deleted, etc (altho we do not delete) and attaches next available key to new records.

It's a very easy way to do things with little downside that I've seen. It's made development very rapid and the table swaps have never failed.
sqlvogel
sqlvogel
SSCrazy
SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)

Group: General Forum Members
Points: 2082 Visits: 3706
akirajt (7/7/2012)
Should the surrogate key, for instance customerKey for dimension customer, be created in the stage process and then the all data should be loaded into the dimCustomer or should the customerkey be created automatically in the dimCustomer when you are loading the data to the dimCustomer table?


Populate surrogate keys in staging. Otherwise you have to do joins back to the warehouse / mart tables while you are populating the referencing tables. Pre-populating the surrogate keys and the references to them minimises the number and size of joins and shortens the transaction during which the DW is being touched.

Avoid using IDENTITY columns in the data warehouse. If you are using SQL Server 2012 then use sequences instead.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search