Surrogat Key be Created in DW or Stage?

  • Should the surrogate key, for instance customerKey for dimension customer, be created in the stage process and then the all data should be loaded into the dimCustomer or should the customerkey be created automatically in the dimCustomer when you are loading the data to the dimCustomer table?

    What is your reflection and experience?

  • akirajt (7/7/2012)


    Should the surrogate key, for instance customerKey for dimension customer, be created in the stage process and then the all data should be loaded into the dimCustomer or should the customerkey be created automatically in the dimCustomer when you are loading the data to the dimCustomer table?

    What is your reflection and experience?

    We use the Kimball Component and it creates our surrogate key and I output that data flow to Staging. If all checks out with our validation, I replace live table with Staging.

  • herladygeekedness (7/12/2012)


    akirajt (7/7/2012)


    Should the surrogate key, for instance customerKey for dimension customer, be created in the stage process and then the all data should be loaded into the dimCustomer or should the customerkey be created automatically in the dimCustomer when you are loading the data to the dimCustomer table?

    What is your reflection and experience?

    We use the Kimball Component and it creates our surrogate key and I output that data flow to Staging. If all checks out with our validation, I replace live table with Staging.

    I agree the approach works fine in an environment where the full DIM table gets replaced every time for the Staging one but, in a standard ETL process where DIM table gets incrementally loaded with the daily content of Staging I would create the surrogate key at the time new rows get inserted into DIM table - not at the staging phase, I would keep staging table as close - in terms of schema - as the OLTP source table is.

    Hope this helps.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • I agree with Pablo, but it also depends on how you are handling slowly changing dimensions, if at all.

    I personally only use the natural key(s) in the staging and SK's in the final table.

    Jake

  • We do incremental, SCD1, we simply don't insert, we replace existing dimension table with the newly staged table. Partition switches occur very fast, inserts may or may not be swift.

    Kimball takes existing dimension and preserves the keys as it determines changes from source system. Kimball outputs show changed, new, deleted, etc (altho we do not delete) and attaches next available key to new records.

    It's a very easy way to do things with little downside that I've seen. It's made development very rapid and the table swaps have never failed.

  • akirajt (7/7/2012)


    Should the surrogate key, for instance customerKey for dimension customer, be created in the stage process and then the all data should be loaded into the dimCustomer or should the customerkey be created automatically in the dimCustomer when you are loading the data to the dimCustomer table?

    Populate surrogate keys in staging. Otherwise you have to do joins back to the warehouse / mart tables while you are populating the referencing tables. Pre-populating the surrogate keys and the references to them minimises the number and size of joins and shortens the transaction during which the DW is being touched.

    Avoid using IDENTITY columns in the data warehouse. If you are using SQL Server 2012 then use sequences instead.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply