Is a durable key always necessary in a dim?

  • imani_technology

    SSCrazy Eights

    Points: 8002

    I want to use a Type 2 SCD SSIS transform on a dimension table that does not have a durable key.  However, the table does have a 4-column natural key.  Please bear in mind, these 4 columns are also foreign keys to other tables.  Is a single durable key necessary in this case?

  • Martin Schoombee

    SSCoach

    Points: 19026

    It is recommended, yes...chances are that you most likely would want to add the durable key (or a surrogate key if that's what you meant) to a fact table as foreign key, and having a composite key will not be ideal from a performance or design perspective.

    I would also recommend that you don't use the Type-2 SCD component in SSIS...it's one of the worst-performing RBAR-type data flow tasks.

  • Chris Harshman

    SSC-Forever

    Points: 41850

    Durable keys are important because you don't want to have to update all the existing records in a Fact table if the key in the Dimension table changes.  Dimension table keys should also be separate from the "natural" key or foreign key used in the OLTP system because you will need to represent the data at different points in time for your type 2 slowly changing Dimension.

    Say you have a customer John Doe who is CustomerID 12345 in your OLTP system.  If he lives in Ohio in 2018, then moves to Kentucky in January 2019, you would want to know both of those addresses in your Demographics Dimension table.  Any orders he made in 2018 should be reflected in Ohio, and any orders he made in 2019 should be reflected in Kentucky, so different records in your Fact table will point to different records in your Dimension table based on what was current at the time the Fact occurred.

    If you try to use a multi-column key, you will end up making the Fact table wider than it should be.  Your Fact table will typically be narrow (few columns) and long (many rows), while your Dimension tables will typically be wide (many columns) and short (fewer rows).  You would also need a way to distinguish the different point in time records for the same "natural" key, so you might as well use a single surrogate key anyway.

    I also agree with Martin that there are better ways to handle type 2 SCD than the SSIS transformation task.  Take a look at what you can do with the MERGE statement OUTPUT clause:

    https://wateroxconsulting.com/archives/type-2-scd-with-sql-merge/

     

  • imani_technology

    SSCrazy Eights

    Points: 8002

    I totally agree that the SCD object is far from ideal.  However:  1) the data set is very small (just a few hundred megabytes of data), 2) we are under a tight deadline and the SCD object is the quickest to implement.

     

    That being said, shouldn't the SCD object automatically increment the Durable Key/Surrogate Key when it adds a new row?

  • nova

    SSC Journeyman

    Points: 79

    SCD requires a Business Key in the source. In the target dimension table you can use an IDENTITY column for a surrogate key. You don't need to do anything in the SCD component to achieve that.

  • Martin Schoombee

    SSCoach

    Points: 19026

    imani_technology wrote:

    I totally agree that the SCD object is far from ideal.  However:  1) the data set is very small (just a few hundred megabytes of data), 2) we are under a tight deadline and the SCD object is the quickest to implement.

    That being said, shouldn't the SCD object automatically increment the Durable Key/Surrogate Key when it adds a new row?

    Cutting corners will cost you in the long run, and I'd strongly advise against it. If the data set is small now, it doesn't mean that it will remain small in the long run.

    Durable keys and surrogate keys are different things. A surrogate key is unique for every row in the dimension table, and you can use an identity field for that. A durable key remains the same for all historic records of an entity, and gives you the ability to access all records of a customer etc. without the need for self-joins or complicated queries.

  • Jeff Moden

    SSC Guru

    Points: 995161

    And surrogate keys can and usually are durable keys.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems
    Create a Tally Function (fnTally)

  • Martin Schoombee

    SSCoach

    Points: 19026

    Jeff Moden wrote:

    And surrogate keys can and usually are durable keys.

    yes, true 🙂

  • imani_technology

    SSCrazy Eights

    Points: 8002

    Okay, so let's say we use the MERGE statement instead of an SCD object.  How do I get data from the preceding step in the SSIS package (a Derived Column transform, in this case) into the MERGE statement?

    Also, how would the MERGE statement deal with both the PK (business/surrogate key) and the Durable Key (used for joining the fact table to the dimension table)?

  • nova

    SSC Journeyman

    Points: 79

    You cannot directly use SSIS data flow to drive a MERGE statement. The MERGE statement would have to be part of your control flow instead and you should use a staging table to supply the data for the MERGE.

    You may be confusing the idea of a durable key with a surrogate key. Durable key is Ralph Kimball's jargon for a value which is not unique (and therefore not really a key at all) in a Type 2 dimension but which is a surrogate for an entity in the business domain where there is no other stable business key. There is no method for maintaining such durable values unless you have some kind of indication that a business key value has changed and some other means to identify the old version of the row in the dimension - usually some alternative key from the source data.

Viewing 10 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply