Unsure about relationship with Slowly Changing dimensions.

  • Hi

    Sorry datawarehouse newbie question! I am not sure how to maintain the relationship between Fact and Type 2 Dimension.

    i.e if a new row is inserted into a dimension table as an updated version of a previous row how is the relationship to the fact table maintained. Do you need to insert a new row in the fact table also or is there a better way to maintain this relationship via keys?

    Thanks in advance.

    Cheers, Paul

  • PugMaster (7/7/2010)


    I am not sure how to maintain the relationship between Fact and Type 2 Dimension.

    i.e if a new row is inserted into a dimension table as an updated version of a previous row how is the relationship to the fact table maintained. Do you need to insert a new row in the fact table also or is there a better way to maintain this relationship via keys?

    The idea behind Type 2 Dimensions is to keep accurate historic track therefore each FACT row should be pointing to the DIM row that better represents the facts at the time the row was created.

    In my experience the DIMENSION row includes either a VERSION column or a EFFECTIVE_DATE (from/since) set of columns.

    Usually PK in this DIM table will be a surrogate key, new row gets then a new PK and new information entered into the FACT table will point to the new (most udated) row.

    Hope this helps.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • As Pablo said, you shouldn't have to do anything...because at the time you populated the FACT, it should have pointed to the relevant DIM record (via the surrogate key).

    Unless you have some other non-standard requirement or situation...

  • PugMaster (7/7/2010)


    Hi

    Sorry datawarehouse newbie question! I am not sure how to maintain the relationship between Fact and Type 2 Dimension.

    i.e if a new row is inserted into a dimension table as an updated version of a previous row how is the relationship to the fact table maintained. Do you need to insert a new row in the fact table also or is there a better way to maintain this relationship via keys?

    I too have been trying to work through the relationship between the Fact table and a SCD. Suppose I start with the following entries in my fact and dimension tables.

    FactOrders

    OrderKey Revenue

    100 $5,000

    DimOrder

    OrderKey Order_No Status Current

    100 100001 Open Yes

    Now the status for order number 100001 is changed to Closed. So DimOrder now contains the following:

    OrderKey Order_No Status Current

    100 100001 Open No

    101 100001 Closed Yes

    I have come to the conclusion that FactOrders has to have two records as well.

    OrderKey Revenue

    100 $5,000

    101 $5,000

    An order could have a dozen changes and was thinking the changes would only have to be made in the dimension. However, I now believe that there will need to be an entry in the fact table for each change. Correct?

    Thank you for your input.

    Rob

  • No, you shouldn't have an extra entry in the fact table...because there isn't another transaction for that amount and your numbers will double when aggregated.

    There are a few ways to deal with this situation:

    1. If there are only a limited number of statuses, you can have dates for the statuses (like Open_Date and Close_Date) as well as the current status. Doing it in this way means that you don't have to enforce a type 2 change for the change in status.

    2. Make the status field a type 1 change instead of a type 2 change. This will depend on the business requirement.

    3. Make use of a helper table.

    4. In some cases, you might even want to consider 2 surrogate keys in the fact. One that points to the dimension member at the time of inserting the fact record, and one that point to the current record of the dimension member. This would mean a lot of updates on your fact though, and should only be used if it makes sense base on the requirements.

    All of the above might be used as a possible solution, but in the end it all comes down to the business requirements and how you need to report on the data.

    Hope this helps.

    Martin.

  • Martin Schoombee (10/22/2010)


    No, you shouldn't have an extra entry in the fact table...because there isn't another transaction for that amount and your numbers will double when aggregated.

    Martin - thank you for your response.

    There are several fields that will be tracked in the SCD. I only showed Status to keep it simple.

    If I only keep one record in the Fact table, how do I join to the SCD? Do I join using the natural key (order number)?

    Rob

  • I hate to say this...but it depends.

    The general best practice would be to have a helper table, that contains the surrogate keys of both the fact and dimension...and a factor. So if you have 2 dimension members (orders) for a fact (transaction), the factor would be 0.5.

    HelperOrders

    OrderKey FactKey Factor

    100 3 0.5

    100 5 0.5

    The reason for the factor is that when you join and aggregate, it would be as simple as multiplying the measure (revenue in your case) with the factor to still get the correct values if aggregated at a higher level.

    The risk with this approach is when you view all the statusses of an order. It will appear as if only half of the revenue is assigned to a specific status:

    Orders

    OrderKey Status Revenue

    100 Open $2,500

    100 Closed $2,500

    With the above said, if your natural key is as simple as only an order number then it might make sense to use that in combination with the current indicator in query joins.

    You still have to take into consideration what the purpose of your design is. How do you want to report on the data? Does it maybe make sense to have 2 dimensions, one that tracks changes and one that doesn't? Are you going to build cubes?

    I suppose the most important question is how are you going to report on the history of an order. Would that simply be for a drill-through kind of report where you do not have to associate the measure with the history, or are you going to build a cube in which case you will need some way to aggregate the measures correctly for the different hierarchies you have in the cube...

    It all seems straight forward, but it all depends on what the output needs to be.

    Hope this helps.

    Martin.

    Edit

    Have look at the following, which describes all the intricacies around many-to-many dimension relationships: http://www.sqlbi.eu/Default.aspx?tabid=80

  • PugMaster (7/7/2010)


    Hi

    Sorry datawarehouse newbie question! I am not sure how to maintain the relationship between Fact and Type 2 Dimension.

    i.e if a new row is inserted into a dimension table as an updated version of a previous row how is the relationship to the fact table maintained. Do you need to insert a new row in the fact table also or is there a better way to maintain this relationship via keys?

    Thanks in advance.

    To answer your question: Yes, a new row will be inserted into the fact table that points back to the latest dimension record.

    Obviously, there are a handful of different types of fact tables and countless variations of each of those, so I'm just giving you a simple example.

    It is helpful to always include concepts of time when dealing with the relationship between dimensions and facts. In an accumulating snapshot fact table that runs monthly, sometimes a Datekey is the only thing that changes between records.

    And don't forget to include a RowIsCurrent flag column in your dimension so that you create the fact table with the latest key.

  • Usually PK in this DIM table will be a surrogate key, new row gets then a new PK and new information entered into the FACT table will point to the new (most udated) row.

    I would strongly recommend that the PK for all dimensions be a surrogate key. Even if it doesn't seem necessary at construction, it allows for flexibility with future changes.

  • Deleted comment.

  • I would suggest that at least for order status you should use a Type 1 SCD. In other words, just update the original record in your order dimension rather than create a new one.

  • Daniel Bowlin (12/9/2010)


    I would suggest that at least for order status you should use a Type 1 SCD. In other words, just update the original record in your order dimension rather than create a new one.

    If they were to do that, then the following business question would not be able to be answered using the fact table if the order went from Open to Closed last month: "How many orders were in the Open state in the first week of December?".

    If they were to include a new entry into the FACT table for every SCD2 change in the dimension, then this question could be answered fairly easily (providing the status change date key is included in the FACT table).

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply