Tracking History in Fact Table

  • Hi,

    I was hoping somebody would be able to offer advice in relation to tracking history in a fact table, in particular whether #2 below would be a viable option. I would have thought that #1 would be bad for performance given the volume of records created through history, so maybe it isn't best practice. If anyone could suggest another viable approach, that would be great.

    Any feedback would be greatly appreciated!

    Thanks in advance,

    N

    1) Implementing an SCD Type 2 approach on the fact table with a FromDate, ToDate & IsActive flag using the MERGE approach (I know this is common practice with dimensions)

    2) Creating a separate Audit table and logging the actual output from the MERGE (i.e. $action, deleted.col1, deleted.col2, etc.) into the Audit table to track history (perhaps populating the FromDate & ToDate afterwards using the insert date of these records?). I was thinking this could be done for $action IN ('UPDATE', 'DELETE'). This seems more straightforward to me, but I could be wrong.
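
    For illustration, here is a rough sketch of option 2 using "composable DML", i.e. wrapping the MERGE in an INSERT...SELECT so that only the UPDATE and DELETE actions land in the Audit table. All table and column names below are made up for the example:

    ```sql
    -- Sketch only: dbo.FactSales, dbo.FactSales_Staging and dbo.FactSales_Audit
    -- are placeholder names; substitute the real fact, staging and audit tables.
    INSERT INTO dbo.FactSales_Audit (MergeAction, FactKey, Col1, Col2, AuditDate)
    SELECT  Chg.MergeAction, Chg.FactKey, Chg.Col1, Chg.Col2, GETDATE()
    FROM
    (
        MERGE dbo.FactSales AS tgt
        USING dbo.FactSales_Staging AS src
            ON tgt.FactKey = src.FactKey
        WHEN MATCHED AND (tgt.Col1 <> src.Col1 OR tgt.Col2 <> src.Col2) THEN
            -- (NULL-safe comparisons omitted for brevity)
            UPDATE SET Col1 = src.Col1, Col2 = src.Col2
        WHEN NOT MATCHED BY TARGET THEN
            INSERT (FactKey, Col1, Col2) VALUES (src.FactKey, src.Col1, src.Col2)
        WHEN NOT MATCHED BY SOURCE THEN
            DELETE
        -- deleted.* holds the pre-change values; NULL for pure inserts
        OUTPUT $action, deleted.FactKey, deleted.Col1, deleted.Col2
    ) AS Chg (MergeAction, FactKey, Col1, Col2)
    WHERE Chg.MergeAction IN ('UPDATE', 'DELETE');
    ```

    One caveat: the target of the outer INSERT (the Audit table here) can't have enabled triggers or participate in a foreign key relationship, which is usually fine for an audit table.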

  • Just to add that I am currently using SQL Server 2012, so I don't have the temporal table functionality introduced in SQL Server 2016.

    Thanks

    N

  • It's important to understand your use case better before deciding on the best way forward. Could you provide more details about it?

  • Please post the DDL, including any triggers, constraints, or indexes, that are currently on the table.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • We had a need to track the changes happening to an order over time (some of our orders can be active for years). There were many columns of the source order table that we didn't need history for, so we kept a list of 'columns of concern' that we reviewed on a daily basis to determine whether a new 'snapshot' of the order, with its selected list of columns, was needed. So we did not 'snapshot' an order every day, only when something of interest changed. Snapshots were assigned a system-generated key. Then we constructed a 'map' table, which was date-stamped and was used to identify which 'snapshot' of an order was valid on any particular day. This basically gave us an audit trail of the order, allowing us to go 'back in time' to see what the status of an order was historically and what changed when. It also allowed us to reproduce a historical 'order backlog' report at any time going forward, and to calculate the 'elapsed time' between certain events an order can experience over time (e.g. days from when parts were picked from inventory until the order was ready for shipment). This design has proven to be highly useful to us.
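
    Roughly, the structure looked something like this (all object and column names are invented for illustration, not the actual schema):

    ```sql
    -- Sketch only: one row per changed 'snapshot', plus a date-stamped map
    -- saying which snapshot of each order was valid on each day.
    CREATE TABLE dw.OrderSnapshot
    (
        SnapshotKey   INT IDENTITY(1,1) PRIMARY KEY,  -- system-generated key
        OrderNumber   INT          NOT NULL,
        OrderStatus   VARCHAR(20)  NOT NULL,
        PromiseDate   DATE         NULL
        -- ...only the 'columns of concern'...
    );

    CREATE TABLE dw.OrderSnapshotMap
    (
        CalendarDate  DATE NOT NULL,
        OrderNumber   INT  NOT NULL,
        SnapshotKey   INT  NOT NULL REFERENCES dw.OrderSnapshot (SnapshotKey),
        PRIMARY KEY (CalendarDate, OrderNumber)
    );
    ```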

  • KerryH wrote:

    We had a need to track the changes happening to an order over time (some of our orders can be active for years). There were many columns of the source order table that we didn't need history for, so we kept a list of 'columns of concern' that we reviewed on a daily basis to determine whether a new 'snapshot' of the order, with its selected list of columns, was needed. So we did not 'snapshot' an order every day, only when something of interest changed. Snapshots were assigned a system-generated key. Then we constructed a 'map' table, which was date-stamped and was used to identify which 'snapshot' of an order was valid on any particular day. This basically gave us an audit trail of the order, allowing us to go 'back in time' to see what the status of an order was historically and what changed when. It also allowed us to reproduce a historical 'order backlog' report at any time going forward, and to calculate the 'elapsed time' between certain events an order can experience over time (e.g. days from when parts were picked from inventory until the order was ready for shipment). This design has proven to be highly useful to us.

    That sounds complicated.  A simple "UPDATE/DELETE" trigger to write to the history table seems like it would do the trick.

    You don't want to audit inserts because it's a totally unnecessary duplication of data.  Only modifications.  Unmodified orders would simply live in the original table with no history.  Once modified, the original row would be moved to the history table by the trigger and the new (latest) information would live in the original table.

    That's basically how Temporal Tables work, starting in SQL Server 2016.

    The trigger would also set the SCD 2 dates in the history table as the rows were moved there.
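
    A minimal sketch of such a trigger (dbo.FactSales and dbo.FactSales_History are just example names, assumed to share the same columns):

    ```sql
    -- Sketch only: object names are placeholders.
    CREATE TRIGGER dbo.trg_FactSales_History
    ON dbo.FactSales
    AFTER UPDATE, DELETE
    AS
    BEGIN
        SET NOCOUNT ON;

        -- "deleted" holds the prior version of every modified or deleted row;
        -- close out its SCD 2 window as it lands in the history table.
        INSERT INTO dbo.FactSales_History
                (FactKey, Col1, Col2, EffectiveFromDate, EffectiveToDate)
        SELECT   d.FactKey, d.Col1, d.Col2, d.EffectiveFromDate, GETDATE()
        FROM    deleted AS d;
    END;
    ```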

    --Jeff Moden



  • Our source system is Oracle.  Data is going into a SQL Server data warehouse.  Being a large corporation with multiple divisions operating on a single instance of Oracle, the issue is identifying / triggering on changes to only the portion of the data that concerns our division, versus the entire set of Oracle data.  Implementing things like triggers on Oracle tables in that multi-division environment just for our division's purposes isn't going to happen: performance of the Oracle ERP system was the main concern, and nothing that could impact it was allowed.  Also, we were not concerned with every change that occurred to the data over the course of a day, only what the data looked like at the end of the day, since this is for historical DW reporting.  Because the reporting is for business-metrics purposes and our orders are typically longer-term, engineer-to-order work (typically weeks and months) rather than commodity type, near real-time DW reporting is not a concern at this point.  We simply read the data from Oracle and look for changes on the SQL side.  We could have implemented triggers in the ODS environment in SQL Server, but we tried to keep the ODS tables as close to copies of the Oracle data and structure as possible.  (And there's the issue of re-syncing data between Oracle and SQL Server.)  And considering we only need a much smaller subset of all the data columns available in Oracle, it was much simpler to download the data and then use checksums to identify changes to the columns of interest.
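
    For illustration, the checksum comparison is along these lines (object and column names are placeholders, not our actual schema):

    ```sql
    -- Sketch only: find orders whose 'columns of concern' have changed since
    -- the last snapshot. HASHBYTES is used rather than CHECKSUM to reduce
    -- collision risk.
    SELECT  o.OrderNumber
    FROM    ods.OrderHeader AS o
    JOIN    dw.OrderSnapshot AS s
            ON  s.OrderNumber = o.OrderNumber
            AND s.IsCurrent   = 1
    WHERE   HASHBYTES('SHA2_256', CONCAT(o.OrderStatus, '|', o.PromiseDate, '|', o.Qty))
         <> HASHBYTES('SHA2_256', CONCAT(s.OrderStatus, '|', s.PromiseDate, '|', s.Qty));
    -- Note: before SQL Server 2016, HASHBYTES input is capped at 8,000 bytes.
    ```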

  • Also, our DW tables contain calculated / transformed data that does not exist in the Oracle source.  So our ETL logic can be somewhat complex, determining some things based not only on what exists in the current data but also on what exists in historical DW data.  A trigger in Oracle that has no visibility into our DW will not work for this need.

  • So an end-of-day "upsert" would seem to have been in order.

    --Jeff Moden



  • Hi Jeff,

    First of all, thanks for your input, much appreciated. The trigger sounds like an interesting approach, although I have heard that triggers are not good performance-wise (I am going to test). Given that there would be roughly 50,000 facts loaded per day (so not a huge amount), maybe performance is not an issue. I guess the question is whether using a trigger would give any benefit over using the inserted & deleted tables populated as part of the MERGE output? I like the idea of populating the effective start & end dates in the history table using the trigger...good call.

    Thanks again,

    N

  • Just to answer the question in relation to indexes & constraints on the fact table: currently, history is being tracked in the fact table using the SCD Type 2 approach, but I have learned that the history is rarely accessed yet needs to be kept for contractual reasons. The fact table is essentially bloated, so I am looking to rebuild it from scratch. There are default constraints on the EffectiveToDate & IsActive fields (which default to 9999-01-01 and '1' respectively) for newly inserted records. There are no triggers in the data warehouse currently.
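
    For reference, those defaults look something like this (dbo.FactSales is a placeholder table name):

    ```sql
    -- Sketch only: the default constraints described above.
    ALTER TABLE dbo.FactSales
        ADD CONSTRAINT DF_FactSales_EffectiveToDate
            DEFAULT ('9999-01-01') FOR EffectiveToDate;

    ALTER TABLE dbo.FactSales
        ADD CONSTRAINT DF_FactSales_IsActive
            DEFAULT (1) FOR IsActive;
    ```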

    Cheers,

    N

  • Thanks to everyone who chimed in...I’m trying to figure out how to reply directly to a comment but haven’t got there yet 🙂

  • Niggles wrote:

    Hi Jeff,

    First of all, thanks for your input, much appreciated. The trigger sounds like an interesting approach, although I have heard that triggers are not good performance-wise (I am going to test). Given that there would be roughly 50,000 facts loaded per day (so not a huge amount), maybe performance is not an issue. I guess the question is whether using a trigger would give any benefit over using the inserted & deleted tables populated as part of the MERGE output? I like the idea of populating the effective start & end dates in the history table using the trigger...good call.

    Thanks again,

    N

    To respond "directly" to a post, you can just quote it like I did this one.  There is a "quote" selection on the far right of the same line that contains the date of the post.

    As far as performance of triggers goes, they've earned a reputation for performing poorly mostly because people write terrible code in their triggers.  The huge advantage of triggers is that they're automatic, so you don't need to include (for example) audit code in all other code.  The trigger will have you covered even if someone forgets.

     

    --Jeff Moden



  • Jeff Moden wrote:

    Niggles wrote:

    Hi Jeff,

    First of all, thanks for your input, much appreciated. The trigger sounds like an interesting approach, although I have heard that triggers are not good performance-wise (I am going to test). Given that there would be roughly 50,000 facts loaded per day (so not a huge amount), maybe performance is not an issue. I guess the question is whether using a trigger would give any benefit over using the inserted & deleted tables populated as part of the MERGE output? I like the idea of populating the effective start & end dates in the history table using the trigger...good call.

    Thanks again,

    N

    To respond "directly" to a post, you can just quote it like I did this one.  There is a "quote" selection on the far right of the same line that contains the date of the post.

    As far as performance of triggers goes, they've earned a reputation for performing poorly mostly because people write terrible code in their triggers.  The huge advantage of triggers is that they're automatic, so you don't need to include (for example) audit code in all other code.  The trigger will have you covered even if someone forgets.

    Thanks Jeff. Just to be inclusive...the only other option that I am aware of is to use Change Tracking to track DML changes on the primary key and then use the CHANGETABLE function to identify the changed rows and log them. One downfall, I guess, is that it only tracks the last modified action (e.g. if a record is inserted, then updated, then deleted, only the delete action is recorded; the previous insert & update actions are lost). Just wondering if you have any opinion on Change Tracking as a possible audit solution?
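
    The pattern I have in mind is roughly this (database and table names are placeholders):

    ```sql
    -- Sketch only: enable Change Tracking, then pull the net changes since
    -- the last sync. The tracked table must have a primary key.
    ALTER DATABASE MyDW
        SET CHANGE_TRACKING = ON
        (CHANGE_RETENTION = 7 DAYS, AUTO_CLEANUP = ON);

    ALTER TABLE dbo.FactSales
        ENABLE CHANGE_TRACKING
        WITH (TRACK_COLUMNS_UPDATED = ON);

    DECLARE @last_sync BIGINT = 0;  -- persist CHANGE_TRACKING_CURRENT_VERSION() between runs

    SELECT  ct.FactKey,
            ct.SYS_CHANGE_OPERATION,    -- I, U or D: only the *net* change survives
            f.Col1, f.Col2              -- current values only; prior values aren't kept
    FROM    CHANGETABLE(CHANGES dbo.FactSales, @last_sync) AS ct
    LEFT JOIN dbo.FactSales AS f
            ON f.FactKey = ct.FactKey;  -- LEFT JOIN: deleted rows have no current row
    ```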

    Thanks again,

    N

     

  • My personal feeling is that CDC is incredible overkill for simple DML logging of historical data, both in complexity of use and design and in the excessive duplication of data and the shedload of other stuff that goes with it, like LSNs, etc.  It even creates a sequence number within transactions.  All of that just isn't necessary for simple logging of historical data.

    If you need to synchronize devices, then I can see the utility in using it for that, but not much more.  Even if you have to do column-based rather than row-based auditing, there are better and much simpler / less resource-expensive methods than CDC.

    Again, that's just my opinion.

    --Jeff Moden


