Temporal Data

  • Have you experienced the need to analyze data temporally in your business? Do your existing information management systems allow you to view historical data? Typically, when you update or delete information in an application, the previous state of that data is lost forever. Have you ever wanted to see what your information "used to be" as opposed to what it is now? In addition, have you ever wanted to know the state of your data in the past "as I knew it to be then" in comparison to "as I currently know it to have been then"?

    What is your experience in solving these problems?
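
    The two timelines in that last question are often called valid time ("when was this true in the business?") and transaction time ("when did we record it?"). A minimal bitemporal sketch in Python, with all names and figures invented for illustration:

```python
from datetime import date

# Illustrative bitemporal store: each version carries the business-effective
# date (valid_from) and the date we recorded it (recorded_on).
history = [
    # We first recorded the Jan 1 salary on Jan 5...
    {"valid_from": date(2012, 1, 1), "recorded_on": date(2012, 1, 5), "salary": 50000},
    # ...then on Mar 1 we learned the Jan 1 figure had been wrong all along.
    {"valid_from": date(2012, 1, 1), "recorded_on": date(2012, 3, 1), "salary": 52000},
]

def salary(valid_at, as_known_on):
    """The value in effect at valid_at, using only what we knew by as_known_on."""
    known = [v for v in history
             if v["recorded_on"] <= as_known_on and v["valid_from"] <= valid_at]
    # Latest effective version, breaking ties by the most recent recording.
    return max(known, key=lambda v: (v["valid_from"], v["recorded_on"]))["salary"]

# "As I knew it to be then" vs. "as I currently know it to have been then":
jan_salary_then = salary(date(2012, 1, 15), as_known_on=date(2012, 2, 1))  # 50000
jan_salary_now = salary(date(2012, 1, 15), as_known_on=date(2012, 4, 1))   # 52000
```

    A single timestamp column can answer only one of the two questions; keeping both axes is what lets the second query return a different answer than the first.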

    [font="System"]Jay Quincy Allen, Managing Partner, Relational Models LLC[/font]

  • I haven't done this in an OLTP system, but in an ODS environment I have set up the tables to allow me to see what day-to-day changes have occurred in the data.

    This could be expanded to an OLTP system, but would require some additional work to ensure that response time is not adversely affected and to minimize the effects of locking/blocking on the application.

    What exactly are you attempting to accomplish?

  • Lynn,

    We already have experience answering these questions; this post is more about inviting others to tell us about their experiences. Members of Adama Systems have successfully implemented temporal structures for both OLTP and BI/DW. In particular, our approach to BI/DW differs in many ways from the more established "Kimball" or "Inmon" strategies.

    Problems arise when dealing with temporal data in both relational and dimensional structures, and there are several solutions in practice, some good and some not so good. Listening to information management professionals around the world describe how they approach these issues helps us better understand how to communicate our own approach. In some ways, these ideas can be controversial. We know this to be the case because of our past failures to communicate these strategies properly. It is made more difficult by the many myths surrounding database design in general, let alone the difficulty of gaining understanding in the community about the importance of a sound architectural approach to storing and analyzing temporal data.

    As we move forward and post about specific technical solutions to these problems, we will certainly invite criticism. We view this as "a good thing," since it will hopefully help us communicate our ideas better.

    [font="System"]Jay Quincy Allen, Managing Partner, Relational Models LLC[/font]

  • JQAllen (2/29/2012)


    Have you experienced the need for analyzing data temporally in your business? Do your existing information management systems allow you to view historical data? Typically, when you update or delete information in any application the previous state of that data is lost forever. Have you ever wanted to be able to see what your information "used to be" as opposed to what it is now? In addition, have you ever wanted to know the state of your data in the past "as I knew it to be then" in comparison to "as I currently know it to have been then"?

    What is your experience in solving these problems?

    That's why Data Warehouses are created.

    Changes on DIMensional tables are tracked via SCD (Slowly Changing Dimensions) techniques - in general Type 2 SCD does the trick for most scenarios.

    Changes on FACTual tables are tracked using snapshot-fact techniques.

    In both cases, you insert new rows into the target table and timestamp the period during which each particular row is valid.
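
    That Type 2 pattern can be sketched in a few lines of Python (table, column, and customer names are all invented for illustration):

```python
from datetime import date

# Illustrative Type 2 SCD: each change inserts a new row and closes the
# previous version by stamping its valid-to date.
dim_customer = []  # rows: key, attrs, valid_from, valid_to, current

def scd2_upsert(natural_key, attrs, as_of):
    """Close the current row for this key (if it changed) and insert a new one."""
    for row in dim_customer:
        if row["key"] == natural_key and row["current"]:
            if all(row[k] == v for k, v in attrs.items()):
                return  # no change, nothing to track
            row["valid_to"] = as_of   # close out the old version
            row["current"] = False
    dim_customer.append(
        {"key": natural_key, **attrs,
         "valid_from": as_of, "valid_to": None, "current": True})

scd2_upsert(42, {"name": "Acme", "city": "Austin"}, date(2012, 1, 1))
scd2_upsert(42, {"name": "Acme", "city": "Dallas"}, date(2012, 2, 1))
# Two rows now exist: the Austin row closed on 2012-02-01, the Dallas row open.
```

    Point-in-time queries then become a filter on the valid period rather than a join to a separate history table.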

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge, but no implicit or explicit warranties are given. Since the advisor explicitly encourages testing any and all suggestions in a non-production test environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.

  • Dimensional models, and particularly SCDs, do not track more than one dimension of time per entity. This is the value of temporal relational modeling. Dimensional systems are typically populated from a relational source, which gives you the flexibility of varying time slices.

    [font="System"]Jay Quincy Allen, Managing Partner, Relational Models LLC[/font]

  • Also, data warehouses, or more precisely dimensional models, are meant to solve aggregation requirements. OLTP requirements are far more complex; not all database requirements can be met by the dimensional model.

    [font="System"]Jay Quincy Allen, Managing Partner, Relational Models LLC[/font]

  • Depending on the number of changes, how often change happens, and how wide your tables are, you'd have to implement different methods.

    The only change tracking I've had to implement in an OLTP system deals with settings so it was a fairly simple creation of a Settings_Log table that has a CreatedDateTime and DeletedDateTime. The Settings table has an ID, Name, Value. The Settings_Log table has ID, Name, Value, CreatedDateTime, DeletedDateTime. Whenever a record is updated or deleted, a trigger captures the deleted record and inserts it into the Settings_Log.

    I'd assume you'd do something similar to this in systems where there are small changes (small number of records changed) on narrow tables.
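
    The trigger pattern described above can be sketched with SQLite standing in for SQL Server (table, trigger, and column names are invented; SQL Server would use an AFTER UPDATE/DELETE trigger reading from the `deleted` pseudo-table instead of `OLD`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Settings (ID INTEGER PRIMARY KEY, Name TEXT, Value TEXT);
CREATE TABLE Settings_Log (
    ID INTEGER, Name TEXT, Value TEXT,
    CreatedDateTime TEXT, DeletedDateTime TEXT);

-- On UPDATE, capture the prior state of the row into the log.
CREATE TRIGGER trg_settings_update AFTER UPDATE ON Settings
BEGIN
    INSERT INTO Settings_Log (ID, Name, Value, CreatedDateTime, DeletedDateTime)
    VALUES (OLD.ID, OLD.Name, OLD.Value, datetime('now'), datetime('now'));
END;

-- Same idea for DELETE.
CREATE TRIGGER trg_settings_delete AFTER DELETE ON Settings
BEGIN
    INSERT INTO Settings_Log (ID, Name, Value, CreatedDateTime, DeletedDateTime)
    VALUES (OLD.ID, OLD.Name, OLD.Value, datetime('now'), datetime('now'));
END;
""")
conn.execute("INSERT INTO Settings VALUES (1, 'timeout', '30')")
conn.execute("UPDATE Settings SET Value = '60' WHERE ID = 1")

# The live table shows the new value; the log preserves the old one.
old_value = conn.execute("SELECT Value FROM Settings_Log WHERE ID = 1").fetchone()[0]
current_value = conn.execute("SELECT Value FROM Settings WHERE ID = 1").fetchone()[0]
```

    In this simplified sketch both timestamp columns are stamped at logging time; in practice CreatedDateTime would carry forward the moment the superseded version originally took effect.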

    I've also implemented change tracking in my DW for Type 1 dimensions. For the dimensions, I have a log table with an ID, Description, StartDateTime, EndDateTime, and RowCount. The way I maintain history of the dimension is by creating a dimension_Log table with all the same fields plus a ChangeLogID. The dimension table would have ID, Name, Description, etc., and LogID; the log table would have the same columns plus ChangeLogID. The change tracking logic is the same as with the settings (only updates and deletes). The method of capturing the change is, however, very different: since the amount of change is potentially much bigger, I use the OUTPUT clause during the UPSERT into the dimension and insert the output into the log table.

    This works pretty well, but quite a bit of storage is wasted. If a dimension has 20 fields and a single field is updated, the whole record gets logged. The storage waste isn't as much of an issue if updates happen across most of the fields. Also, this assumes updates to the table happen at a set interval rather than per change on the source table.
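
    One way to reduce that waste is to log at the column level rather than the row level, writing one log entry per changed field. A minimal illustrative sketch (function and data are invented):

```python
# Compare the old and new versions of a row and log only the columns
# whose values actually changed, instead of the whole 20-field record.
def changed_columns(old_row, new_row):
    """Return one log entry per column whose value changed."""
    return [
        {"column": col, "old": old_row[col], "new": new_row[col]}
        for col in old_row
        if old_row[col] != new_row.get(col)
    ]

old = {"name": "Acme", "city": "Austin", "region": "South"}
new = {"name": "Acme", "city": "Dallas", "region": "South"}
entries = changed_columns(old, new)  # only the 'city' change is logged
```

    The trade-off is that reconstructing a full row as of a given date now requires replaying the column-level entries, so this pays off mainly when rows are wide and updates touch few columns.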

    One option I've read about but never implemented is CDC (Change Data Capture). I would think it's something you'd want to use in most other cases.
