Slowly Changing Facts

  • Comments posted to this topic are about the item Slowly Changing Facts

  • Your example is a transaction fact table. While I can see the advantages in this case for auditing, I'm not sure a data warehouse is the best place to do this (although I suppose if that's the business requirement, then it is). A transaction fact table shouldn't have many changes once the record is posted, however, so I'm not sure about the application in this case.

    It would perhaps be more interesting to see this technique applied to an accumulating snapshot fact table. These fact tables track an item throughout its lifetime and are normally characterized by multiple date fields which get filled in as the item is updated. In this case, there would be an additional date, and the table would become more like a transaction fact table, except that the entries could then be strung together to trace the history of the item. The record would need some kind of active flag as well because the normal count function would not work in this case.
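
    A minimal sketch of what such a table might look like with the extra date and active flag described above (all names here are hypothetical, not taken from the article):

        -- Hypothetical accumulating snapshot fact tracking an order's lifecycle.
        -- Each milestone date is filled in as the order progresses; the
        -- IsRowCurrent flag marks the latest version so counts are not inflated.
        CREATE TABLE dbo.FactOrderFulfillment
        (
            OrderKey         INT      NOT NULL,  -- source system's order identifier
            OrderDateKey     INT      NOT NULL,  -- set when the order is placed
            ShipDateKey      INT      NULL,      -- set when the order ships
            DeliveryDateKey  INT      NULL,      -- set when the order is delivered
            OrderAmount      MONEY    NOT NULL,
            RowEffectiveDate DATETIME NOT NULL,  -- when this version was loaded
            IsRowCurrent     BIT      NOT NULL   -- 1 = latest version of this order
        );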

  • RonKyle (12/22/2010)


    ....The record would need some kind of active flag as well because the normal count function would not work in this case.

    Good point. By including "differential" records, counts can get thrown off. In the case of reversing out SalespersonID 101 transactions and attributing them to SalespersonID 100, an "active flag" or IsRowCurrent flag would help when counting. But even this might have limitations, such as the very first example where the sale price was reduced by $20 -- which row gets the active flag? In these cases, then, perhaps a value would need to be allocated across the rows that match by SalespersonID, ProductID, and SaleDate in order to achieve a sum of 1 for each sale?
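
    To make that allocation idea concrete, here is a rough sketch using a window function; the table and column names are assumptions, not taken from the article:

        -- Illustrative only: spread a weight of 1 across all rows that belong to
        -- the same logical sale, so SUM(SaleWeight) counts each sale exactly once.
        SELECT  SalespersonID,
                ProductID,
                SaleDate,
                1.0 / COUNT(*) OVER (PARTITION BY SalespersonID, ProductID, SaleDate)
                    AS SaleWeight
        FROM    dbo.FactSales;   -- hypothetical fact table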

  • RonKyle: That's a very good point...this technique is perhaps most appropriate for an accumulating snapshot table, where by definition each record is changing with time. The boiled-down example I've given here instead is a transaction fact table that's changing, yet not accumulating over time. I've put this technique to use in financial applications, where there are lots of adjustment entries to past data, e.g. late revisions.

    Also, the count issue can be accounted for if there is a unique identifying attribute added to the incoming records. For example, the natural key from the source system combined with an identifier of the source system would be a unique key candidate. This field wouldn't change over time, so you could run a COUNT DISTINCT on it to get the correct count of unique records (see the sketch at the end of this post).

    Peterzeke: marking one record as active defeats the purpose of effective dating -- because if you queried for a partial audit range, the sums wouldn't necessarily equal 1. Again, a correct count is achievable with this technique only if a unique identifying column is sourced from the original feed.
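
    As a rough sketch of that COUNT DISTINCT approach (the table and column names are assumptions, not the article's):

        -- SourceSystemName and SourceTransactionID are assumed VARCHAR columns
        -- carried unchanged onto every version of a fact row, so counting the
        -- distinct combination returns one per original transaction.
        SELECT COUNT(DISTINCT SourceSystemName + ':' + SourceTransactionID)
                   AS UniqueSales
        FROM   dbo.FactSales;   -- hypothetical fact table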

  • I don't have enough experience to comment on the techniques presented, but I wanted to congratulate you on a well-written article. Some authors expect the pictures to speak for themselves, but without operational familiarity with those screens they provide little context; your pictures, code examples and results nicely clarify the writeup. The text itself is very easy to read, trading excessive detail (that would likely not apply in my usage anyway) for a great overview of a technique and how/why it may be applied. That's a nice takeaway.

    Though you're still working on this series, I look forward to your next!

  • Re: "see this technique applied to an accumulating snapshot fact table"

  • Actually, the technique is likely most appropriate for a transaction fact table of the type discussed in the article. By definition, an accumulating snapshot fact table has a fixed number of dated buckets into which its facts are recorded. This type of fact table is a natural fit for workflow-type situations where there are designated points in the workflow that we're interested in. A transaction fact table, on the other hand, is by definition atomic. The bulk of the transactions are likely never adjusted, so even one additional set of attributes to accommodate an adjustment will be sparse. Furthermore, with an accumulating snapshot fact table one must look up the original fact and update the row -- an expensive operation compared to an insert. The point about throwing the count off is well taken, but I think the alternatives given accommodate that difficulty quite well. Nice article. Thanks.

  • I've used this method for a number of years and commonly explain it to business users as a Contra-Correction, which they seem to grasp readily. In one instance a client had an Archive Database that took daily snapshots of about 5,000 records; after 3 years it was touching 70GB. When this method was introduced into a new DW, we cut the database size to around 10GB and lost none of the history.

    This method has its good and bad points. The most obvious flaw is with rapidly changing facts on very large datasets, which can cause the data to explode: you tend to be creating two new fact records every other day rather than one 'movement' fact record that records the differential, so you need to be careful and limit the monitored fact attributes.

    The upside is that you will always be able to view a record AS WAS at a given point in time, as well as AS IS. This is most useful in the financial world, where regulatory reporting is required on a monthly basis and you have to be able to prove to auditors the numbers at a given point in time; it saves you from having to create an archive silo every month.
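
    As a rough illustration of an AS WAS query (the names are assumptions, not from the article), summing everything that had been loaded on or before the reporting date reproduces the numbers as they stood at that time:

        -- "AS WAS" view of sales as at the end of January 2011: include only the
        -- rows (original facts plus any contra corrections) loaded by that date.
        DECLARE @AsOfDate DATE = '20110131';

        SELECT   SalespersonID,
                 SUM(SaleAmount) AS SaleAmountAsWas
        FROM     dbo.FactSales             -- hypothetical fact table
        WHERE    RowLoadDate <= @AsOfDate  -- hypothetical load-date column
        GROUP BY SalespersonID;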


  • I don't see any easy way to maintain AVG measures in this model. We need two layers of aggregation: a SUM to reconstruct the full amount, and only then the AVG. This is not possible for some applications.
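
    To make the concern concrete, a minimal sketch (hypothetical table and column names) of the two-step aggregation this model requires:

        -- First SUM each original row together with its differential corrections
        -- back into one net amount per sale, then take the AVG across those nets.
        SELECT AVG(NetAmount) AS AvgSaleAmount
        FROM (
            SELECT  SourceTransactionID,
                    SUM(SaleAmount) AS NetAmount
            FROM    dbo.FactSales
            GROUP BY SourceTransactionID
        ) AS NetPerSale;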

  • Great article. Well written and well demonstrated.

    I racked my brain on the best way to achieve snapshot tracking until I came across this. I still have some uncertainties regarding performance, such as how far to look back and how best to model the data when tracking more than one figure (changing commission rates, product costs, etc.) so that reporting queries run optimally. But I see this as a great start.

    While it may be more transactional than a traditional data warehouse, sometimes rules need to be bent to achieve the business need, and this is a good example of just that.

    Thanks!

  • While I totally agree that this is a well-documented and well-researched article and many of the previous comments are valid, when dealing with transaction tables there should also be a balance between achieving the business requirement and implementing a sensible solution.

    A transactional fact table by its nature will offer you the ability to report AS IS and AS WAS from a financial perspective. It then only comes down to the transaction attributes that the business wants to be Type-2 in nature. From a regulatory reporting and audit perspective, tracking changes to these attributes and the related contra corrections in the fact table makes sense, but surely it is more sensible to try to address this at source? That is, if any of the transactional attributes change in source, they should not be overwritten or patched with script updates; there should be a contra correction in source. That way the source will still reconcile with the ledger, and the Data Warehouse will also reconcile. Other than the possibility of data explosion, depending on the number of Type-2 attributes, I also think it is quite dangerous to allow a warehouse to essentially mask business process issues.

    Can you let me know if you know of any other legitimate reasons for implementing this solution or of any other potential pitfalls you may have seen when implementing it in the past please? Thanks.

  • Please, please, please refer to these sorts of articles as 'techniques' or 'methods' or 'solutions' -- valuable, mind you, but 'design pattern' means something quite specific, and strategies for using SQL and related tools ain't it.

    A minor point, but not everything is a 'design pattern', nor does it need to be to be useful.

    Just trying to prevent more abuse of the language...

  • Rexell (12/13/2012)


    Can you let me know if you know of any other legitimate reasons for implementing this solution or of any other potential pitfalls you may have seen when implementing it in the past please? Thanks.

    I can tell you this kind of Type 1 and Type 2 transactional fact table change tracking is so popular now that M$ put CDC (Change Data Capture) into SQL Server 2008 to make it easier to track changes and load them into Data Warehouses and what I like to call Data Audit Warehouses (a minimal example of enabling it is sketched below).

    Of course, most anyone who is not the lead DBA or in executive management will see the goodness this provides.
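
    For reference, a minimal sketch of enabling CDC on a source table (the database, schema, table, and role names are assumptions):

        -- Enable Change Data Capture at the database level, then on the source
        -- table whose inserts/updates/deletes should feed the warehouse load.
        USE SourceDB;                              -- hypothetical source database
        EXEC sys.sp_cdc_enable_db;

        EXEC sys.sp_cdc_enable_table
             @source_schema = N'dbo',
             @source_name   = N'SalesTransaction', -- hypothetical source table
             @role_name     = NULL;                -- no gating role in this sketch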

  • markoos1 - Great article. That is all 😉

    😎

  • psychodba (4/17/2012)


    I dont see any easy way to maintain AVG measures in this model. We need two layers of aggregations.

    SUM to construct the full amount and only then AVG. This is not possible for some applications.

    In what way do you see this? Sorry, but I can't see your point.

  • Good article.

    There is no "right" or "wrong" way to handle this; however, this methodology (ledgering changes) has two main advantages:

    1. Complete auditability and data transparency.

    2. Allows for incremental processing of multidimensional Analysis Services cubes instead of an expensive full reprocessing of a cube.

    No solution is without risk.

    Steven Neumersky, CBIP, MCITP
