Slowly Changing Facts
Posted Tuesday, December 21, 2010 10:27 PM
Grasshopper

Comments posted to this topic are about the item Slowly Changing Facts
Post #1038068
Posted Wednesday, December 22, 2010 4:55 AM
Right there with Babe

Your example is a transaction fact table. While I can see the advantages in this case for auditing, I'm not sure a data warehouse is the best place to do this (although I suppose if that's the business requirement, then it is). A transaction fact table shouldn't have many changes once the record is posted, however, so I'm not sure about the application in this case.

It would perhaps be more interesting to see this technique applied to an accumulating snapshot fact table. These fact tables track an item throughout its lifetime and are normally characterized by multiple date fields which get filled in as the item is updated. In this case, there would be an additional date, and the table would become more like a transaction fact table, except that the entries could then be strung together to trace the history of the item. The record would need some kind of active flag as well because the normal count function would not work in this case.
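
To make that concrete, a bare-bones accumulating snapshot fact might look something like the following (table and column names are just examples, not taken from the article):

-- Illustrative accumulating snapshot fact: one row per order, with
-- milestone date keys that get filled in as the order progresses.
CREATE TABLE dbo.FactOrderFulfillment
(
    OrderKey        int   NOT NULL,              -- natural key of the order
    OrderDateKey    int   NOT NULL,              -- known when the row is first loaded
    ShipDateKey     int   NULL,                  -- filled in later
    DeliveryDateKey int   NULL,                  -- filled in later
    PaymentDateKey  int   NULL,                  -- filled in later
    OrderAmount     money NOT NULL,
    EffectiveDate   date  NOT NULL,              -- the "additional date" if each change is inserted as a new row
    IsRowActive     bit   NOT NULL DEFAULT (1)   -- the active flag mentioned above
);

With the article's technique layered on top, each change would land as a new row rather than an in-place update, which is where the additional date and the active flag come in.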



Post #1038180
Posted Wednesday, December 22, 2010 7:24 AM
Old Hand

RonKyle (12/22/2010)
....The record would need some kind of active flag as well because the normal count function would not work in this case.


Good point. Including "differential" records can throw counts off. In the case of reversing out SalespersonID 101's transactions and attributing them to SalespersonID 100, an "active flag" or IsRowCurrent flag would help when counting. But even this might have limitations, such as the very first example, where the sale price was reduced by $20: which row gets the active flag? In those cases, perhaps a value would need to be allocated among the rows that match on SalesPersonID, ProductID, and SaleDate so that each sale sums to 1?
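
To sketch that allocation idea (the table and column names here, such as dbo.FactSales, SaleAmount, and CountWeight, are just placeholders, not from the article): every row belonging to the same sale gets an equal share of a count weight, so the weights still sum to 1 per sale.

-- Rough sketch: spread a "count weight" across all rows of the same sale
-- so that SUM(CountWeight) still comes to 1 per original sale.
SELECT
    SalesPersonID,
    ProductID,
    SaleDate,
    SaleAmount,
    1.0 / COUNT(*) OVER (PARTITION BY SalesPersonID, ProductID, SaleDate) AS CountWeight
FROM dbo.FactSales;

Of course this only helps where the correcting rows still match on those three columns, which is part of why the reversed-salesperson case is harder.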






Post #1038269
Posted Wednesday, December 22, 2010 7:44 AM
Grasshopper

RonKyle: That's a very good point...this technique is perhaps most appropriate for an accumulating snapshot table, where by definition each record is changing with time. The boiled-down example I've given here instead is a transaction fact table that's changing, yet not accumulating over time. I've put this technique to use in financial applications, where there are lots of adjustment entries to past data, e.g. late revisions.

Also, the count issue can be accounted for if there is a unique identifying attribute added to the incoming records. For example, the natural key from the source system combined with an identifier of the source system would be a unique key candidate. This field wouldn't change over time, so you could run a COUNT DISTINCT on it to get the correct count of unique records.
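
For example (assuming the ETL stores that combined key in a single column, say SourceTransactionKey; the names here are illustrative):

-- The durable source key never changes, so contra/differential rows
-- don't inflate the count of unique sales.
SELECT
    SaleDate,
    COUNT(DISTINCT SourceTransactionKey) AS UniqueSaleCount
FROM dbo.FactSales
GROUP BY SaleDate;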

Peterzeke: Marking one record as active defeats the purpose of effective dating, because if you queried a partial audit range the sums wouldn't necessarily equal 1. Again, a correct count is achievable with this technique only if a unique identifying column is sourced from the original feed.
Post #1038283
Posted Wednesday, December 22, 2010 7:55 AM
SSC Veteran

I don't have enough experience to comment on the techniques presented, but I wanted to congratulate you on a well-written article. Some authors expect the pictures to speak for themselves, but without operational familiarity with those screens they provide little context; your pictures, code examples, and results nicely clarify the write-up. The text itself is very easy to read, trading excessive detail (which likely wouldn't apply to my usage anyway) for a great overview of a technique and how and why it may be applied. That's a nice takeaway.

If you're still working on this series, I look forward to the next installment!
Post #1038287
Posted Thursday, December 23, 2010 11:25 AM
Forum Newbie

Re: "see this technique applied to an accumulating snapshot fact table"
Actually, the technique is likely most appropriate for a transaction fact table of the type discussed in the article. By definition an accumulating snapshot fact table has a fixed number of dated buckets into which its facts are recorded. That type of fact table is a natural fit for workflow situations where there are designated points in the workflow that we're interested in. A transaction fact table, on the other hand, is by definition atomic. The bulk of the transactions are likely never adjusted, so even one additional set of attributes to accommodate an adjustment will be sparse. Furthermore, one must look up the original fact in an accumulating fact table and update the row, which is an expensive operation compared to an insert. The point about throwing the count off is well taken, but I think the alternatives given accommodate that difficulty quite well. Nice article. Thanks.
Post #1038903
Posted Friday, December 24, 2010 3:12 PM
SSC Eights!

I've used this method for a number of years and commonly explain it to business users as a Contra-Correction, which they seem to readily grasp. In one instance a client had an archive database that took daily snapshots of about 5,000 records; after 3 years it was touching 70GB. When this method was introduced into a new DW, we cut the database size to around 10GB and lost none of the history.

This method has its good and bad points. The most obvious flaw is with rapidly changing facts on very large datasets, which can cause the data to explode: you tend to be creating two new fact records every other day rather than one 'movement' fact record that records the differential, so you need to be careful and limit the monitored fact attributes.

The upside is that you will always be able to view a record AS WAS at a given point in time, as well as AS IS. This is most useful in the financial world, where regulatory reporting is required on a monthly basis and you have to be able to prove the numbers to auditors at a given point in time; it also saves you having to create an archive silo every month.
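
As a rough sketch of what I mean (assuming the fact table carries an EffectiveDate recording when each row was posted; table and column names are illustrative):

-- AS IS: the current position is simply everything loaded to date,
-- because every correction arrives as a pair of contra/re-stating rows.
SELECT SalesPersonID, SUM(SaleAmount) AS TotalSales
FROM dbo.FactSales
GROUP BY SalesPersonID;

-- AS WAS: the position as reported at the November 2010 month end;
-- ignore anything posted after that cut-off.
SELECT SalesPersonID, SUM(SaleAmount) AS TotalSales
FROM dbo.FactSales
WHERE EffectiveDate <= '20101130'
GROUP BY SalesPersonID;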


Post #1039171
Posted Tuesday, April 17, 2012 9:46 AM
Forum Newbie

I don't see any easy way to maintain AVG measures in this model. We need two layers of aggregation: SUM to reconstruct the full amount, and only then AVG. This is not possible for some applications.
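
The nearest I can see is something like this, leaning on the durable source key idea mentioned earlier in the thread (names are illustrative):

-- Layer 1: SUM the original and differential rows back into one net
-- amount per sale; layer 2: AVG across those reconstructed amounts.
SELECT AVG(NetAmount) AS AvgSaleAmount
FROM (
    SELECT SourceTransactionKey,
           SUM(SaleAmount) AS NetAmount
    FROM dbo.FactSales
    GROUP BY SourceTransactionKey
) AS NetPerSale;

That is easy enough in SQL, but not something every reporting tool or cube aggregation can do.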
Post #1285050
Posted Sunday, April 29, 2012 9:36 PM
Forum Newbie

Great article. Well written and well demonstrated.

I racked my brain over the best way to achieve snapshot tracking until I came across this. I still have some uncertainties regarding performance, such as how far to look back and how best to model the data when tracking more than one figure (changing commission rates, product costs, etc.) so that reporting queries run optimally. But I see this as a great start.

While it may be more transactional than a traditional data warehouse, sometimes rules need to be bent to meet the business need, and this is a good example of just that.

Thanks!
Post #1292253
Posted Thursday, December 13, 2012 4:25 AM
Forum Newbie

I totally agree that this is a well-documented and well-researched article, and many of the previous comments are valid. But when dealing with transaction tables there should also be a balance between achieving the business requirement and implementing a sensible solution.

A transaction fact table by nature offers the ability to report AS IS and AS WAS from a financial perspective. It then only comes down to which transaction attributes the business wants to be Type-2 in nature. From a regulatory reporting and audit perspective, tracking changes to these attributes and the related contra corrections in the fact table makes sense, but surely it is more sensible to try to address this at source? That is, if any of the transactional attributes change in source, they should not be overwritten or patched with script updates; there should be a contra correction in source. That way the source will still reconcile with the ledger, and the data warehouse will also reconcile. Other than the possibility of data explosion, depending on the number of Type-2 attributes, I also think it is quite dangerous to allow a warehouse essentially to mask business process issues.

Can you let me know of any other legitimate reasons for implementing this solution, or any other potential pitfalls you may have seen when implementing it in the past? Thanks.
Post #1396108