January 17, 2012 at 9:22 am
Hi,
I’m reviewing a data warehouse and I had a question around one of our fact tables. It’s a periodic snapshot taking data for each account at the end of each day (so a grain of account per day) but the accounts will only be changing once or twice a month, meaning we are repeating the same row many times with a different DateID. Our idea is to treat this as if it is a type 2 slowly changing dimension with start and end date IDs and then use a view to join this to the date dimension.
So, the factAccount table would have the following fields:
AccountID (foreign key to dimAccount)
Other foreign keys to dimensions
All facts
StartDateID
EndDateID
And we would then generate a view over the top using some variant of
SELECT factAccount.*, dimDate.DateID
FROM factAccount
INNER JOIN dimDate
ON dimDate.DateID BETWEEN factAccount.StartDateID and factAccount.EndDateID
This gives us something that is identical to our current factAccount in therms of the data available to query but which only need about 1/10th the number of rows.
Does this seem like a valid design? Has anyone tried this approach before and how did it affect performance?
Thanks
January 17, 2012 at 10:04 am
As far as I can see design solves the issue at hand.
Having said that, let me put my nose where nobody asked me to put it... Are you positive the changing part of the account shouldn't be part of a Dimension rather than part of a Factual table?
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.January 17, 2012 at 10:18 am
Yes, I am currently using a similar type of design in a very specific application where it would not have been feasible (or even possible) to do daily snapshots.
There was an interesting presentation at last year's PASS Summit about this topic, presented by Davide Mauri and you may want to look it up.
I have also blogged about this type of design, and you can have a look at my blog post on "Temporal Fact Tables" if you like. You can find it here: http://www.did-i-say-that.com
Edit: Paul also has a very valid point. If the only thing you want to keep track of is an account's attributes, you may want to consider making these part of the account dimension and tracking it as type-2 changes.
January 18, 2012 at 2:44 pm
Hi Chris, I suggest you specify periods of time as half-open intervals (also called half-closed intervals). That means x is within the valid time period iff (start_time >= x < end_time). The half-open convention makes it easier to derive the starting and ending points without overlaps (the start point of the next interval is the same as the end point of the one immediately previous). It also removes the ambiguity about times that would otherwise fall on or near the boundary, regardless of the precision of the end_time - potentially important if your time periods are specified only as whole days. I would also use dates or datetimes rather than surrogate DateIds though.
Davide Mauri's presentation "Temporal Snapshot Fact Table" as mentioned by Martin is here: http://www.sqlpass.org/summit/2011/Speakers/CallForSpeakers/SessionDetail.aspx?sid=1115
It's worth checking out. He's due to present on the same topic at SQL Bits 10:
http://www.sqlbits.com/Sessions/Event10/Temporal_Snapshot_Fact_Table
All credit to Mr Mauri but what worries me is that something here is being presented as a "new" technique. It isn't. On my bookshelf I have the book "Temporal Data and The Relational Model" by Date, Darwen and Lorentzos, covering the same problems and solutions. It was published nearly 10 years ago and includes a bibliography of more than 100 previous publications. Rick Snodgrass's book "Developing Time Oriented Database Applications in SQL" is even older, has lots of SQL examples and another 100+ references. There is a very significant body of literature on the topic of how to represent and manage temporal data accurately and how to query and analyse it. Anyone engaged in BI or data warehouse design ought to take an interest in it.
Of course it's possible that there is something new here to those familiar only with the meagre set of design patterns and copious buzz-words endorsed by Mr Ralph Kimball. RK's marketing machine no doubt has a lot to answer for. No criticism intended of Davide Mauri who does a very decent job of explaining the topic for his audience.
January 18, 2012 at 3:23 pm
I would agree with David that the topic of temporal data (and how to manage it) is not new. It has been around for a very long time. In my opinion, what is in fact new is its application in a data warehouse fact table.
To my knowledge, very few (if any) resources advocate or explain the application of temporal principles in fact tables similar to that of a type-2 dimension. I have seen many theoretical references on temporal principles, but none seem to connect the dots or articulate a "real world example" for this.
With specific reference to Kimball, the closest any of his papers come to that concept is the "flight recorder" fact table...but I stand to be corrected. The popular answer for these types of problems seem to be snapshots...
I don't think D. Mauri (or anyone else for that matter) is trying to claim new ground here, but emphasis is being placed on it being a plausible design option for fact tables...and I think this is necessary for anybody who wants to know.
With regards to the comment about using an open-ended dates and datetime/date types, I think it depends both on the characteristics of the data and business requirements. We can engage in a very lengthy debate over why it is or isn't a good idea, and I think a lot of it also comes down to personal choice.
For now, we will agree to disagree on that one, but hopefully the OP have some options to consider in order to make an informed decision.
January 18, 2012 at 7:43 pm
I am new to this forum, but specifically interested in this topic.
From my perspective this thread is all are all over the place and a little behind the times.
The specific fact table compression technique being discussed has been around since at least 1998 (that is when Laura Reeves, co-author of one of Kimball’s books, introduced it to me).
At a more general level on the topic of temporal/bitemporal data, it is hard to think where to start !
The summary is that this topic has been debated for many years and considered for inclusion in standard ANSI/ISO SQL for the last 16 years.
But the good news is that temporal/bitemporal data is supported by the newly approved/published latest verions of ansi/iso sql called SQL:2011.
If you want a bit of a history and summary of the support for temporal features in SQL:2011 the following link may be helpful.
http://temporaldata.com/forum_uploads/Temporal_extensions_to_SQL_2011_20120104.pdf
There is a very active discussion on this topic (including status of major dbms vendor’s integration into dbms engines) in the "Temporal Data" LinkedIn group:
http://www.linkedin.com/groups?home=&gid=3885228
If you want some more general information on the topic you may want to reference this series of videos recorded at a presentation I gave to the New Jersey Microsoft SQL Server User group in November 2010.
http://www.youtube.com/watch?v=PuocT5wUgJ4
A more recent summary of where vendors are with adoption of the standard and similar features is provided in this “Bitemporal Data Quarterly Update”.
http://temporaldata.com/bitemporal-data-quarterly-update-4th-quarter-2011/
I do think this is a very important topic.
I have seen some references around to an article that Adam Machanic wrote on this topic. IMHO this article is very limited and uses terminology incorrectly.
I hope this helps and I am very up for being corrected / getting feedback on points above,
craig
January 19, 2012 at 8:07 am
Interesting comments, and thanks for the references Craig.
I will browse through your site in more detail a little later. I understand the notion that this concept isn't new by any means, and I am certainly not trying to lay claim to any of it.
From my perspective though, through my involvement with Microsoft BI solutions which certainly has not been as long as some of you folks, there seems to be a lack of awareness and definitely a lack of workable "real world" solutions or examples.
Again, this is from my perspective only...I am very interested in dimensional modeling and all aspects related to it, and looking forward to continue some of this conversation around temporal/bi-temporal data on another platform maybe.
January 19, 2012 at 10:50 am
Thanks all, very useful replies.
I hadn't thought about taking the fields into a dimension table, but I don't think they would be appropriate. The changing values are things like current balance or arrears balance and seem naturally to be facts to me. Also the table has quite a few dimensions that feed into it to enable us to slice the balances in different ways.
Someone has suggested that we use a bridge table between the fact and the date dimension, so each fact row would have a DateGroupID field and this would join through the bridge to various rows in the date dimension, and we have also been looking at the possibilities of using page compression instead of the temporal snapshot fact, although as the identical rows will not necessarity be near each other in the page structure I'm not sure how much benefit we will get.
January 19, 2012 at 11:00 am
chris.johnson 46127 (1/19/2012)
The changing values are things like current balance or arrears balance and seem naturally to be facts to me.
I remember seeing in a banking oriented DWH a fact_daily_balance fact table with one row per account per day - this fact table can easily be joined to whatever dimensions point to fact_account; dim_date table will help to find the desired row.
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.January 19, 2012 at 12:36 pm
Martin,
I think the important thing is that the fact table compression technique and SQL support for temporal data are pretty different topics (related but different).
The “dated” part of the thread I was referring to was actually around the SQL support for temporal data as there is pretty wide spread use of temporal extension built into RDBMS engines at this point and now syntax to support bitemporal data in included in ansi/iso SQL.
More/better documentation on this “Temporal Snapshot Fact Table" or compression technique as I call it is a good thing as I do not think Kimball has written about it so it is in the unofficial dimensional modeling technique list. I have not seen Snodgrass or Date/Darwe/Lorentzos talk about it either (but I could have missed it). But they do talk about a lot of other stuff which is really important especially the stuff which has just been added to ansi/iso SQL and integrated into database engines by major DBMS vendors.
Craig
January 19, 2012 at 12:42 pm
TemporalCraig (1/19/2012)
Martin,I think the important thing is that the fact table compression technique and SQL support for temporal data are pretty different topics (related but different).
The “dated” part of the thread I was referring to was actually around the SQL support for temporal data as there is pretty wide spread use of temporal extension built into RDBMS engines at this point and now syntax to support bitemporal data in included in ansi/iso SQL.
More/better documentation on this “Temporal Snapshot Fact Table" or compression technique as I call it is a good thing as I do not think Kimball has written about it so it is in the unofficial dimensional modeling technique list. I have not seen Snodgrass or Date/Darwe/Lorentzos talk about it either (but I could have missed it). But they do talk about a lot of other stuff which is really important especially the stuff which has just been added to ansi/iso SQL and integrated into database engines by major DBMS vendors.
Craig
True, and I fully agree.
January 20, 2012 at 2:37 am
Paul, that is essentially what we have at the moment, the balance table joins to a few type 2 and type 1 dimensions and forms the heart of our data warehouse. The data we are managing are loans and investments and so the balances will usually update once or twice a month in the case of loans or very rarely in the case of investments. As we have approx 20 balances being kept on the fact table we are looking to save space by turning it into a temporal snapshot fact table but the measures need to remain as facts as that is how they are used throughout our queries.
Viewing 12 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply