Temporal SQL Server Databases

  • I am one of the authors of "Managing Time in Relational Databases", published by Morgan Kaufmann and reviewed by Joe Celko. While writing the book, we also created a temporal framework for SQL Server that maintains bi-temporality in SQL Server databases. It supports Temporal Entity Integrity (TEI), Temporal Uniqueness (TU) and Temporal Referential Integrity (TRI). It is closely aligned with the temporal SQL standards for temporal updates, deletes and queries, hiding most of the complexity from the typical programmer.

    See http://www.assertedversioning.com for an explanation of Temporal Databases, examples and a SQL Server demo of this framework.

    We have just completed a utility that helps convert a typical, existing SQL Server database into a temporalized database, including all of the framework code that transparently maintains temporality (TEI, TU, TRI). After the conversion, most of the existing application code works as-is, but the data can now be queried as it appeared at any point in time and at any effective time with a few simple parameters. It is really, really cool, in my humble opinion.
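
    To give a flavor of it, a bi-temporal "as-of" query reduces to supplying two points in time. Here is a rough sketch (table and column names are illustrative, not the framework's actual API; half-open periods assumed):

    -- Pick one effective-time point and one assertion-time point;
    -- together they select exactly one version per object.
    DECLARE @EffectiveDate datetime = '2014-06-30';  -- as the business saw it
    DECLARE @AssertedDate  datetime = '2014-09-30';  -- as the database asserted it

    SELECT p.Policy_OID, p.Policy_Nbr, p.Premium
    FROM   dbo.Policy_BT p    -- hypothetical bi-temporal table
    WHERE  p.eff_begin_dt <= @EffectiveDate AND @EffectiveDate < p.eff_end_dt
    AND    p.asr_begin_dt <= @AssertedDate  AND @AssertedDate  < p.asr_end_dt;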

    We also have two patents in this area covering 42 patent claims, including techniques to improve the performance of temporal systems and future assertions. SQL Server seems to be way behind Teradata, Oracle and DB2 in this area, and this framework and utility could quickly bring this functionality to SQL Server installations. Temporal DB extensions make it a lot easier to model data, because they let you model data and relationships at one point in time rather than trying to model across multiple dimensions of time (e.g. Data Vault, Anchor, etc.).

    I am trying to determine whether this would be a product that DBAs, modelers and maybe business users might be interested in, and whether we should package it as a tool or offer it as part of a consulting engagement.

    I'd appreciate any feedback, opinions and ideas on this topic.

  • Sounds useful to me. I worked for an insurance company where we had to maintain our data in just such a manner with differing versions of data with differing effective dates. We managed to build a system that worked very well, but it took a lot of time and effort. Picking one up off the shelf would have been handy.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Randy AV (10/18/2014)


    I am one of the authors of "Managing Time in Relational Databases", published by Morgan Kaufmann and reviewed by Joe Celko. While writing the book, we also created a temporal framework for SQL Server that maintains bi-temporality in SQL Server databases. It supports Temporal Entity Integrity (TEI), Temporal Uniqueness (TU) and Temporal Referential Integrity (TRI). It is closely aligned with the temporal SQL standards for temporal updates, deletes and queries, hiding most of the complexity from the typical programmer.

    See http://www.assertedversioning.com for an explanation of Temporal Databases, examples and a SQL Server demo of this framework.

    We have just completed a utility that helps convert a typical, existing SQL Server database into a temporalized database, including all of the framework code that transparently maintains temporality (TEI, TU, TRI). After the conversion, most of the existing application code works as-is, but the data can now be queried as it appeared at any point in time and at any effective time with a few simple parameters. It is really, really cool, in my humble opinion.

    We also have two patents in this area covering 42 patent claims, including techniques to improve the performance of temporal systems and future assertions. SQL Server seems to be way behind Teradata, Oracle and DB2 in this area, and this framework and utility could quickly bring this functionality to SQL Server installations. Temporal DB extensions make it a lot easier to model data, because they let you model data and relationships at one point in time rather than trying to model across multiple dimensions of time (e.g. Data Vault, Anchor, etc.).

    I am trying to determine whether this would be a product that DBAs, modelers and maybe business users might be interested in, and whether we should package it as a tool or offer it as part of a consulting engagement.

    I'd appreciate any feedback, opinions and ideas on this topic.

    It would be interesting to have a look at the patents to see if they contain any prior art.

    --Jeff Moden



  • I've added effective dates to one of the fact tables in my data warehouse to allow reporting using different points in time. The application that generates the data for this only deals with history in a limited way, but I could see how an off the shelf solution to handle temporal stuff cleanly would easily be worth many developer-hours.

  • Ian Massi (10/20/2014)


    I've added effective dates to one of the fact tables in my data warehouse to allow reporting using different points in time. The application that generates the data for this only deals with history in a limited way, but I could see how an off the shelf solution to handle temporal stuff cleanly would easily be worth many developer-hours.

    I guess I don't understand the problems people are having with temporal data. Type 2 SCDs are easy to set up, implement, and use. As with certain COTS automated testing systems, I wonder if the cure might be worse than the original problem. Even hierarchical data is fairly easy to query for point-in-time results using Type 2 SCDs.
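
    For example, a point-in-time lookup against a Type 2 dimension is just a date-range predicate (illustrative names, not from any particular system):

    -- Find the dimension row that was in effect on a given date.
    DECLARE @AsOf date = '2014-01-15';

    SELECT d.EmployeeKey, d.EmployeeBK, d.Team
    FROM   dbo.DimEmployee d    -- hypothetical Type 2 dimension
    WHERE  d.EmployeeBK = 'E1234'
    AND    d.RowStartDate <= @AsOf
    AND    (d.RowEndDate > @AsOf OR d.RowEndDate IS NULL);  -- NULL = current row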

    --Jeff Moden



  • Jeff Moden (10/20/2014)

    I guess I don't understand the problems people are having with temporal data. Type 2 SCDs are easy to set up, implement, and use. As with certain COTS automated testing systems, I wonder if the cure might be worse than the original problem. Even hierarchical data is fairly easy to query for point-in-time results using Type 2 SCDs.

    Kimball's "Type 2" pattern tracks changes to attributes of a dimension - apparently not what Ian was trying to achieve. Type 2 is essentially a uni-temporal model only. Because the dimension is usually assumed to be denormalized, much of the temporal data that might have been preserved from the source is normally lost.

    The Type 2 model (at least as described by RK and as often implemented) is also unsatisfactory when it comes to loading historical changes. Implementations and ETL tools that support Type 2 often just assume that newly arriving dimension data is always dated at or after the latest (i.e. "current") row in the dimension.

    Interestingly, another limitation of Kimball's Type 2 is also shared by Johnston and Weis's Asserted Versioning approach. Both Type 2 and AV assume only one business key per temporal table (or dimension). A truly general-purpose temporal model ought to preserve multiple keys where necessary. Maintaining more than one business key per table is only slightly trickier than dealing with a single key, and of course it generalises: solve the problem for N keys and you've solved it for just the one.
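
    To sketch what I mean (hypothetical table and columns, half-open periods): temporal uniqueness has to be enforced per candidate key, e.g. by repeating an overlap test like this one for each business key (SSN, EmployeeNo, ...):

    -- Rows whose effective periods overlap for the same SSN violate
    -- temporal uniqueness; run the same test per candidate key.
    SELECT a.SSN
    FROM   dbo.Employee_Versions a
    JOIN   dbo.Employee_Versions b
      ON   a.SSN = b.SSN
     AND   a.RowId <> b.RowId              -- hypothetical surrogate row id
     AND   a.eff_begin_dt < b.eff_end_dt   -- the two periods overlap
     AND   b.eff_begin_dt < a.eff_end_dt
    GROUP BY a.SSN;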

    I hope Randy will correct me if I'm wrong about Asserted Versioning. I enjoyed his book very much and I recommend it, but it has been a while since I read it.

  • sqlvogel (10/21/2014)


    Jeff Moden (10/20/2014)

    I guess I don't understand the problems people are having with temporal data. Type 2 SCDs are easy to set up, implement, and use. As with certain COTS automated testing systems, I wonder if the cure might be worse than the original problem. Even hierarchical data is fairly easy to query for point-in-time results using Type 2 SCDs.

    Kimball's "Type 2" pattern tracks changes to attributes of a dimension - apparently not what Ian was trying to achieve. Type 2 is essentially a uni-temporal model only. Because the dimension is usually assumed to be denormalized, much of the temporal data that might have been preserved from the source is normally lost.

    The Type 2 model (at least as described by RK and as often implemented) is also unsatisfactory when it comes to loading historical changes. Implementations and ETL tools that support Type 2 often just assume that newly arriving dimension data is always dated at or after the latest (i.e. "current") row in the dimension.

    Interestingly, another limitation of Kimball's Type 2 is also shared by Johnston and Weis's Asserted Versioning approach. Both Type 2 and AV assume only one business key per temporal table (or dimension). A truly general-purpose temporal model ought to preserve multiple keys where necessary. Maintaining more than one business key per table is only slightly trickier than dealing with a single key, and of course it generalises: solve the problem for N keys and you've solved it for just the one.

    I hope Randy will correct me if I'm wrong about Asserted Versioning. I enjoyed his book very much and I recommend it, but it has been a while since I read it.

    Thanks for the info. It does, however, support my quandary. I'm certainly not an expert in temporal science, but I don't understand why so many have made it so difficult. It may be that I'm doing something wrong, but I just haven't had any of the problems you've cited above with temporal data, especially with Type 2 SCDs.

    I will admit that it does take a bit of planning on my part, and a "general purpose" solution would eliminate that need, but I've also found that just about anything that is truly "general purpose" frequently suffers in performance and/or resource usage for the sake of flexibility. A great example of that is SQL Server itself.

    --Jeff Moden



  • Yeah, in my case at first I tried to tackle the fact table like I would a Type 2 SCD. Part of my problem was that the data was spread among 6 different tables, which I've been told the original solution "architect" (saboteur?) suggested in order to improve performance. There are conflicts between them and a lot of updates that don't align. The system schedules work to employees and there's more than one record per employee per day. It should be pretty simple, but it certainly isn't. This architect was also very liberal in his use of dynamic SQL and cursors :pinch:. Thankfully the dev team is doing a rewrite at the moment. The tables will stay pretty much the same, but no current stored procedure will survive the rewrite. All cursors and dynamic SQL will be eliminated... I hope.

    Anyway, one of the things that was troublesome was when we had changes to the dimension tables without a change to an employee's schedule. Once entered, a record is "current" (like a Type 2 SCD), and if it never changes it stays "current" until the end of time. But if a dimension changes (say the employee is now on a new team) without any other change to his schedule, nothing forces an update to the fact table. Since the schedule happens in the future, this was problematic. I also had a fair bit of trouble trying to link together previous records, but that could be related to the data.

    Due to the data quality issues, I didn't bother capturing all historical changes and just recorded end-of-day changes, using the appropriate source table. Also, on the day after an employee's scheduled day, I update all the records for that day in the fact table, even if there were no changes. Then at least the keys to the dimensions are current as of that day, which is all that really matters. I could always add forced updates when certain dimension tables are updated, but for our purposes I seem to be alright.
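
    The day-after refresh amounts to something like this (names are from my head, not the real system):

    -- Re-resolve the dimension keys for yesterday's schedule rows so they
    -- reflect the dimensions as of that day, even if nothing else changed.
    UPDATE f
    SET    f.TeamKey = d.TeamKey
    FROM   dbo.FactSchedule f
    JOIN   dbo.DimEmployee d
      ON   d.EmployeeBK = f.EmployeeBK
     AND   d.RowStartDate <= f.ScheduleDate
     AND   (d.RowEndDate > f.ScheduleDate OR d.RowEndDate IS NULL)
    WHERE  f.ScheduleDate = CAST(DATEADD(day, -1, GETDATE()) AS date);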

    I can imagine some people having more complicated temporal scenarios than this one. At least someone out there is thinking 4th dimensionally.

  • Thanks for the feedback. Most of my background is in the insurance business. Proper temporal management and auditability are key in that industry (e.g. policy/eligibility dates, claim service dates, renewal dates, provider contract dates, etc.). Having a canonical model, framework and approach is important to keeping the somewhat complex logic easy to implement and maintain.

  • We did a very deep dive into the prior art, as did the USPTO examiner. A lot of the prior art was ours, based on the dozens of articles we wrote before the one-year cut-off, so we had to exclude a lot of what we would have liked to patent. The examiner referenced a few other items, which we discussed and sometimes needed to adjust for. One of the leaders in this space, Dr. Richard Snodgrass, had a lot of the basic concepts documented in the early '90s, but there wasn't a lot more. We added a lot with regard to real-time TEI, TRI, performance and future-assertion (sandbox) processes. I did my first bi-temporal implementation in 1995; I wasn't even aware of Rick's work until the mid-2000s. Rick was one of the reviewers of our book. He is a very bright guy, and I wish we could have patented his ideas. 🙂

  • I appreciate the comment. Yes, that's where the big savings are: in developer time. Most shops that try to implement any kind of temporality do it (differently) table by table. Not only do they spend a lot of time doing it, but they often get it wrong, or a new programmer comes in and does it differently. So there are often different approaches spread across an organization.

    A lot of the data quality issues we see are often traced back to incorrect temporal management.

    By doing this at the DBMS insert/update/delete level, even ad-hoc SQL data updates maintain the temporality in the tables the same way, 100% of the time. It can be used for OLTP and for SCDs in dimensional models; not only for history, but also for future effectivity and future assertions.
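
    The usual mechanism for this kind of transparency in SQL Server is an updatable view with INSTEAD OF triggers. A rough sketch (illustrative names, and simplified relative to what a real framework has to do):

    -- Applications update a "conventional" view; the trigger rewrites that
    -- as bi-temporal maintenance against the underlying table.
    CREATE TRIGGER trg_Policy_Update ON dbo.Policy_Current  -- hypothetical view
    INSTEAD OF UPDATE
    AS
    BEGIN
        DECLARE @Now datetime2 = SYSUTCDATETIME();

        -- Withdraw the currently asserted versions of the updated objects
        -- (simplified: a real framework withdraws only the affected versions).
        UPDATE p
        SET    p.asr_end_dt = @Now
        FROM   dbo.Policy_BT p
        JOIN   inserted i ON i.Policy_OID = p.Policy_OID
        WHERE  p.asr_end_dt = '9999-12-31';

        -- Assert the new versions from @Now forward.
        INSERT dbo.Policy_BT
               (Policy_OID, Premium, eff_begin_dt, eff_end_dt, asr_begin_dt, asr_end_dt)
        SELECT i.Policy_OID, i.Premium, i.eff_begin_dt, i.eff_end_dt, @Now, '9999-12-31'
        FROM   inserted i;
    END;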

  • In most OLTP and DW systems, maintaining bi-temporality table by table is time-consuming and error-prone. It can easily be done for a few tables, but many of the organizations I have worked in have hundreds or even thousands of tables. I have a deck that I prepared for a client showing the difference in modeling, data-population logic, update SQL (insert/update/delete) and querying for various models, such as Anchor and Data Vault, compared to the effort with built-in temporal support, and the difference is substantial. So maybe in a small shop, or in an application with a small number of tables, the effort may not be as significant, but in databases with lots of tables it becomes significant; not only in my opinion, but in the opinion of many others.

    However, I appreciate your response because there may be some home-grown solutions that are easier to implement and maintain, and it's good to be aware of those.

  • David, thanks for your help in clarifying this and for your endorsement of our book. Tom wrote a follow-up book titled "Bitemporal Data: Theory and Practice" where he gets into more detail about SCDs and other items that might help in understanding this.

    So, in our book and in the SQL Server framework, we propose using "Object IDs", which might be considered something like an Anchor_Id in an Anchor model: an Object ID identifies one entity (or relationship), irrespective of time. So a person would have only one Object ID even though it might have many versions.

    Now, a person might have a business key such as SSN, and we typically enforce the uniqueness of that BK at any one instant in time. However, we allow someone to assign a new BK to an object at a new point in time. So, say my credit card account's OBID is 123 and my credit card number was my BK, but the issuer changes my card because of a breach. I can assign the new CC# BK to OBID 123 as of a certain date. We don't allow two BKs at the same time as part of the framework, and we treat the BK like a non-temporal primary key for backward compatibility. However, if having two active at one time were a requirement, a child table could be created with a list of CC#s assigned to a parent OBID. Thanks again!
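
    In data, the credit card example looks something like this (illustrative columns, half-open effective periods):

    -- One Object ID (123), two business keys over time,
    -- never both effective at the same instant.
    INSERT dbo.Account_BT (Account_OID, Card_Nbr_BK, eff_begin_dt, eff_end_dt)
    VALUES (123, '4111-0001', '2012-01-01', '2014-03-01'),   -- original card number
           (123, '4111-0002', '2014-03-01', '9999-12-31');   -- reissued after the breach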

  • Randy AV (10/24/2014)


    We did a very deep dive into the prior art, as did the USPTO examiner. A lot of the prior art was ours, based on the dozens of articles we wrote before the one-year cut-off, so we had to exclude a lot of what we would have liked to patent. The examiner referenced a few other items, which we discussed and sometimes needed to adjust for. One of the leaders in this space, Dr. Richard Snodgrass, had a lot of the basic concepts documented in the early '90s, but there wasn't a lot more. We added a lot with regard to real-time TEI, TRI, performance and future-assertion (sandbox) processes. I did my first bi-temporal implementation in 1995; I wasn't even aware of Rick's work until the mid-2000s. Rick was one of the reviewers of our book. He is a very bright guy, and I wish we could have patented his ideas. 🙂

    What are the patent numbers?

    --Jeff Moden



  • Well, here's the thing: given that Dr. Snodgrass did publish his stuff around 1995, I'm not sure what you could find to patent. I was building an application back then and discovering all this temporal hoo-hah from scratch, and I remember coming home, finding that nice dark blue SIGMOD issue in my mailbox, and dragging it to work to show people. I'd already worked out the relevant algebras and was painfully adding the required date fields and SQL code. Ugh.

    From that moment on it was horribly obvious to me that the only way this should ever be done is integrated into the engine, so you just add a property to the table, like "temporal='on'", and the rest should be automagic, except for the new temporal keyword syntax added to SQL. My stuff largely agreed with Snodgrass's; I recall I had some differences, at least in terminology. And there are always questions about joining temporal tables to non-temporal tables; there are different "right" answers depending on assumptions.

    I'm sadly unfamiliar with what other vendors have implemented (or when), but I think it's pretty much obvious what is needed, and it didn't take me all that long to work out from scratch - in theory. In practice it was just too ugly to do outside of the engine, and it would be no walk in the park even inside the engine. So we're stuck with SCDs and the like.

    In fact, there is a version of this now approved as part of ANSI SQL, right? (A rough sketch of that syntax is at the end of this post.)

    http://en.wikipedia.org/wiki/Temporal_database#History

    I disagree with this violently; I want hidden fields (and of course some way to inspect them!).
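
    For reference, the SQL:2011 system-time syntax looks roughly like this (as DB2 10 implements it; details vary by vendor, and note that the period columns are ordinary, visible columns):

    -- SQL:2011-style system-versioned table (DB2 10 flavor).
    CREATE TABLE policy (
        id        INT NOT NULL,
        premium   DECIMAL(9,2),
        sys_start TIMESTAMP(12) NOT NULL GENERATED ALWAYS AS ROW BEGIN,
        sys_end   TIMESTAMP(12) NOT NULL GENERATED ALWAYS AS ROW END,
        trans_id  TIMESTAMP(12) GENERATED ALWAYS AS TRANSACTION START ID,
        PERIOD SYSTEM_TIME (sys_start, sys_end)
    );
    CREATE TABLE policy_hist LIKE policy;
    ALTER TABLE policy ADD VERSIONING USE HISTORY TABLE policy_hist;

    -- Query the table as of a past point in system time:
    SELECT * FROM policy FOR SYSTEM_TIME AS OF TIMESTAMP '2014-01-01 00:00:00';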
