Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

To Date or Not To Date….

 

OK….so I’m in the middle of a very interesting Effective Dating project…basically we’re dealing with some tables that only store current data, but there is a need for future-dated data as well.  Because the base table can’t hold that data, we have to come up with additional tables and a process to store future-dated data and then move it into the base table whenever we get to the future date.  Very cumbersome and fragile, and that’s why we’re looking at this project.

There is a great book on this subject by Richard Snodgrass which you can find at Amazon…this book is highly recommended (read:  required) for anyone looking at Effectively Dating tables.  Trust me, you won’t believe how complicated the subject is…here is the link to the book:

http://www.amazon.com/Developing-Time-Oriented-Database-Applications-Management/dp/1558604367/ref=sr_1_1?ie=UTF8&s=books&qid=1242268718&sr=1-1

There is also a book by C.J. Date on the subject, titled “Temporal Data & the Relational Model”, but I haven’t read it….if anyone has read it please feel free to post your feedback on it!

Anyways, the book makes an interesting case for some major enhancements to the Relational Database products out there, SQL Server included.  Basically, what you need when effectively dating tables is to add the valid date/time period for each row.  So, if you have an e-commerce website, for example, and you want to schedule a sales promotion for 3 days, you would indicate in your PROMOTION table that this particular sales promotion is only valid for 3 days.  The beauty of the solution is that the SQL queries in the application would query the tables by using the system date, as in “give me the promotions that are effective for today (the system date)”…so the website content can change as needed without any manual intervention. 

If the table was not effectively dated, you would have to schedule a job to run at the start date/time of the promotion and make all the data changes right then and there.  Of course, the promotion will probably start at midnight, and something will break when the scheduled job runs.  The promotion won’t show on the website, and the DBA will get called at half-past midnight in order to fix the problem to the tune of frantic curses and imprecations from the pained executive staff…but I won’t elaborate any further on the joys of our profession :-)

Now, how do you specify a valid date/time period for each row?  The obvious answer at first glance is to add 2 date columns:  StartDate and EndDate, and then add them to the Primary Key of the table.  That should work, right?

You won’t believe how off the mark that solution is.  Because what’s happening here is that the database is just storing 2 dates….it’s not really storing a date/time PERIOD (or INTERVAL).  The database doesn’t know that this period should include ALL DATES IN BETWEEN the 2 dates you’re storing.  So it’s possible to insert rows that overlap… as long as the StartDate and EndDate are different the Primary Key constraint will allow the INSERT…but you probably won’t want 2 of the same promotions running at the same time…..

And what about Foreign Keys?  As it turns out, these won’t work either….in our example, if you have a FK in the ORDER table pointing to the PromotionID, well now you can have duplicate promotionIDs in the PROMOTION table.  What if you have a promotion that you run every year for 3 weeks?  You keep the same promotion ID, because the substance of the promotion is the same, but every year you insert a new row with different effective dates.  So, if a customer bought something from PromotionID 6, and you’ve been running that promotion for 5 years, you now have 5 rows with PromotionID = 6 in the PROMOTION table.  The FK can’t handle that…it needs a unique constraint to point to.

So you need lots of constrants or triggers to get the job done….this is what the Snodgrass book deals with…..and by the way, the above is just a delicious sampling of the issues encountered…it’s not an all-inclusive list. 

In response to this, I have entered a suggestion in MS Connect in order to enhance SQL Server to fix this problem.  The solution entails defining a DATE/TIME PERIOD or INTERVAL datatype.  It turns out that the ANSI SQL standard already has this, but it’s not implemented in SQL Server.  Below I pasted the text of the Connect entry…if it sounds interesting to you, please follow this link and vote for it: 

https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=432281

 

Hefty subject…I think I should get 3 cents for this one…. :-)

SB

 

 

The text of the MS Connect entry follows:

“Problem:  SQL Server does not have an ANSI INTERVAL data type.  When we create effectively dated tables we have to simulate a time period by using 2 DATETIME columns (StartDate, EndDate).  However, often we don't want to allow 2 rows to have overlapping date ranges and many times we don't want to allow gaps in there either.  All of these validations have to be done in SQL.  It is cumbersome and probably doesn't perform as well as a data type would.

Solution:  Create an 'Interval' or 'Time Period' data type.  It should probably contain 2 values (StartDate, EndDate) stored in the same column and it can then treat all time periods between these 2 dates as a continuous length of time instead of just having 2 separate date columns in the DB. 

It should also contain a special value for 'forever' or 'until the end of time'.  This is heavily used for rows that don't currently have an end date.  NULLs are often used for this, but they create problems such as not being able to participate in a Primary Key and they also are treated differently in SQL and can cause performance problems.  So many times people just use an artificial value like '9999-12-31' for this end date, but this is not standardized, nor is its meaning obvious to the user.

It should have functions or operators to check for the gaps and overlaps. 

It can also contain options to make it more flexible, such as "should the period be inclusive of the start & end dates?" and "allow gaps/overlaps?".

The benefits of this would be that we don't have to code triggers and also the logic would be very neatly encapsulated in the datatype and would not clutter up our SQL excessively.  I imagine it would be faster as well.

Microsoft already did something like this in SQL 2008, where they created the Spatial Data Types.  Maybe some of the existing features developed for Spatial support can be reused for this data type?  Spatial data already has an 'Overlaps' operator, which is similar to what we would need for this data type.”

Comments

Posted by Carl Federl on 18 May 2009

Welcome to the world of temporal data, which as you are discovering, has a very high level of complexity.

Although Dr Snodgrass's book is out of print, he has made it available for download at his home site specifically at  

www.cs.arizona.edu/.../publications.html.  One of the downloads available is named "cdrom" and contains his book in html format, has a lot of links that eases usage, and contains all of the SQL referenced in his book.

Of interest is that some of Dr Snodgrass's Temporal SQL solutions are more complicated than alternative solution and these solutions have been discussed at SQLServerCentrl.com.

Temporal Project is discussed at www.sqlservercentral.com/.../Topic508703-338-1.aspx

Temporal Join (under a topic of "Many to Many join with date ranges") is discussed at www.sqlservercentral.com/.../Topic499787-338-1.aspx

I have read C.J. Date's book “Temporal Data & the Relational Model”, which describes a temporal approach that is different than Dr Snodgrass's approach.   An overview can be found on Hugh Darwen's page at www.dcs.warwick.ac.uk/~hugh.  Specifically, the CS253 Course Material material.

Regarding the Temporal SQL standard, Dr Snodgrass's approach was proposed to be the ANSI SQL standard. However, there were many objections, so the ISO decided not include temporal SQL as part of the proposed standard. Hugh Darwen and C.J. Date have written a review of the SQL Temporal Standard and compared to their approach and the review is available at www.dcs.warwick.ac.uk/~hughTTM/OnTSQL2.pdf.

Regarding your MS Connect suggestion, your request for an interval datatype does not match the SQL Standard definition, which is defined in "Part 2: Foundation (SQL/Foundation)" as

"There are two classes of intervals. One class, called year-month intervals, has an express or implied datetime precision that includes no fields other than YEAR and MONTH, though not both are required. The other class, called day-time intervals, has an express or implied interval precision that can include any fields other than YEAR or MONTH."

Perhaps you should revise your suggestion.  This will be difficult as there are no SQL Standards for Temporal data at this time.

Good luck on you project.

Posted by Carl Federl on 18 May 2009

Welcome to the world of temporal data, which as you are discovering, has a very high level of complexity.

Although Dr Snodgrass's book is out of print, he has made it available for download at his home site specifically at  

www.cs.arizona.edu/.../publications.html.  One of the downloads available is named "cdrom" and contains his book in html format, has a lot of links that eases usage, and contains all of the SQL referenced in his book.

Of interest is that some of Dr Snodgrass's Temporal SQL solutions are more complicated than alternative solution and these solutions have been discussed at SQLServerCentrl.com.

Temporal Project is discussed at www.sqlservercentral.com/.../Topic508703-338-1.aspx

Temporal Join (under a topic of "Many to Many join with date ranges") is discussed at www.sqlservercentral.com/.../Topic499787-338-1.aspx

I have read C.J. Date's book “Temporal Data & the Relational Model”, which describes a temporal approach that is different than Dr Snodgrass's approach.   An overview can be found on Hugh Darwen's page at www.dcs.warwick.ac.uk/~hugh.  Specifically, the CS253 Course Material material.

Regarding the Temporal SQL standard, Dr Snodgrass's approach was proposed to be the ANSI SQL standard. However, there were many objections, so the ISO decided not include temporal SQL as part of the proposed standard. Hugh Darwen and C.J. Date have written a review of the SQL Temporal Standard and compared to their approach and the review is available at www.dcs.warwick.ac.uk/~hughTTM/OnTSQL2.pdf.

Regarding your MS Connect suggestion, your request for an interval datatype does not match the SQL Standard definition, which is defined in "Part 2: Foundation (SQL/Foundation)" as

"There are two classes of intervals. One class, called year-month intervals, has an express or implied datetime precision that includes no fields other than YEAR and MONTH, though not both are required. The other class, called day-time intervals, has an express or implied interval precision that can include any fields other than YEAR or MONTH."

Perhaps you should revise your suggestion.  This will be difficult as there are no SQL Standards for Temporal data at this time.

Good luck on you project.

Posted by Steve Jones on 18 May 2009

Interesting. I think you could turn this into a nice series of posts or articles, and it would be good to see some DDL or ER diagrams to explain how this works. It's a little confusing in the description you have.

Posted by smunson on 18 May 2009

I'm not convinced a separate data type is necessary.  What I don't see here is why you can't build the "re-use" logic into the application that allows a promotion to get into the PROMOTIONS table in the first place.   Surely, one's app could query that table for promos run previously on a given product, or in general, and then allow you to use that data as the basis for a new one...

I guess I just don't see what the big deal is.   I really don't see a need here for triggers or for a series of tricky-dick routines.   Either a promotion applies or it doesn't, and while the rules for any given promotion may be complex, associating a product with a promotion is still the key concept, as any promotion that applies to ALL products can be done at least two ways - as part of a modified or customized checkout process, or as a large number of records in the promotion table.   Using any form of non key-based query where you actually compare the dates in the query doesn't make sense to me.   That should be a decision made by the application instead of by the query.

Using a query to answer a question more easily handled by the application can be an unnecessary additional load on the database.   The likelihood of having to deliver significantly more data for any given product due to multiple promotions seems minimal to me, as one would certainly need to ensure that old promotions are archived or inactivated in a manner allowing for easy exclusion in the query, and by that I mean a method NOT reliant on dates alone.   If you are going to rely on the dates alone, then I think you're over relying on automation, because you are then just heaping the entire data load on the server and leaving capacity as an afterthought, instead of that concern being part and parcel of the design to begin with.

Just my two cents...

Steve

Posted by larry.smith on 18 May 2009

My thought is, you could in your example normalize out the  period of your promotion, such that you have a Promotions table, and a PromotionPeriod table:

CREATE TABLE dbo.PromotionPeriod

(

  PromotionPeriodID int identity(1,1)

  ,PromotionID int

  ,PromotionStartDate datetime

  ,PromotionEndDate datetime

)

You would still need to do either application or trigger logic to make sure no two promotion periods overlap for the same promotion, but it solves your problem with multiple runs of the same promotion.

Posted by randy on 18 May 2009

I prefer creating Timespan's with StartDate and Duration instead Of Start and End dates.  It does provide the same results, but having an explict Duration seems logically more intuitive (to me)for representing a Timespan.

Posted by Manie Verster on 18 May 2009

Excuse me if I sound dumb, but there is a very simple solution to this problem and please tell me if I am wrong. I have a client that runs a Project Management system that we (our company) created for them. A project would have a Start and End date and a ProjectId and other columns that is of no importance here as the key. They could at any time run a report that will give them the projects that are currently active. Once the project has come to it's end it falls of the report and another report, overdue projects, is generated so they can archive the project. The dates is only dates, no time involved and no midnightly callouts. Even though they do not use time in the start and end dates it is a given that the project runs from midnight to midnight for any given start and end dates. Why make it difficult?

Posted by marco on 18 May 2009

Maybe I'm a bit oldschool, but should the database-engine REALLY cater for domain-specific logic (the meaning of dates)?

I believe the role of the database engine is to provide the common functionality for storing and retrieving data. It is the role of the business-logic to assign meaning to this data.

Although time-spans are a very basic concept, the intricate details on how to deal with overlaps, adjacency, inclusive/exclusive, infinity, etc vary from application to application (the fact that someone can write a whole book on just this single topic says enough).

So in my opinion the database should stick to the primitive types and the applications should build their custom types otherwise where do you draw the line? Should a database-engine support a native data-type "customer" which comprises a customer-id, name and address because many applications need a customer-object?)

Posted by Naked Ape on 18 May 2009

I would tend to agree with Marco, should the DB engine cater for domain-specific logic? Overlaps/gaps definitely seem more like a requirement of the application.

Maybe I could be convinced of the need for a Period data-type, but can't say in the last 25 years that I've ever needed one when designing a database - nothing that start and end dates/times won't achieve.

I wouldn also avoid using a Start-Date and Period too, since in some cases I might actually need a Period and End-Date, and maybe I don't even know the period just when something started or ended - so I'm back to using start/end dates again. I just don't see any great benefit in combining them into a single data-type.

I think the vast majority of apps could, and do, manage quite easily without it. Temporal data does get pretty complicated - but then Einsteins' theory of General Relativity makes it all moot anyway.

Posted by Sam Bendayan on 18 May 2009

Steve Jones:  I agree with you, and I think I’m going to start from the beginning and do a series on this.  I guess I wanted to know if there was interest out there for this topic…and I think that question has been answered :-)

smunson:  I’m not sure what you mean by ‘re-use’, but I think you mean that you want to reuse an old row of data?  You can do this, but then you won’t have any history of past promotions and when they applied.  And if that’s OK for your implementation, then you don’ t need Effective Dating at all.  The point I was trying to make is:  how do you structure a table when you need to have historical, current, and future rows in it?  That’s the scenario I’m dealing with.  Once you have to deal with these requirements, then the complications start to appear…

larry.smith:  Yes, you have part of the idea that I’m trying to convey…but the missing part is:  why should we need to write all this trigger or constraint logic by hand?  I believe that a new datatype is in order that handles these functions seamlessly so we don’t have to.  That’s the rationale behind the INTERVAL datatype.

randy:  What you mentioned is another option for dealing with Effective Dating…using a StartDate and Duration instead of a StartDate and an EndDate.  I went with the 2 dates because I thought it would be simpler to do a BETWEEN instead of having to calculate the EndDate all the time.  Although you could do this with a computed column to simplify things, it would take up more CPU.  I guess I prefer storing the dates because the most common queries in my scenario call for a BETWEEN and I have little use for the Duration…although I could calculate that if necessary as well.

Manie Verster:  Sure, you can run reports on the table that has the 2 dates and things appear to work fine, but how would you prevent 2 projects from overlapping?  And how would you handle Referential Integrity from other tables?  How about gaps in the dates…would you allow or disallow them?  There are a whole host of issues that appear when you try to incorporate Effective Dating, and this was the point of the article.  Many of these problems have to be solved by programming, and my argument is that with an INTERVAL datatype this could be built into the Database Engine and make things much simpler.

As to your comment about the time, it depends on the granularity of time in the requirements.  If you only need to keep track of DATES without a TIME, then you don’t need the time in there as it will only sow confusion.  But there are scenarios where you may want rows to appear or disappear in the middle of the day, and you would need the TIME component for that.

marco:  I’m not sure I agree with the Database sticking to primitive types.  After all, what is a ‘primitive’ type?  Years ago you could have said that the DATETIME datatype was very complex and should not be handled in the database either.  Yet here it is and it’s been a resounding success.  Recently, SQL 2005 and 2008 have come out with several new datatypes based on the CLR integration that store more complex information and they have created some very elegant solutions with these new datatypes.  Check out the Spatial and Hierarchy datatypes in SQL 2008, for example.  So DB vendors are constantly pushing back the limits of what is ‘too complex’ to be represented as a datatype.  The question that remains is:  will this happen for date/time intervals?  Maybe we’re not at the point yet where we know enough about it to standardize it and encapsulate it into a datatype.    But maybe we are….the INTERVAL datatype would be complex, and there would probably be many attributes that you would be able to set to ‘customize’ it for your purposes, but in the end we are dealing with date periods, which are a very common component of everyday life, so they should be encapsulated in a datatype if possible.

Thanks to all for the feedback!

Posted by brian.leuschen on 19 May 2009

What's wrong with having just one DateTime column that is the "EffectiveDate".  Each row specifies what the price for the product is starting at a particular day-time.  

The key for the promotional table should be effectivedate and productID.

In your app, when someone creates a new promotion, insert two rows.  One for the start of the promotion and one for the end of the promotion...Actually the first row stating at x date product A will start being c dollars.  The second row stating that at y date product A will start being f dollars. You make the app smart enough to use the prepromotinal price for "f"

This would give the following benefits

   There would/could only be one price(row) for any product at any given point in time.

   It would keep the history of different prices for the product

   You could future date rows and they would not "take effect" until their time came.

To pull data out you would just do something like:

select P.productID, P.productPrice

From Promotions P

WHERE P.effectiveDate =

(

   select MAX(P1.effectiveDate)

   FROM Promotions P1

   WHERE P1.effectiveDate <= getDate()

   AND P1.productID = P.productID

)

Posted by randy on 19 May 2009

I agree Sam that there are multiple approaches, and "best" is really determined buy the application.  In my case I'm building project planning and slotting software and duration is important.  To move a task I just move the start date.  I do have the overhead of a calculated field for end date, but my applications are so small and have a dedicated server; performance is not an issue.

Re: What a database should do.  It has seemed to me that the database world somewhat stalled out in the early 90's late 80’s.  Many core principles no longer apply.  I was around when every bit counted.  In the late 70's we used to use bits in bytes with bitmasks to store information, word or byte alignment was a critical; I haven't used a bit mask to store a Boolean or multi-bit codes in a very long time.  Computing power is almost never a concern except for the largest applications, as hardware is almost “free”; I just bought a new 1T drive for $80, 8G (high speed) memory for $120, quad processor -$120.  PC, not a “server”, but way more powerful than most srvers only a few years ago and  price drops are similar.  (Anyone remember when 1M of memory cost over $150, and a 10M HD was “huge”?)  We are still solving the same business automation problems we had in the 60’s, so I don’t understand how performance is still an issue for most cases.  I would also venture that one of the best things Web applications have given developers is incredibly lowered expectations.  In about 10 years I have seen performance criteria go from milliseconds to “best effort”.  That gives the database much more (relative) capacity.  My issue with the DB world is we haven’t really moved far (comparatively v application software) since the 80’s.  The fact that Objects are not in the DB vocabulary is (IMHO) a disgrace.  Though there were OODB attempts; anyone remember Opal?  Every time I create O/R mappings I curse the DB companies and DB vs.OO conventions, as O/R mappings are often time consuming and fragile.  But I digress, sorry.  

Anyway, my opinion is that a database should be able to store data at the level of the business object domain.  Or at least map to known external data types.  For example C#/.Net has a Timespan class.  I believe that SQL Server should support the Timespan and other “Native Type” objects with all operations and attributes.  At a minimum a preprocessor could be (or has been?) developed to handle objects; let it do the work.  I guess my main point is -- Do we really care how the data is stored if we hit our performance goals and ease of use is improved?  As performance continues to increase and prices fall, hopefully O/R mapping will soon go the way of bit mapping, and databases will present to the developers the capability to pass complex objects to the database front end.  SOA is now changing database design paradigms as much as warehousing brought dimensional modeling.  Native handling of complex objects should be something db developers should be screaming for (IMHO).  Anyway, too long of a post, sorry.  

Posted by Carl Federl on 20 May 2009

Welcome to the world of temporal data, which as you are discovering, has a very high level of complexity.

Although Dr Snodgrass's book is out of print, he has made it available for download at his home site at  www.cs.arizona.edu/.../publications.html.  One of the downloads available is named "cdrom" and contains his book in html format, has a lot of links that eases usage, and contains all of the SQL referenced in his book.

Of interest is that some of Dr Snodgrass's Temporal SQL solutions are more complicated than alternative solution and these solutions have been discussed on SQLServerCentrl.com.

Temporal Project is discussed at www.sqlservercentral.com/.../Topic508703-338-1.aspx

Temporal Join (under a topic of "Many to Many join with date ranges") is discussed at www.sqlservercentral.com/.../Topic499787-338-1.aspx

I have read C.J. Date's book “Temporal Data & the Relational Model”, which describes a temporal approach that is different than Dr Snodgrass's approach.   An overview can be found on Hugh Darwen's page at www.dcs.warwick.ac.uk/~hugh.  Specifically, the CS253 Course Material material.

Regarding the Temporal SQL standard, Dr Snodgrass's approach was proposed to be the ANSI SQL standard. However, there were many objections, so the ISO decided not include temporal SQL as part of the proposed standard. Hugh Darwen and C.J. Date have written a review of the SQL Temporal Standard and compared to their approach and the review is available at www.dcs.warwick.ac.uk/~hughTTM/OnTSQL2.pdf.

Regarding your MS Connect suggestion, your request for an interval datatype does not match the SQL Standard definition, which is defined in "Part 2: Foundation (SQL/Foundation)" as

"There are two classes of intervals. One class, called year-month intervals, has an express or implied datetime precision that includes no fields other than YEAR and MONTH, though not both are required. The other class, called day-time intervals, has an express or implied interval precision that can include any fields other than YEAR or MONTH."

Perhaps you should revise your suggestion.  This will be difficult as there are no SQL Standards for Temporal data at this time.

Good luck on you project.

Leave a Comment

Please register or log in to leave a comment.