Sins of SQL: The Time Table

  • Besides the many issues raised about this approach, another issue not addressed by this approach is auditing. The Time dim (the most important dimension, IMO) is dynamic. It's members are ephemeral and only created on the fly. There's nothing to audit.

    Auditing is less important when using when using a simple "textbook" example like the one chosen. OTOH, real-world Time dims can be much more robust and will take time to create, audit, and maintain.

    IMO, this is an example of a nice theoretical discussion of what you CAN do but probably should NOT do.

    Regards,

    Dan.

    One star.

  • I agree, in theory the concept is interesting but in practice, it's well... impractical in my opinion. Simple SSIS lookup operations etc. work really well with dimensions, especially small ones.

    I have read the same kind of arguments here a while ago: http://forum.kimballgroup.com/dimensional-modeling-and-data-architecture-f6/eliminate-date-dimension-surrogate-key-t334.htm

    While it was an interesting read, I think these dimensions are very valuable nonetheless. I can easily ask what activity happened between hours etc. for all mondays etc. and use indexes while doing it. I know in Oracle for example you can index on a function but still...

    Anyways, I think there are arguments for and against. But I think that providing both is a good thing. I can't really imagine somebody would put a date dimension and time dimension key in a fact without the datetime itself so having all of them kinda covers most scenarios I can imagine.

  • Yes CTEs have their role but I'm wondering if it's a replacement in an highly optimized star schema warehousing solution.

    On the plus side, the date format in the CTE function can take any number of formats -- a wonderful feature for ETL applications. I use this method in almost all ETL code.

  • For day-specific values I have for years helped clients refactor their report queries to completely eliminate the need for a time table altogether. Properly formated integers can be used to allow very fast date manipulations/filtering without the need for joining to the time dimension, leading to tremendous speed improvements.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • I am of the same thinking as others on this one. I think I would use the numbers table, or a calendar table in lieu of the cte approach. In my experiences, the calendar table outperformed the recursive CTE on a consistent basis.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Hi,

    I'm sure nobody denies this. But when dealing with inexperienced users or when you need to describe things further than by just providing a strict hour or day measure, you might need a dimension nonetheless. Note that you can have both so that your surrogate key is intelligent but can also be used to lookup descriptive values in your date/time dimensions. This way you accomodate both needs I think 🙂

  • Martin Mason (6/1/2010)


    I couldn't disagree with an article more. The author should read Chapter 2 of Kimball's The Data Warehouse Toolkit for a proper way to handle dates and times in a dimensional model. If I could give zero stars, I would.

    Not everyone is an expert at every part of SQL Server and while this persons article does not show the best approach to this, it is sufficient enough to point this out and provide alternate suggestions and or links as other have. It’s not necessary to comment that you'd like to give 0 stars if you could; that not being constructively critical of an article but just plain mean and rude.

    Thanks to comments like these this person may in the future come across some tidbit of insight to something that they would have shared but now won’t for fear of ridicule and prosecution by posters who like to make “I’d like to give 0 stars” comments. Please keep criticism to a constructive level.

    Kindest Regards,

    Just say No to Facebook!
  • I think this article shows very well that there are many ways to get to a result, each with their own advantages and inconvenients :).

    And this is no surprise 😉

  • We also use a date dimension with a human readable key and a time dimension with a human readable key. In some cases, the time dimension is not used with the granularity to the second - just to the minute or 5 minutes. Because we want to preserve precision in detail but it is unnecessary for most analysis, we tend to store the datetime of the transaction in addition to the dimension key. We do not require any lookups in order to generate our keys, but lookups into tables with a few tens of thousands of rows is not normally a problem.

    We often tend to not join to the date dimension or the time dimension in many of our normal SQL operations, often relying on the surrogate id's value, so the overhead of retrieving the further functionality the dimension tables provide is not always incurred.

    In addition, I strongly believe that working with the 32-bit int for date is often much quicker than full datetime operations, and an index on such a column is also narrower.

    I strongly suggest that this article be updated to propose and demonstrate some concrete benchmarks for ETL and reporting which show the different approaches and their benefits

  • To play devils advocate, if you were to split the date and time into two dimensions, the calculation for summing all transactions between Jan 10, 2010 3:00PM and Jan 14, 2010 11:00AM becomes a bit more complex than one would expect. Unless you've got a sophisticated DBA, creating the appropriate indexes may prove difficult if this is a common type of query.

  • Well, then I'd just use the actual datetime values instead probably 🙂

    Like I said I think that the most important part of to have something that scales relatively well and lends itself to different query patterns. Chances are this means that not one but at least 2 or 3 different designs that can be incorporated together and work for these different patterns. So long as the finest grain is still available, using surrogate keys etc. it alwas something that can be done.

  • YSLGuru (6/1/2010)


    Martin Mason (6/1/2010)


    I couldn't disagree with an article more. The author should read Chapter 2 of Kimball's The Data Warehouse Toolkit for a proper way to handle dates and times in a dimensional model. If I could give zero stars, I would.

    Not everyone is an expert at every part of SQL Server and while this persons article does not show the best approach to this, it is sufficient enough to point this out and provide alternate suggestions and or links as other have. It’s not necessary to comment that you'd like to give 0 stars if you could; that not being constructively critical of an article but just plain mean and rude.

    Thanks to comments like these this person may in the future come across some tidbit of insight to something that they would have shared but now won’t for fear of ridicule and prosecution by posters who like to make “I’d like to give 0 stars” comments. Please keep criticism to a constructive level.

    With a title like "Sins of SQL" wasn't this really the response that was intended? 😉

    I generally don't have generalities about data warehousing and when asked a question usually answer, it depends. This article, however, violates three principles I hold to be true in ALL situations. First, a well designed date dimension is critical to data warehouse project success. When combining primary data marts to create secondary data marts, whether in a multidimensional database or in the ETL, having "intelligent"/surrogate keys at different levels of the date hierarchy is crucial as the business processes being combined are frequently at different time grains. That principle is in addition to all the reasoning provided in Kimball's treatise. Second, combining date and time in a single dimension is never a good idea. Time will have a static number of rows. It will be either 24 rows if analysis is to be performed at the hour grain, or 24*3600 if at the second but static nonetheless. Creating a "composite" dimension between one that is static and another that is not (i.e. monster changing dimensions) is rarely a wise choice. And thirdly, CTE's have their place but absolutely NEVER in the user presentation layer of the data warehouse for the many reasons others have stated. If necessary, i.e parent-child hierarchies, using a Kimball helper table is a far better approach.

  • Let's see:

    1. What if you want Gregorian calendar, business calendar, and manufacturing calendar?

    2. You need to designate holidays?

    3. You need to designate fiscal quarters/years based on both a date and a week ending date?

    4. You need to designate quarters/calendar years based on both date and week ending date?

    5. You need to designate quarters/manufacturing calendars based on both date and week ending date?

    6. Your week ending date need to account for country differences?

    7. Your business week needs to account for country differences?

    I'll give you the time thing, because you can use simple functions to slice apart a column with a time data type.

    As for the "Sins of SQL" with respect to a date table? Not a chance at all. Your little recursive CTE can deal with one very specific scenario and that is straight Gregorian calendar, it can't deal with any of the items listed above. Before you say that this is an "artificial" list, I'm working with a cube right now that requires ALL of these within the date hierarchies. How about we try the "Sins of Oversimplifying Problems and Trying to Apply the Oversimplification to Everything".

    Michael Hotek

  • Not everyone is an expert at every part of SQL Server and while this persons article does not show the best approach to this, it is sufficient enough to point this out and provide alternate suggestions and or links as other have. It’s not necessary to comment that you'd like to give 0 stars if you could; that not being constructively critical of an article but just plain mean and rude.

    You're right, that not everyone can be an expert at every part of SQL. But it's probably not a good idea to write about something for publication that you don't know well. Now this article is out there and an inexperienced user may try the technique if they aren't aware of or able to check the discussion. As the date dimension is critical to a successful data warehouse deployment, the chances have just been increased that the deployment will fail. There probably should be some basic vetting system for published articles on the site.

    Maybe be comment is also unnecessary, but the strange title did open the door a bit, too. What exactly is the shortcoming of SQL that constitutes the "sin?" The problem is an exploding dimension was caused not by some flaw in SQL but by the author combining the date and time dimensions, which is not a recommended practice for precisely this reason. So while the comments perhaps unnecessary, calling the comment plain mean and rude isn't helpful either, as the responder did not likely mean it that way.

  • Martin Mason (6/1/2010)


    YSLGuru (6/1/2010)


    Martin Mason (6/1/2010)


    I couldn't disagree with an article more. The author should read Chapter 2 of Kimball's The Data Warehouse Toolkit for a proper way to handle dates and times in a dimensional model. If I could give zero stars, I would.

    Not everyone is an expert at every part of SQL Server and while this persons article does not show the best approach to this, it is sufficient enough to point this out and provide alternate suggestions and or links as other have. It’s not necessary to comment that you'd like to give 0 stars if you could; that not being constructively critical of an article but just plain mean and rude.

    Thanks to comments like these this person may in the future come across some tidbit of insight to something that they would have shared but now won’t for fear of ridicule and prosecution by posters who like to make “I’d like to give 0 stars” comments. Please keep criticism to a constructive level.

    With a title like "Sins of SQL" wasn't this really the response that was intended? 😉

    I generally don't have generalities about data warehousing and when asked a question usually answer, it depends. This article, however, violates three principles I hold to be true in ALL situations. First, a well designed date dimension is critical to data warehouse project success. When combining primary data marts to create secondary data marts, whether in a multidimensional database or in the ETL, having "intelligent"/surrogate keys at different levels of the date hierarchy is crucial as the business processes being combined are frequently at different time grains. That principle is in addition to all the reasoning provided in Kimball's treatise. Second, combining date and time in a single dimension is never a good idea. Time will have a static number of rows. It will be either 24 rows if analysis is to be performed at the hour grain, or 24*3600 if at the second but static nonetheless. Creating a "composite" dimension between one that is static and another that is not (i.e. monster changing dimensions) is rarely a wise choice. And thirdly, CTE's have their place but absolutely NEVER in the user presentation layer of the data warehouse for the many reasons others have stated. If necessary, i.e parent-child hierarchies, using a Kimball helper table is a far better approach.

    Now this last reply was constructive and well put. That's all I'm suggesting.

    Kindest Regards,

    Just say No to Facebook!

Viewing 15 posts - 16 through 30 (of 83 total)

You must be logged in to reply to this topic. Login to reply