Sins of SQL: The Time Table

  • Okay Steve, I find it admirable that you will stand by your code. Unfortunately, I took that as a challenge and wrote my own inline TVF that generates the same results sets as yours with the same input values and then ran some tests. Now, I didn't write it up in a really formal method, but i have published a short synopsis of the results in my blog so that others my see the differences between the two approaches.

    Please, take a little time and read this post, Sins of a Recursive CTE?. I think you will notice that the recursive CTE really isn't the best option. Of course, you are free to copy my code and do any testing that you desire, just be sure to let us know your results.

  • That is quite a strange response. Your article spent time basically explaining how your Gregorian calendar approach using a recursive CTE is the only "good" solution. (Afterall, you did title it the "Sins of SQL". Which, by the way, has absolutely nothing whatsoever to do with SQL.)

    I never said in the article it was the only good solution. The 'Sin of SQL' being referred to is creating an arbitirary resolution dimension analogue for a well-defined contiguous range of data (date/time) with a new surrogate key.

    Your solution to each of the holes in the approach is to simply start bolting on additional stuff.

    Using a base technique for the core of the problem I attempted to solve, and then extending it to solve additional requirements?

    So, let me see... Use a recursive CTE that has to generate the set of dates each time and also can't be used to enforce any kind of integrity.

    When would you have to delete a time record? Again, if you needed to exclude specific ranges of time, an exclusions table that operates on the span level would work.

    Then when you need holidays, bolt on a look up table. Wait, need business defined week ending dates, quarter ending dates, and year ending dates - bolt on another lookup table or tables. Need to vary the start/end of a business week by country or even change the definition after several years while still preserving the previous definition - bolt on another lookup table. By the time you're done bolting on all of these special purpose lookup tables just to handle all of the permutations in a single cube that I'm dealing with, you have nothing but a mess that blatantly demonstrates your assertion about not needing a date table is completely full of holes that you never considered.

    Let's just disect an example: you have to exclude an arbitrary non-expected holiday from your figures - a period of 24 hours. In the approach I've proposed, you'd create a table of exclusions with a start/end time specified, and then join against it where the date of the row lay between. The advantage is that you do not need to redefine your holidays/other linkage tables based on whether or not you're dumping the data at the day, hour, minute or seconds or even resolution. In this approach we're treating time like a contiguous field over the fact tables, and then the dimension for dates/times becomes an arbitrary ruler over it.

    In the alternative approach of a time table, perhaps you annotate each row with a 'IsHoliday' or 'ExcludeAccounting' flag. Trouble is that if you change the resolution up/down on your date/time data, you the need to re-key your source data to the new definitions, and you have to build your time table and figure out a way of preserving the manual edits. Even when the values are fixed, you likely need to resolve the TimeID or some such on the way in.

  • Martin Mason (6/2/2010)

    Whoa. Timeout. If you're talking about SSAS, your argument only considers partition processing, NOT dimension or aggregation processing SPACE, TIME, or MEMORY CONSUMPTION. With separate dimensions, the time dimension at second level granularity is going to have process 3600 rows, 3600 members at the key level. Your date dimension for 10 years of data at the day grain would be about the same. On the other hand, your composite design with 10 years at second level granularity would be 13,140,000 records at the key level. Which is the more performant solution, one that requires processing of 3600 * 2 records or the one that requires processing of 3600 squared?

    Let's take an example with an arbitrary resolution. If you have years worth of data (365 days) and a time metric at the second level (86,400 rows in your time table). For a given year in SSAS the cell-space/aggregations would need to potentialy deal with something in the area of 31,536,000 distinct permutations - it's not just "Two dimesnions, so times by two" - each combination of time/date could house different values and thus is distinct/seperate cells. This is inescapable, whether or not you use two seperate dimensions, or one hierarchical one. If this was not the case, it'd mean you could only get the aggregations on time OR date, but never an intersection of both, which is clearly not the case.

    I suppose if you're hell bent on using two seperate dimensions, could still easily adapt the technique demonstrated (or use similar), so that your fact tables are still only storing the real datetime value, but then when creating your Data-Source View in Business Intelligence studio, extract the date/time values and wire them up to two seperate dimensions generated on the fly seperately.

  • Not having "29 years of experience" but being keenly interested in learning from those with good ideas &/or years of experience/expertise, I am disappointed when I see comments such as Dan's closing sentences as such mean-spirited words make it harder for me to consider the rest of his post (which seems to raise valid points, otherwise). In this field, as in most that live in shades of gray, a public square full of differing opinions is a sign of a healthy environment....one better maintained by avoiding mean-spirited, rude and ad hominem statements.

    I think we've all seen Sins of SQL from people with "30 years of SQL Server 2005 experience". I try not to worry too much about the lineage someone asserts, much more interested in the examples they bring to the table 😉

    Thank you to the author and those (respectfully) offering differing opinions/reasoning. The back and forth is MUCH more valuable than any article by itself (controversial OR sacrosanct).

    Indeed. I enjoy the debate aspect, since it yields a lot more opportunity for learning than simply accepting conventional wisdom at face value.

  • DeronDilger (6/2/2010)


    dan_public (6/2/2010)


    Steven,

    .....[G]oing forward, I will be on the lookout for your articles. And be sure not to read them.

    Have a nice day.

    Dan.

    Not having "29 years of experience" but being keenly interested in learning from those with good ideas &/or years of experience/expertise, I am disappointed when I see comments such as Dan's closing sentences as such mean-spirited words make it harder for me to consider the rest of his post (which seems to raise valid points, otherwise). In this field, as in most that live in shades of gray, a public square full of differing opinions is a sign of a healthy environment....one better maintained by avoiding mean-spirited, rude and ad hominem statements.

    Thank you to the author and those (respectfully) offering differing opinions/reasoning. The back and forth is MUCH more valuable than any article by itself (controversial OR sacrosanct).

    DeronDilger (6/2/2010)


    dan_public (6/2/2010)


    Steven,

    .....[G]oing forward, I will be on the lookout for your articles. And be sure not to read them.

    Have a nice day.

    Dan.

    Not having "29 years of experience" but being keenly interested in learning from those with good ideas &/or years of experience/expertise, I am disappointed when I see comments such as Dan's closing sentences as such mean-spirited words make it harder for me to consider the rest of his post (which seems to raise valid points, otherwise). In this field, as in most that live in shades of gray, a public square full of differing opinions is a sign of a healthy environment....one better maintained by avoiding mean-spirited, rude and ad hominem statements.

    Thank you to the author and those (respectfully) offering differing opinions/reasoning. The back and forth is MUCH more valuable than any article by itself (controversial OR sacrosanct).

    Deron,

    My point in mentioning my experience with analytical systems is that it doesn't make me the keeper of all wisdom. Quite the opposite, in fact. My experience has made me more open to differing opinions and methods, and that there can be multiple "best" solutions to a problem.

    The downside of this experience is that I've run into a quite a few people over the years who insist that THEY are the keepers of the wisdom. And that their approach is "best", regardless of differing needs and environmental constraints. That raises my hackles.

    I enjoy a vigorous exchange of opinions as much as the next person, but I'm not interested in a religious argument disguised as a debate. The purpose of the first is personal growth; the purpose of the second is to convert the "sinners". I'm interested in growing, but have no interested in being "converted".

    Regards,

    Dan.

  • Steven James Gray (6/3/2010)


    Martin Mason (6/2/2010)

    Whoa. Timeout. If you're talking about SSAS, your argument only considers partition processing, NOT dimension or aggregation processing SPACE, TIME, or MEMORY CONSUMPTION. With separate dimensions, the time dimension at second level granularity is going to have process 3600 rows, 3600 members at the key level. Your date dimension for 10 years of data at the day grain would be about the same. On the other hand, your composite design with 10 years at second level granularity would be 13,140,000 records at the key level. Which is the more performant solution, one that requires processing of 3600 * 2 records or the one that requires processing of 3600 squared?

    Let's take an example with an arbitrary resolution. If you have years worth of data (365 days) and a time metric at the second level (86,400 rows in your time table). For a given year in SSAS the cell-space/aggregations would need to potentialy deal with something in the area of 31,536,000 distinct permutations - it's not just "Two dimesnions, so times by two" - each combination of time/date could house different values and thus is distinct/seperate cells. This is inescapable, whether or not you use two seperate dimensions, or one hierarchical one. If this was not the case, it'd mean you could only get the aggregations on time OR date, but never an intersection of both, which is clearly not the case.

    I suppose if you're hell bent on using two seperate dimensions, could still easily adapt the technique demonstrated (or use similar), so that your fact tables are still only storing the real datetime value, but then when creating your Data-Source View in Business Intelligence studio, extract the date/time values and wire them up to two seperate dimensions generated on the fly seperately.

    SSAS WILL store each member of a dimension and its related attribute relationships whether referenced by fact table data or not. That storage WILL impact performance, both query and processing. With two separate dimensions, 12:01 AM is stored once and only once no matter how many years defined in your date dimension. With your approach, it's created for as many days as defined in your dimension. Which do you think will be better? I'm "hell bent" on using two separate dimensions because it's a better approach. Read Chapter 2 of the The Data Warehouse Toolkit for an explanation.

  • Steven James Gray (6/3/2010)


    That is quite a strange response. Your article spent time basically explaining how your Gregorian calendar approach using a recursive CTE is the only "good" solution. (Afterall, you did title it the "Sins of SQL". Which, by the way, has absolutely nothing whatsoever to do with SQL.)

    I never said in the article it was the only good solution. The 'Sin of SQL' being referred to is creating an arbitirary resolution dimension analogue for a well-defined contiguous range of data (date/time) with a new surrogate key.

    What you fail to realize is that you are dealing with two very different things and trying to generalize them. A day is NOT a "time interval". A day is NOT contiguous. But, you're obviously so wrapped up in your shiny little solution about TIME to understand that. One of my systems deals with DATES, only DATES, and nothing but DATES. It could care less about hours, minutes, and seconds. On this day - 6/3/2010, there are exactly 14 different definitions for 6/3/2010, all at the same time and all used by the business. 6/3/2009 has 11 different definitions and 6/3/2011 currently has 17 different definitions. If that weren't enough, in another cube, this precise hour in time (7AM) on this day happens to have 4 different business definitions while this same hour 1 year ago only had 3 business definitions. Could you deal with all of this in code? Sure. But, your code would be so ridiculously convoluted with more exception than actual rule and you'd spend just about all of your time testing and modifying the code to deal with each date that you'd never be done.

    I get it. You've built a system or two and now decided that you are going to write stuff. Congratulations. I applaud your choice. You're at least not scared to put something out in front of everyone and let people pick it apart. Fantastic.

    You went about it the absolutely wrong way. Publishing something that is very obviously going to have debate on and something that you very obviously didn't do very much benchmarking or testing of alternative solutions while sensationalizing it with the title has gotten you exactly one thing. From this point forward, every time I see your name as the author of anything - book, article, post, I'm going to skip it and go on to something else. Every heard of "the boy who cried wolf"? Welcome to the club.

    Michael Hotek

  • I want to thank everyone that posted on this thread. It is one of the most interesting and dynamic threads I have read in a long time. Whether you agree with the author's article or not, he deserves a great deal of credit for the discussion the article created. I expect I will study this thread for a quite a while.

  • DeronDilger (6/2/2010)


    dan_public (6/2/2010)


    Steven,

    .....[G]oing forward, I will be on the lookout for your articles. And be sure not to read them.

    Have a nice day.

    Dan.

    Not having "29 years of experience" but being keenly interested in learning from those with good ideas &/or years of experience/expertise, I am disappointed when I see comments such as Dan's closing sentences as such mean-spirited words make it harder for me to consider the rest of his post (which seems to raise valid points, otherwise). In this field, as in most that live in shades of gray, a public square full of differing opinions is a sign of a healthy environment....one better maintained by avoiding mean-spirited, rude and ad hominem statements.

    Thank you to the author and those (respectfully) offering differing opinions/reasoning. The back and forth is MUCH more valuable than any article by itself (controversial OR sacrosanct).

    Deron,

    My point in mentioning my experience with analytical systems is that it doesn't make me the keeper of all wisdom. Quite the opposite, in fact. My experience has made me more open to differing opinions and methods, and that there can be multiple "best" solutions to a problem.

    The downside of this experience is that I've run into a quite a few people over the years who insist that THEY are the keepers of the wisdom. And that their approach is "best", regardless of differing needs and environmental constraints. That raises my hackles.

    I enjoy a vigorous exchange of opinions as much as the next person, but I'm not interested in a religious argument disguised as a debate. The purpose of the first is personal growth; the purpose of the second is to convert the "sinners". I'm interested in growing, but have no interested in being "converted".

    Regards,

    Dan.

    Dan,

    1) If the author has stated he is the 'Keeper of Wisdom' then i believe you are the only one has seen that in his article. He even said "If you use this I'd love to hear about it" and those aren't the words of one who believes they are the keeper of wisdom.

    2) Not sure how you get " a religious argument disguised as a debate.." out of the thread on this article but again I think you may be the only one seeing that however I will give you this one.

    3) Lastly....

    [Quote]I enjoy a vigorous exchange of opinions as much as the next person..[/quote]

    Then please keep the comments to those of a constructive nature and not those of sarcasm. Your original post was very clearly intended to convey sarcasm which is far from constructive and that is what Deron was trying to tell you and that you failed to understand.

    Its up to you how you handle this and how you reply to threads but if you can't be constructive and or aren't sure if you are being constructive then just don't reply. There's enough quality feedback that is constructive to weight the thread down with cheap shots and sarcasm.

    Kindest Regards,

    Just say No to Facebook!
  • Lynn,

    Your point about constructive criticism is well taken. I'll keep that in mind and will scrub out the "crisp" language going forward. That said...

    I'm not sure why you think my initial post (copied below) was sarcastic. In fact, while the final statement was more declarative than the point you made in your article, the message is the same - while using Gray's methodology is something you can do, it's probably not something that you should do. Where is the sarcasm?

    Regards,

    Dan.

    Besides the many issues raised about this approach, another issue not addressed by this approach is auditing. The Time dim (the most important dimension, IMO) is dynamic. It's members are ephemeral and only created on the fly. There's nothing to audit.

    Auditing is less important when using when using a simple "textbook" example like the one chosen. OTOH, real-world Time dims can be much more robust and will take time to create, audit, and maintain.

    IMO, this is an example of a nice theoretical discussion of what you CAN do but probably should NOT do.

    Regards,

    Dan.

    One star.

  • dan_public (6/3/2010)


    Lynn,

    Your point about constructive criticism is well taken. I'll keep that in mind and will scrub out the "crisp" language going forward. That said...

    I'm not sure why you think my initial post (copied below) was sarcastic. In fact, while the final statement was more declarative than the point you made in your article, the message is the same - while using Gray's methodology is something you can do, it's probably not something that you should do. Where is the sarcasm?

    Regards,

    Dan.

    Besides the many issues raised about this approach, another issue not addressed by this approach is auditing. The Time dim (the most important dimension, IMO) is dynamic. It's members are ephemeral and only created on the fly. There's nothing to audit.

    Auditing is less important when using when using a simple "textbook" example like the one chosen. OTOH, real-world Time dims can be much more robust and will take time to create, audit, and maintain.

    IMO, this is an example of a nice theoretical discussion of what you CAN do but probably should NOT do.

    Regards,

    Dan.

    One star.

    Dan,

    Sorry, but I'm confused. I don't recall any of the above. Could you find the post number and provide that or a link to the post?

    Thank you,

    Lynn

  • Lynn Pettis (6/3/2010)


    dan_public (6/3/2010)


    Lynn,

    Your point about constructive criticism is well taken. I'll keep that in mind and will scrub out the "crisp" language going forward. That said...

    I'm not sure why you think my initial post (copied below) was sarcastic. In fact, while the final statement was more declarative than the point you made in your article, the message is the same - while using Gray's methodology is something you can do, it's probably not something that you should do. Where is the sarcasm?

    Regards,

    Dan.

    Besides the many issues raised about this approach, another issue not addressed by this approach is auditing. The Time dim (the most important dimension, IMO) is dynamic. It's members are ephemeral and only created on the fly. There's nothing to audit.

    Auditing is less important when using when using a simple "textbook" example like the one chosen. OTOH, real-world Time dims can be much more robust and will take time to create, audit, and maintain.

    IMO, this is an example of a nice theoretical discussion of what you CAN do but probably should NOT do.

    Regards,

    Dan.

    One star.

    Dan,

    Sorry, but I'm confused. I don't recall any of the above. Could you find the post number and provide that or a link to the post?

    Thank you,

    Lynn

    Lynn,

    Here's the post link:

    Regards,

    Dan.

    p.s. The forum software has an odd way of parsing links. You'll have to cut and paste. Sorry.

  • A day is NOT a "time interval".

    That's a matter of perspective, and not a matter of fact. Under what circumstance is a day not subdivision of some constantly distributed time axis?

    But, you're obviously so wrapped up in your shiny little solution about TIME to understand that.

    The code in question could be used to build surrogate Date+Time combined dimensions, or broken down/refactored to generate two smaller dimensions that cater for both seperately, without having to re-key the source tables or do arbitrary lookups.

    One of my systems deals with DATES, only DATES, and nothing but DATES. It could care less about hours, minutes, and seconds....

    The article clearly indicates that the resolution I've used may not suit all applications, and you may want to tweak it to the scenario.

    On this day - 6/3/2010, there are exactly 14 different definitions for 6/3/2010, all at the same time and all used by the business. 6/3/2009 has 11 different definitions and 6/3/2011 currently has 17 different definitions.

    Sounds like different definitions of dates is something you can indicate another dimension (DateType dimension?) - what bearing does this have on the article itself though?

    If that weren't enough, in another cube, this precise hour in time (7AM) on this day happens to have 4 different business definitions while this same hour 1 year ago only had 3 business definitions. Could you deal with all of this in code? Sure. But, your code would be so ridiculously convoluted with more exception than actual rule and you'd spend just about all of your time testing and modifying the code to deal with each date that you'd never be done.

    I'd create a table such as:

    TimeDefinition

    TimeDefinitionID (Primary key)

    Description

    StartDate

    EndDate

    Then do inner joins to this where the facts being processed are within the range. No hard coding, no special cases involved? Any dependant data that varies based on either a combination of the time-type being considered and the specific time in question can be done through a sub-table or if it's purely a time-definition level element that's constant across time, setting it up as part of the TimeDefinition table.

    I get it. You've built a system or two and now decided that you are going to write stuff. Congratulations. I applaud your choice. You're at least not scared to put something out in front of everyone and let people pick it apart. Fantastic.

    The tone of feedback from some (not all, but enough) people is sufficiently unconstructive here that it's certainly made me regret my chosen outlet. It's got very much the feeling of fighting a forum-entrenched old-guard, and not something I'm caring to be part of. I'm engaging on this thread primarily because as it's instigator I feel some urge to see the debate through, as I'm having to hold ground in an otherwise interesting area of debate.

    You went about it the absolutely wrong way.

    Subjective - please justify with a contextual example. This solution works fine for many scenarios - I never said it'd work for all.

    Publishing something that is very obviously going to have debate on and something that you very obviously didn't do very much benchmarking...

    I'm very much aware of the performance characteristics of CTE's, however I've explicitly stated that this purely for use when incrementally populating time dimensions, and the article never infers that this solution is intended for any kind of performance intensive/OLTP scenario. It's an article to show a path, not an article on writing some arbitrary bit of assembly-code quality SQL.

    .... or testing of alternative solutions

    The CTE is not the crux of the article itself - the retention of date/time values within fact data without having to resort to arbitrary lookup tables for these values is, however.

    while sensationalizing it with the title has gotten you exactly one thing.

    I've had some positive feedback, some negative. The title has certainly encouraged more debate than otherwise would have occured, although it seems to have rubbed up some the wrong way. In my younger years I'd written a fair bit of editorial copy for public consumption, so I'm not really bothered by backlash on a forum - you can never really judge the crowd before your first solo 😉

    From this point forward, every time I see your name as the author of anything - book, article, post, I'm going to skip it and go on to something else. Every heard of "the boy who cried wolf"?

    That's a fairly reaching statement that neither reflects accurately the article nor the points therein, and is driven from what is apparently nothing more than pedantic complaining about one of the tools used to achieve a result. The great thing about tools however, is that you're rarely required to use a single one for all problems.

  • dan_public (6/3/2010)


    Lynn Pettis (6/3/2010)


    dan_public (6/3/2010)


    Lynn,

    Your point about constructive criticism is well taken. I'll keep that in mind and will scrub out the "crisp" language going forward. That said...

    I'm not sure why you think my initial post (copied below) was sarcastic. In fact, while the final statement was more declarative than the point you made in your article, the message is the same - while using Gray's methodology is something you can do, it's probably not something that you should do. Where is the sarcasm?

    Regards,

    Dan.

    Besides the many issues raised about this approach, another issue not addressed by this approach is auditing. The Time dim (the most important dimension, IMO) is dynamic. It's members are ephemeral and only created on the fly. There's nothing to audit.

    Auditing is less important when using when using a simple "textbook" example like the one chosen. OTOH, real-world Time dims can be much more robust and will take time to create, audit, and maintain.

    IMO, this is an example of a nice theoretical discussion of what you CAN do but probably should NOT do.

    Regards,

    Dan.

    One star.

    Dan,

    Sorry, but I'm confused. I don't recall any of the above. Could you find the post number and provide that or a link to the post?

    Thank you,

    Lynn

    Lynn,

    Here's the post link:

    Regards,

    Dan.

    p.s. The forum software has an odd way of parsing links. You'll have to cut and paste. Sorry.

    Dan,

    Still confused. I was actually looking for the post where I was supposedly critical of your post. I only remember being critical of the inappropriate use of a recursive CTE in the code provided in the article.

    Thanks,

    Lynn

  • deleted -- multiple posts due to blackberry issues

Viewing 15 posts - 46 through 60 (of 83 total)

You must be logged in to reply to this topic. Login to reply