Date Dimension

  • RonMexico

    Hall of Fame

    Points: 3340

    I'm in the process of creating a date dimension table with the usual fields such as DayOfWeek, DayOfMonth, DayName, IsHoliday, various different formats of date, etc. Much of our reporting relies on trending data and comparing to previous month(s), quarter(s), or year(s) and also month-to-date, quarter-to-date, and year-to-date indicators. There would also be a need for a current day indicator. The issue I'm having with designing is do I 1.) add these fields to the table and run updates against it daily, 2.) Store these fields in a separate table, or 3.) Do not store them anywhere but rather have the users perform calculations?

  • Andrew P

    SSCarpal Tunnel

    Points: 4493

    We use a date table, but not in a high-throughput environment. Here's my thoughts:

    I recommend against #3 - requiring users calculate the date-related fields themselves (especially for the commonly used calculated values). I'm currently working through whether to use the DAX WEEKNUM or ISO8601 method of identifying the week number of a given date. The first says the first week is the week that 01/01/yyyy falls in, the second is the one where most of the days in the week are in yyyy. I don't think you want alternative calculations in use. You'd also be losing one of the key benefits of date dimensions, storing the calculated values for a given date so these don't need to be computed at query time.

    Looking at the other two options, there's only 51,134 days between 1900 and 2040.  I don't know your organization , but for us, that range would be more than sufficient, and 51,134 isn't a lot of rows. I recommend using the date value as the clustered index, adding the additional pre-calculated columns, and seeing if this solution is performant enough, and refactoring in any situation where it's found it is not. A doubt I have with this is that we don't often use many of these pre-calculated fields, and mostly use our date table to select the days between x and y, so the other additional columns aren't a great help.

    With the above in mind, I suggest #1, and suspect #2 to be likely over-optimization, at least initially, unless you have information or experimentation that suggests otherwise.

  • Jeff Moden

    SSC Guru

    Points: 996645

    My personal opinion is that people put entirely too much into Calender/DateDim tables.  An example of this is people that store the full name of the days of the week, which is usually only needed for display purposes, in such a table.  It's just too easy and less resource intensive to use DATENAME to calculate such a thing.

    On the other hand, I also find that people don't include enough to replace the more difficult calculations that are either needed now or will be sometime in the future.  An example of this are forward and backward looking date serial numbers where the non-working days are numbered the same as the next work day for one column and the same as the previous latest workday for another to easily allow for the resolution of such things as "next business day" or "within 5 business" days or to calculate lead times given a target date.

    They also miss out on adding ordinals for such things as "WeekDayOfMonth", which is simply an ordinal partitioned by month and  day of the week to help easily answer questions like "what is the date for the 4th Tuesday of every month"  or even "what is the last Thursday of every month" (find the first Thursday of months and then subtract 7 days is pretty easy and fast to do).

    As for marking the table with the current date, I wouldn't do that to the actual Calendar/DateDim table.  If you really need such functionality, I'd create a sister table with such information (actually, 2 with a "flip-flop synonym" to point to the current table while a new one is being built  and then "flop over" to point to the new one when it's ready for consumption).

    As for storing the new columns in a separate table, "It Depends" (much like the current date table thing above) on what the need is, what the impact on performance (good or bad) might be, and what the complexity of use will be.  There's also another rather nasty consideration... I've found that a lot of front-end programmers still don't know how to write code that (much like views that use SELECT * in SQL Server) doesn't require an update when the schema of a table changes.  You need to do a test in the Development environment to make sure that the addition of columns isn't going to break code.  If it does, that would be a good time to fix that code.

    Now... all that being said, my normal first question that I ask on interviews of Senior DBAs and Senior Developers is "How do you get the current date and time in SQL Server?".  It was meant as an easy question to help break the ice and get the candidate to relax a bit.  I didn't realize it was going to be the penultimate litmus-strip test.  I stopped counting after 20 out of 22 candidates couldn't answer the question {major face-palm}{head-desk}{head-desk}{head-desk}.

    And, with THAT being said, I have to say that I totally agree with Andrew above on your 3rd question.  In other words, yes, I'd likely add the spelled out day of the week (and other columns that I personally think shouldn't be added) to the Calendar/DateDim table and leave little to nothing to chance because a lot of "users" simply don't know how to do such easy temporal calculations.  Just mind the performance.

    And, no... I'm not being "unkind" or "snarky" here.  Too often I've actually seen code like the following (patently not the correct way to do such a thing for multiple reasons) on forum replies (and even in some articles) and in code that I've had to fix (especially when they put it into a scalar function).

    --===== Way too typical code.
    SELECT DayOfWeek = CASE
    WHEN DATEPART(WEEKDAY,@SomeDate) = 1 THEN 'Sunday'
    WHEN DATEPART(WEEKDAY,@SomeDate) = 2 THEN 'Monday'
    WHEN DATEPART(WEEKDAY,@SomeDate) = 3 THEN 'Tuesday'
    WHEN DATEPART(WEEKDAY,@SomeDate) = 4 THEN 'Wednesday'
    WHEN DATEPART(WEEKDAY,@SomeDate) = 5 THEN 'Thursday'
    WHEN DATEPART(WEEKDAY,@SomeDate) = 6 THEN 'Friday'
    WHEN DATEPART(WEEKDAY,@SomeDate) = 7 THEN 'Saturday'

    The same holds true for month names and month/day of week name abbreviations.

    --Jeff Moden

    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "If "pre-optimization" is the root of all evil, then what does the resulting no optimization lead to?"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 3 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply