Calendar Tables

  • >>

    Here we've always been of the belief that you shouldn't have tables with date ranges, because tables get created and populated, the query in question works, and the table gets forgotten about. You come into work on Jan 2nd, and something has stopped working, because there are no valid dates in the table, and the query that relies on this has failed because no-one updated the table, or the people who know about the table have left.

    I understand about the performance benefit, and it makes sense. Perhaps there could be a recurring job that runs every Dec XXth, that adds to the contents of that table so that the table always holds at least a years extra data, for instance. That way, the table will never run out of dates.

    >>

    So it's OK to depend on a job that runs infrequently, but it's not OK to depend on a table that has to be maintained infrequently? Where's your DBA?

    _________________
    "Look, those sheep have been shorn."
    data analyst replies, "On the sides that we can see.."

  • katesl (7/15/2010)


    >>

    So it's OK to depend on a job that runs infrequently, but it's not OK to depend on a table that has to be maintained infrequently? Where's your DBA?

    No, neither are OK, but I believe the former might just be safer than the latter. If the DBA leaves, and doesn't tell his replacement about this magic table of numbers, do we wait for the new DBA to find this table, and work out why it's there?

    I think you're marginally *less* likely to have a well tested job screw up, than have a DBA remember to infrequently remember to maintain that table.

  • Nice article. Thanks for the time and effort that you took to write it.

    Tom Garth
    Vertical Solutions[/url]

    "There are three kinds of men. The one that learns by reading. The few who learn by observation. The rest of them have to pee on the electric fence for themselves." -- Will Rogers
  • We advocate and use the calendar table approach too. We deal with reams (tech term) of daily financial data and this approach makes many difficult things easier and some impossible things possible 😉

    If not for the calendar table, we'd be recomputing date ranges, etc. to the nth degree. With the table, they're computed once (and referenced many times). It helps with the load.

    Thanks for the article.

  • I don't object to a calendar table per se, and a specific calendar table does have certain advantages over a tally table.

    But I'm not sure that big of a performance difference should necessarily be one of them.

    When using the tally table, I don't think it's necessary to use functions on the table column in the JOIN clause. That will automatically limit the usefulness of any index on that column.

    Couldn't we do something like this instead?:

    ...

    FROM Tally

    LEFT JOIN #Stay

    ON

    N <= 12 AND

    ArrivalDate >= @StartYear1 AND

    (ArrivalDate BETWEEN DATEADD(MM, N - 1, @StartYear1) AND DATEADD(MS, -3, DATEADD(MM, N, @StartYear1)) OR

    ArrivalDate BETWEEN DATEADD(MM, N - 1, @StartYear2) AND DATEADD(MS, -3, DATEADD(MM, N, @StartYear2)))

    GROUP BY ...

    I.e., put all the JOIN criteria in the JOIN and make sure we don't use functions on the ArrivalDate. I'm not sure the optimizer is sophisticated enough to recognize that it should be able to use the index here, but I think at least it gives it a chance 🙂 .

    I know for a single month I usually an index seek vs. a scan using this style of coding, but for the side-by-side month, I'm not sure.

    Scott Pletcher, SQL Server MVP 2008-2010

  • A question for people re performance.

    As I said above, we use a date table, and as our significant dates are of type [date] (with no time-of-day), we can join ON a.Datefield = b.Datefield, which is fast.

    However, some new data will use [datetime] fields, where the time of day is not 00:00:00. We can match using BETWEEN or CONVERT(DATE,a.Datefield), but is that going to slow down the join? Should we add computed or pre-computed fields of type [date] to use in joins, or would this not gain anything? (Assume all fields being joined on are indexed.)

  • Chiming in with my agreement with other BI/Warehouse guys, this is a 10 year old practice with time dimensions. It makes perfect sense in almost any system where rows are generated at a specific interval of time.

    As far as the type of key, I remember one instance were an int IDENTITY(1,1) key was the appropriate choice because our time dimension was at the hourly granularity, and we had to take into consideration DST. We had an extra hour in the fall with the exact same time stamp. In this regard, the auto-incrementing ID kept the two separated in the fact table as two unique time periods even though logically they were the exact same point in time. We could have solved this using a datetime field by using UTC, but since our system didn't care what the UTC was, we left the integer field in as ID.

    At that point if we wanted to constrain by time period, we would just put a join on D_Time, like I find happens in most queries anyway in a warehouse (between this month and that). So far warehouses I've worked with have not had a lot to gain out of using the datetime as a primary column, usually because I'm already joining on D_Time for a separate constraint or to get a time reference for the time I'm working with (ex: end of previous month when doing comparisons between today and previous month end).

    Another thing to consider about using datetime as key and as constraint is if you put any converts or dateparts on your datetime field in the where or join clause, as far as I know and have tested, SQL will not use the index associated with that field. So it is better for performance if you join on an appropriate field in the time dimension without changing the raw data in the field and also then putting an index on that column if it is used in many queries.

    From my experience, unless you're working with a very large chunk of time (greater than 5 years) and have a very tiny time granularity (less than an hour), a join on a time table really doesn't affect performance significantly, again assuming the appropriate indexes have been created.

    If anyone thinks to the contrary, please post. I've never learned so much I think I have nothing left to learn. =D

    Correction: It will do an index scan rather than index seek, because it will convert every time using the function you've given it before it uses the index.

    -------------------------------------------------------------------------------------------------
    My SQL Server Blog

  • The example doesn't really show off the capabilities of a "Time" table. I'm used to very wide tables that contain columns to help you count work days, holidays, fiscal days, gregorian, julian, and week days in any given time period by adding up the ones and zeros in a given column where there is one row per day. It also allows you to cross reference work day 145 with a calendar date, or to find which fiscal month it relates to.

  • There is logic behind not using the date as the key to the date dimension... it stops developers from falling into the easy trap of using the key as the date and not bothering to join onto the date dimension table!

  • Thanks for taking the time to put this together and share with us.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Samuel Vella (7/15/2010)


    There is logic behind not using the date as the key to the date dimension... it stops developers from falling into the easy trap of using the key as the date and not bothering to join onto the date dimension table!

    True, however you do loose the ability to maintain a sequential set of date information, as its possible that 02-Jan-2011 is not the next key in sequence to 01-Jan-2011 thus it slows performance in not only the ETL process but also the reporting of the data.

    The idea behind a Key on a table is a unique reference, and time can never have more than one occurance thus it makes sense to use the date in a data warehouse as the key on both the date dimension and also on the Fact table as it cannot have more than one entry.

    You then only have to lookup to the date column to get any flags such as Public Holidays, Seasons, Financial Periods (Open, Closed, current), etc.

    One way to alieveate this is to convert the Datetime to an INT, however this has a few draw backs, the main one being due rounding when cast a date that has gone over Midday, it gets rounded up to the next day, but as long as you know the pitfalls its pretty easy to avoid.

    edit : dates wrong in the first paragraph

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

  • Now [font="Arial Black"]there's [/font]an idea for an article... comparing the pros and cons of using a DateTime dates compared to INT dates.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Looking at the "Tally" table code that someone else wrote, I can certainly see why it's so slow. No chance of the clustered index being used there!

    Thanks for the honorable mention (heh... I think ;-)), Todd. Considering how badly someone abused the Tally table, you've just got to know that I'll be doing a full blown code review there. 😛

    Also, do you happen to have the link where that "Tally" code originated? I need to set things straight with the author of that code (BWAAA-HAAA!!! Hopefully, it wasn't me under the influence of cold medicine. :w00t::hehe:)

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks for the article! As a relative newcomer to SQL I'd be intested in comments on using a UDF to generate a similar table of dates. below is one I've been using with parameters to set the date range and switch whether it returns days, months and years. Are there performance issues with this approach?

    set ANSI_NULLS ON

    set QUOTED_IDENTIFIER ON

    go

    /*

    * PAREMETERS:@DatePart: 0 = Day, 1 = Month, 2 = Year

    * USE: SELECT * FROM dbo.fn_Generate_Dates_V2 ('2000-01-01', '2025-01-01', 0)

    */

    CREATE FUNCTION [dbo].[fn_Generate_Dates_V2](@StartDate SmallDateTime, @EndDate SmallDateTime, @DatePart int = 0)

    RETURNS @returntable

    TABLE ( Date_Key int IDENTITY,

    Date_Value SmallDatetime,

    Last_Month smalldatetime,

    Next_Month smalldatetime,

    Day_Value int,

    Day_Name varchar (20),

    Month_Value int,

    Month_Name varchar(20),

    Year_Value int) AS

    BEGIN

    WHILE (@StartDate < @EndDate)

    BEGIN

    INSERT @returntable (

    Date_Value,

    Last_Month,

    Next_Month,

    Day_Value,

    Day_Name,

    Month_Value,

    Month_Name,

    Year_Value)

    VALUES (

    @StartDate,

    DateAdd(Month, -1, @StartDate),

    DateAdd(Month, 1, @StartDate),

    DatePart(Day, @StartDate),

    Datename(dw,@StartDate),

    DatePart(Month, @StartDate),

    Datename(Month, @StartDate),

    DatePart(yyyy, @StartDate)

    )

    SELECT @StartDate = CASE @DatePart

    WHEN 1 THEN DateAdd(Month, 1, @StartDate)

    WHEN 2 THEN DateAdd(Year, 1, @StartDate)

    ELSE DateAdd(Day, 1, @StartDate)

    END

    END

    RETURN

    END

  • You may want to consider using a DATENAME() instead of the cumbersome CASE statement. That would also allow some language variation dependant on the default language.

Viewing 15 posts - 16 through 30 (of 46 total)

You must be logged in to reply to this topic. Login to reply