Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««12345»»»

Calendar Tables Expand / Collapse
Author
Message
Posted Thursday, July 15, 2010 11:06 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Friday, July 25, 2014 8:39 AM
Points: 115, Visits: 745
A question for people re performance.

As I said above, we use a date table, and as our significant dates are of type [date] (with no time-of-day), we can join ON a.Datefield = b.Datefield, which is fast.

However, some new data will use [datetime] fields, where the time of day is not 00:00:00. We can match using BETWEEN or CONVERT(DATE,a.Datefield), but is that going to slow down the join? Should we add computed or pre-computed fields of type [date] to use in joins, or would this not gain anything? (Assume all fields being joined on are indexed.)
Post #953329
Posted Thursday, July 15, 2010 12:33 PM
SSC Journeyman

SSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC Journeyman

Group: General Forum Members
Last Login: Monday, December 16, 2013 10:42 AM
Points: 96, Visits: 434
Chiming in with my agreement with other BI/Warehouse guys, this is a 10 year old practice with time dimensions. It makes perfect sense in almost any system where rows are generated at a specific interval of time.

As far as the type of key, I remember one instance were an int IDENTITY(1,1) key was the appropriate choice because our time dimension was at the hourly granularity, and we had to take into consideration DST. We had an extra hour in the fall with the exact same time stamp. In this regard, the auto-incrementing ID kept the two separated in the fact table as two unique time periods even though logically they were the exact same point in time. We could have solved this using a datetime field by using UTC, but since our system didn't care what the UTC was, we left the integer field in as ID.

At that point if we wanted to constrain by time period, we would just put a join on D_Time, like I find happens in most queries anyway in a warehouse (between this month and that). So far warehouses I've worked with have not had a lot to gain out of using the datetime as a primary column, usually because I'm already joining on D_Time for a separate constraint or to get a time reference for the time I'm working with (ex: end of previous month when doing comparisons between today and previous month end).

Another thing to consider about using datetime as key and as constraint is if you put any converts or dateparts on your datetime field in the where or join clause, as far as I know and have tested, SQL will not use the index associated with that field. So it is better for performance if you join on an appropriate field in the time dimension without changing the raw data in the field and also then putting an index on that column if it is used in many queries.

From my experience, unless you're working with a very large chunk of time (greater than 5 years) and have a very tiny time granularity (less than an hour), a join on a time table really doesn't affect performance significantly, again assuming the appropriate indexes have been created.

If anyone thinks to the contrary, please post. I've never learned so much I think I have nothing left to learn. =D

Correction: It will do an index scan rather than index seek, because it will convert every time using the function you've given it before it uses the index.


-------------------------------------------------------------------------------------------------
My SQL Server Blog
Post #953404
Posted Thursday, July 15, 2010 2:53 PM


SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: 2 days ago @ 9:18 PM
Points: 128, Visits: 823
The example doesn't really show off the capabilities of a "Time" table. I'm used to very wide tables that contain columns to help you count work days, holidays, fiscal days, gregorian, julian, and week days in any given time period by adding up the ones and zeros in a given column where there is one row per day. It also allows you to cross reference work day 145 with a calendar date, or to find which fiscal month it relates to.
Post #953495
Posted Thursday, July 15, 2010 2:56 PM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Today @ 2:33 AM
Points: 337, Visits: 1,982
There is logic behind not using the date as the key to the date dimension... it stops developers from falling into the easy trap of using the key as the date and not bothering to join onto the date dimension table!

Post #953496
Posted Thursday, July 15, 2010 6:57 PM


SSC-Insane

SSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-Insane

Group: General Forum Members
Last Login: Yesterday @ 10:02 PM
Points: 21,342, Visits: 15,017
Thanks for taking the time to put this together and share with us.



Jason AKA CirqueDeSQLeil
I have given a name to my pain...
MCM SQL Server


SQL RNNR

Posting Performance Based Questions - Gail Shaw
Posting Data Etiquette - Jeff Moden
Hidden RBAR - Jeff Moden
VLFs and the Tran Log - Kimberly Tripp
Post #953567
Posted Friday, July 16, 2010 2:15 AM
SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Friday, July 18, 2014 9:09 AM
Points: 870, Visits: 2,385
Samuel Vella (7/15/2010)
There is logic behind not using the date as the key to the date dimension... it stops developers from falling into the easy trap of using the key as the date and not bothering to join onto the date dimension table!



True, however you do loose the ability to maintain a sequential set of date information, as its possible that 02-Jan-2011 is not the next key in sequence to 01-Jan-2011 thus it slows performance in not only the ETL process but also the reporting of the data.

The idea behind a Key on a table is a unique reference, and time can never have more than one occurance thus it makes sense to use the date in a data warehouse as the key on both the date dimension and also on the Fact table as it cannot have more than one entry.

You then only have to lookup to the date column to get any flags such as Public Holidays, Seasons, Financial Periods (Open, Closed, current), etc.

One way to alieveate this is to convert the Datetime to an INT, however this has a few draw backs, the main one being due rounding when cast a date that has gone over Midday, it gets rounded up to the next day, but as long as you know the pitfalls its pretty easy to avoid.

edit : dates wrong in the first paragraph


_________________________________________________________________________
SSC Guide to Posting and Best Practices
Post #953664
Posted Friday, July 16, 2010 12:56 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Yesterday @ 8:58 PM
Points: 36,794, Visits: 31,253
Now there's an idea for an article... comparing the pros and cons of using a DateTime dates compared to INT dates.

--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #954116
Posted Saturday, July 31, 2010 8:27 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Yesterday @ 8:58 PM
Points: 36,794, Visits: 31,253
Looking at the "Tally" table code that someone else wrote, I can certainly see why it's so slow. No chance of the clustered index being used there!

Thanks for the honorable mention (heh... I think ), Todd. Considering how badly someone abused the Tally table, you've just got to know that I'll be doing a full blown code review there.

Also, do you happen to have the link where that "Tally" code originated? I need to set things straight with the author of that code (BWAAA-HAAA!!! Hopefully, it wasn't me under the influence of cold medicine. )


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #961839
Posted Tuesday, August 10, 2010 9:17 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, December 10, 2012 9:55 PM
Points: 1, Visits: 62
Thanks for the article! As a relative newcomer to SQL I'd be intested in comments on using a UDF to generate a similar table of dates. below is one I've been using with parameters to set the date range and switch whether it returns days, months and years. Are there performance issues with this approach?

set ANSI_NULLS ON
set QUOTED_IDENTIFIER ON
go

/*
* PAREMETERS: @DatePart: 0 = Day, 1 = Month, 2 = Year
* USE: SELECT * FROM dbo.fn_Generate_Dates_V2 ('2000-01-01', '2025-01-01', 0)
*/

CREATE FUNCTION [dbo].[fn_Generate_Dates_V2](@StartDate SmallDateTime, @EndDate SmallDateTime, @DatePart int = 0)
RETURNS @returntable
TABLE ( Date_Key int IDENTITY,
Date_Value SmallDatetime,
Last_Month smalldatetime,
Next_Month smalldatetime,
Day_Value int,
Day_Name varchar (20),
Month_Value int,
Month_Name varchar(20),
Year_Value int) AS
BEGIN
WHILE (@StartDate < @EndDate)
BEGIN
INSERT @returntable (
Date_Value,
Last_Month,
Next_Month,
Day_Value,
Day_Name,
Month_Value,
Month_Name,
Year_Value)
VALUES (
@StartDate,
DateAdd(Month, -1, @StartDate),
DateAdd(Month, 1, @StartDate),
DatePart(Day, @StartDate),
Datename(dw,@StartDate),
DatePart(Month, @StartDate),
Datename(Month, @StartDate),
DatePart(yyyy, @StartDate)
)

SELECT @StartDate = CASE @DatePart
WHEN 1 THEN DateAdd(Month, 1, @StartDate)
WHEN 2 THEN DateAdd(Year, 1, @StartDate)
ELSE DateAdd(Day, 1, @StartDate)
END
END
RETURN
END




Post #967105
Posted Thursday, February 23, 2012 11:45 PM
SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: 2 days ago @ 5:49 PM
Points: 401, Visits: 547
You may want to consider using a DATENAME() instead of the cumbersome CASE statement. That would also allow some language variation dependant on the default language.
Post #1257125
« Prev Topic | Next Topic »

Add to briefcase «««12345»»»

Permissions Expand / Collapse