RE: Date and Time dimension creation and population T-SQL

Ten Centuries

Points: 1326

February 13, 2009 at 9:47 am

jwalker8680 (2/13/2009)
Actually... "The Data Warehouse Toolkit Second Edition" by Ralph Kimball specifically mentions this on page 60, last paragraph.
You should avoid using a readable date as the key, and instead use a surrogate key as some developers may use the readable format to bypass joining to the Date dimension.
Besides that... All of the other tables in my schemas have an autogenerated integer PK simply named [ID] and I prefer to keep it that way for consistancy.

That's odd. In the Microsoft Data Warehouse Toolkit on page 57, they say "Where other surrogate keys are usually a meaningless sequence of integers, it's a good idea to use a meaningful value for the Date surrogate key. Specifically, use an integer that corresponds to the date in year-month-day order... This can lead to more efficient queries against the relational database. It also makes implementing date-based partitioning much easier, and the partition management function will be more understandable".

They go on to state that sometimes they use smalldatetime instead of a role-played date dimension in some cases.

In particular, I like that the natural integer does let you do things like DATA_DT_ID BETWEEN START_DT_ID and END_DT_ID, which with a pure surrogate you have to express through the joins to the dimension table. This can make a huge difference in performance. I understand the point about developers not going through the dimension - if you can't trust your developers not to make date mistakes, that can be helpful.