Mastering Dimensions of Time

  • Interesting approach. I am too lazy. I just go to either the kimballgroup's website or Don McMunn's site...........then extend as necessary. 😀

  • Here's an extract of the procedure to populate date dimension (I removed columns needed in my company and left some that might be relevant). We populate it by year, sowe don't need a large number of rows to be included, but it can grow if needed.

    I'm using the easiest way to populate a Tally table by using master.sys.all_columns. I could easily change it to a cte construct, but as it's a process that will run once a year, I prefer simplicity over performance to a certain degree.

    Here's an option without while loops (RBAR) or recursive CTEs (hidden RBAR).

    DECLARE @Year int = 2014;

    SET DATEFIRST 1;

    WITH Tally(n) AS(

    SELECT TOP 400 DATEADD( dd, ROW_NUMBER() OVER( ORDER BY (SELECT NULL)), (SELECT MAX(fecha) FROM DWESTRELLAS.dbo.dimDate WHERE WeekYear = @Year - 1))

    FROM sys.all_columns),

    cteDays AS(

    SELECT n AS cal_date,

    DATEADD( WEEK, DATEDIFF( WEEK, 0, n - 1), 0) AS Monday,

    DATEADD( DAY, 6, DATEADD( WEEK, DATEDIFF( WEEK, 0, n - 1), 0)) AS Sunday

    FROM Tally

    )

    SELECT CONVERT( char(8), cal_date, 112) idDate,

    CAST( cal_date AS smalldatetime) cal_date,

    YEAR( cal_date) cal_year,

    MONTH( cal_date) cal_month,

    DAY( cal_date) cal_day,

    DATEPART( DAYOFYEAR, cal_date) day_year,

    YEAR( Sunday) WeekYear,

    DATEPART( WEEK, Sunday) cal_Week,

    DATEPART( QUARTER, cal_date) cal_Quarter,

    UPPER(REPLACE( CONVERT( char(6), Monday, 113), ' ', '')) + '-'

    + UPPER(REPLACE( CONVERT( char(6), Sunday, 113), ' ', '')) week_desc,

    UPPER( CONVERT( char(6), Sunday, 113)) SundayDesc,

    DATEPART( WEEKDAY, cal_date) cal_weekday,

    UPPER( DATENAME( WEEKDAY, cal_date)) weekdayName,

    CAST( YEAR( Sunday) AS CHAR(4)) + RIGHT( '0' + CAST( DATEPART( WEEK, Sunday) AS varchar(2)), 2) idSemana,

    CASE WHEN DATEPART( DAY, Sunday) > 3

    THEN UPPER( RIGHT( CONVERT( char(6), Sunday, 113), 3)) + '-' + RIGHT( YEAR(Sunday), 2)

    ELSE UPPER( RIGHT( CONVERT( char(6), Monday, 113), 3)) + '-' + RIGHT( YEAR(Monday), 2) END

    MonthDesc

    FROM cteDays

    WHERE YEAR( Sunday) = @Year

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • mtassin (1/21/2014)


    Jeff Moden (1/21/2014)


    Just a couple of thoughts...

    First, these two tables are one of the few places where I'd use the natural key of either the date or the time instead of a surrogate key. There's not much worse than trying to troubleshoot something by looking at rows in a table only to have to lookup every single date or time ID. Using the natural key would allow you to not have to make multiple time tables for the sake of continuing to use a surrogate key while still providing more resolution in the time elements.

    Also, the code isn't actually portable. That possibility ended as soon as you defined a variable. I don't know about other databases but, last I heard, Oracle (for example) does not use @ variables.

    And see here... I found using an identity column with a start date equivalent to the beginning of data history i.e. for the last few DW's I've built 2000-01-01 to be very handy.

    To calculate the keys for the DW, I could just DATEDIFF(dd,'1999-12-31',datadate) and not have to do a lookup. For time, 00:00:00 was key 1 and 1440 = 23:59 DATEDIFF(N,cast('00:00:00' as time),cast(datadate as time))+1 generated that key. Typically when a greater depth of granularity was necessary I'd just include the datetime value in the fact table.

    Yes it prevented me from backfilling, but see... I used a Tally table to fill the date dimension... so meh? And to calculate the duration in days, I could just subtract one key from another :).

    I guess I don't understand why you'd take the difference between date serial numbers when DATEDIFF does just fine but to each their own.

    I know it's a personal preference but I've worked with "DateDIM" tables that use date and time serial numbers and I've never understood why people would bother. Maybe it was to save on disk space (4 byte INT v.s. 8 byte DATETIME) before extremely large hard disks became so relatively inexpensive but I just don't see the need for it anymore. Add the DATE and TIME datatypes into the mix and I really don't see a need for a surrogate key for any type of calendar or time based table.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • raulggonzalez (1/21/2014)


    And for this particular case, I cannot see any 'true set based' alternative, maybe because I'm used to recursive CTE to generate this kind of samples.

    See the example code that Luis posted above for one possibility.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • peter-757102 (1/21/2014)


    We use such date tables as well, but structured quite differently, see the script code for details.

    What it amounts to is our own integer based structured date type, with years written as 20140000, months as 20141200 and days as 20141231. It is easy to use in all sorts of code, filters well and even allows for a hierarchy in a result-set (distinct year, month and day results all fit the same mold). Having years, months and days in their own table in turn allows for simple queries when it comes to filtering and covering date ranges, even when there is sparse data to work with.

    The storage is minimal and even less thanks to the use of computed columns where possible. We work with ISO week numbers, but any kind of week number can be stuffed in there, as well as leap years and all that. It is more of a starting framework with favorable properties, on which to build as required. Feel free to use and modify as you see fit.

    create schema [Calendar] authorization dbo;

    go

    -- Calendar Segment table (list of all acceptable IdSegement values)

    --

    create table [Calendar].[Segment]

    (

    [Id] int not null

    , constraint pk_CalendarSegment primary key clustered( [Id] ) with fillfactor = 100

    )

    go

    -- Calendar Year table

    --

    create table [Calendar].[Year]

    (

    [IdSegment] int not null

    , [Year] smallint not null

    , [DayCount] smallint not null

    , [IdSegmentStart] as ( [IdSegment] + 0101 )

    , [IdSegmentEnd] as ( [IdSegment] + 1231 )

    , [DateStart] as ( dateAdd( year, [Year] - 2000, {d '2000-01-01'} ) )

    , [DateNext] as ( dateAdd( year, [Year] - 2000, {d '2001-01-01'} ) )

    , constraint pk_CalendarYear primary key nonclustered( [IdSegment] ) with fillfactor = 100

    , constraint fk_CalendarYear_Segment foreign key ( [IdSegment] ) references [Calendar].[Segment]( Id )

    );

    go

    create unique nonclustered index ak_CalendarYear on [Calendar].[Year]( [Year] ) include ( [IdSegment], DayCount ) with fillfactor = 100; -- fully covering

    go

    -- Calendar Month table

    --

    create table [Calendar].[Month]

    (

    [IdSegment] int not null

    , [Year] smallint not null

    , [Month] tinyint not null

    , [DayCount] smallint not null

    , [IdSegmentStart] as ( [IdSegment] + 01 )

    , [IdSegmentEnd] as ( [IdSegment] + [DayCount] )

    , [DateStart] as ( dateAdd( month, [Month] - 1, dateAdd( year, [Year] - 2000, {d '2000-01-01'} ) ) )

    , [DateNext] as ( dateAdd( month, [Month] - 1, dateAdd( year, [Year] - 2000, {d '2000-02-01'} ) ) )

    , constraint pk_CalendarMonth primary key clustered( [IdSegment] ) with fillfactor = 100

    , constraint fk_CalendarMonth_Segment foreign key ( [IdSegment] ) references [Calendar].[Segment]( Id )

    , constraint ck_CalendarMonth_Month check ([Month] >= 1 and [Month] <= 12)

    , constraint ck_CalendarMonth_DayCount check ([DayCount] >= 1 and [DayCount] <= 31)

    );

    go

    create unique nonclustered index ak_CalendarMonth_year on [Calendar].[Month] ( [Year], [Month] ) include ( [IdSegment], DayCount ) with fillfactor = 100; -- fully covering

    go

    create unique nonclustered index ak_CalendarMonth_month on [Calendar].[Month] ( [Month], [year] ) include ( [IdSegment], DayCount ) with fillfactor = 100; -- fully covering

    go

    -- Calendar Day table

    --

    create table [Calendar].[Day]

    (

    [IdSegment] int not null

    , [Year] smallint not null

    , [Month] tinyint not null

    , [Day] tinyint not null

    , [DayCount] as ( cast( 1 as smallint ) )

    , [IdSegmentStart] as ( [IdSegment] )

    , [IdSegmentEnd] as ( [IdSegment] )

    , [DateStart] as ( dateAdd( day, [Day] - 1, dateAdd( month, [Month] - 1, dateAdd( year, [year] - 2000, {d '2000-01-01'} ) ) ) )

    , [DateNext] as ( dateAdd( day, [Day] , dateAdd( month, [Month] - 1, dateAdd( year, [year] - 2000, {d '2000-01-01'} ) ) ) )

    , [Week] tinyint

    , [WeekDay] tinyint -- [1..7] = monday..sunday

    , [YearDay] smallint

    , constraint pk_CalendarDay primary key clustered( [IdSegment] ) with fillfactor = 100

    , constraint fk_CalendarDay_Segment foreign key ( [IdSegment] ) references [Calendar].[Segment]( Id )

    , constraint ck_CalendarDay_Month check ([Month] >= 1 and [Month] <= 12)

    , constraint ck_CalendarDay_Day check ([Day] >= 1 and [Day] <= 31)

    );

    We fill these tables with some initialization code that is run once:

    -- Generate data to test with

    --

    declare @year as int; set @year = 2000;

    declare @month as int;

    declare @date as datetime;

    while @year <= 2150 begin

    set @date = dateAdd( year, @year - 2000, {d '2000-01-01'} );

    set @month = 1;

    insert into [Calendar].[Segment]( Id )

    select @year * 10000;

    insert into [Calendar].[Year]( IdSegment, Year, DayCount )

    select @year * 10000, @year, datediff( day, @date, dateAdd( year, 1, @date ) );

    while @month <= 12 begin

    insert into [Calendar].[Segment]( Id )

    select @year * 10000 + @month * 100;

    insert into [Calendar].[Month]( IdSegment, Year, Month, DayCount )

    select @year * 10000 + @month * 100, @year, month( @date ), datediff( day, @date, dateAdd( month, 1, @date ) );

    while month( @date ) = @month begin

    insert into [Calendar].[Segment]( Id )

    select @year * 10000 + @month * 100 + day( @date );

    insert into [Calendar].Day( IdSegment, Year, Month, Day, Week, WeekDay, YearDay )

    select @year * 10000 + @month * 100 + day( @date ), @year, month( @date ), day( @date ), datePart( iso_week, @date ), datePart( weekDay , @date ), datePart( dayOfYear, @date );

    set @date = @date + 1;

    end

    set @month = @month + 1;

    end

    set @year = @year + 1;

    end

    go

    Peter,

    I know that a non-persisted computed column saves space but it's also recalculated on every use and non-indexable. Since the data won't actually ever change, I'd recommend actually dedicating the space to hard data even above using a persisted computed column, which could also be indexed. It's also not so much space because even "big" calendar tables are relatively small. The performance gains could be quite substantial depending on how you're using the table.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • PHYData DBA (1/21/2014)


    I want to thank everyone that took the time this morning to read my article and comment on it.

    I had no idea that my very first submission would end up being the top link on the daily email newsletter!

    Thank you all for making me feel well read and received! 😀

    Thanks for making the effort! Your article has elicited some good discussions!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • akosinsk (1/21/2014)


    If you need to define your internal, company calendar i.e. "4/4/5" in time table, such a script may become a little more interesting.

    If you have such a thing, it would likely be helpful to many if you posted it. Or, maybe even write an article about it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/21/2014)


    PHYData DBA (1/21/2014)


    I want to thank everyone that took the time this morning to read my article and comment on it.

    I had no idea that my very first submission would end up being the top link on the daily email newsletter!

    Thank you all for making me feel well read and received! 😀

    Thanks for making the effort! Your article has elicited some good discussions!

    Thank you Jeff. That was the idea.

    To show others my simple solution and open a discussion on how they would do things themselves.

  • Luis Cazares (1/21/2014)


    Here's an extract of the procedure to populate date dimension (I removed columns needed in my company and left some that might be relevant). We populate it by year, sowe don't need a large number of rows to be included, but it can grow if needed.

    I'm using the easiest way to populate a Tally table by using master.sys.all_columns. I could easily change it to a cte construct, but as it's a process that will run once a year, I prefer simplicity over performance to a certain degree.

    Here's an option without while loops (RBAR) or recursive CTEs (hidden RBAR).

    DECLARE @Year int = 2014;

    SET DATEFIRST 1;

    WITH Tally(n) AS(

    SELECT TOP 400 DATEADD( dd, ROW_NUMBER() OVER( ORDER BY (SELECT NULL)), (SELECT MAX(fecha) FROM DWESTRELLAS.dbo.dimDate WHERE WeekYear = @Year - 1))

    FROM sys.all_columns),

    cteDays AS(

    SELECT n AS cal_date,

    DATEADD( WEEK, DATEDIFF( WEEK, 0, n - 1), 0) AS Monday,

    DATEADD( DAY, 6, DATEADD( WEEK, DATEDIFF( WEEK, 0, n - 1), 0)) AS Sunday

    FROM Tally

    )

    SELECT CONVERT( char(8), cal_date, 112) idDate,

    CAST( cal_date AS smalldatetime) cal_date,

    YEAR( cal_date) cal_year,

    MONTH( cal_date) cal_month,

    DAY( cal_date) cal_day,

    DATEPART( DAYOFYEAR, cal_date) day_year,

    YEAR( Sunday) WeekYear,

    DATEPART( WEEK, Sunday) cal_Week,

    DATEPART( QUARTER, cal_date) cal_Quarter,

    UPPER(REPLACE( CONVERT( char(6), Monday, 113), ' ', '')) + '-'

    + UPPER(REPLACE( CONVERT( char(6), Sunday, 113), ' ', '')) week_desc,

    UPPER( CONVERT( char(6), Sunday, 113)) SundayDesc,

    DATEPART( WEEKDAY, cal_date) cal_weekday,

    UPPER( DATENAME( WEEKDAY, cal_date)) weekdayName,

    CAST( YEAR( Sunday) AS CHAR(4)) + RIGHT( '0' + CAST( DATEPART( WEEK, Sunday) AS varchar(2)), 2) idSemana,

    CASE WHEN DATEPART( DAY, Sunday) > 3

    THEN UPPER( RIGHT( CONVERT( char(6), Sunday, 113), 3)) + '-' + RIGHT( YEAR(Sunday), 2)

    ELSE UPPER( RIGHT( CONVERT( char(6), Monday, 113), 3)) + '-' + RIGHT( YEAR(Monday), 2) END

    MonthDesc

    FROM cteDays

    WHERE YEAR( Sunday) = @Year

    aaaaaaaahhhhhhh.... I do wish you lived in the Detroit area.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • PHYData DBA (1/21/2014)


    I want to thank everyone that took the time this morning to read my article and comment on it.

    I had no idea that my very first submission would end up being the top link on the daily email newsletter!

    Thank you all for making me feel well read and received! 😀

    Thanks, your effort has shown that there are many ways to achieve the same result. Some solutions work better for different systems. You got people thinking and learning, the main reason I enjoy articles on this site.

  • Jeff Moden (1/21/2014)


    aaaaaaaahhhhhhh.... I do wish you lived in the Detroit area.

    I wouldn't mind to change my address. Especially to a city that hosts an NFL team.:-D

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • I agree strongly with peter-757102 and raulggonzalez.

    However the point of a date dimension is not that is stores all of the days of a year. That's trivial. The point is that it pre-calculates things that are less obvious. Is May 27 a holiday? If so, what is the name of the holiday? When does Easter fall this year? How many working days are there in April this year? Perhaps more importantly, was such-and-such critical thing to my application, say the NYSE, closed by unusual circumstances on such and such a day.

    Heavily specialised calendars are even more important: When is the next day when the LSE is closed and the NYSE is open?

    These are the sort of things that motivate date dimensions, not avoiding calls to datepart.

  • Nice work on the article.

    I was putting together a way to populate dimdate using a tally table but see that Louis beat me to it. Thought I would include this anyhow as another example of how to do this using a set-based approach.

    DECLARE @StartDate date= '2005-01-01',

    @EndDate date = '2016-01-01';

    WITH tally(n) AS

    (

    SELECT TOP (DATEDIFF(dd,@startDate,@endDate)+1)

    ROW_NUMBER() OVER (ORDER BY (SELECT ($)))-1

    FROM sys.all_columns a CROSS JOIN sys.all_columns b

    ),

    daterange(dt) AS

    (

    SELECT DATEADD(DD,n,@StartDate)

    FROM tally

    )

    --INSERT dbo.DimDate

    SELECT

    DateKey =REPLACE(CONVERT(CHAR(10),dt),'-',''),

    FullDateAlternateKey =dt,

    DateString =CONVERT(varchar(10),dt,105),

    [DayOfWeek] =DATEPART(weekday,dt),

    [DayOfWeekName] =DATENAME(weekday,dt),

    [DayOfMonth] =DATEPART(dd,dt),

    [DayOfYear] =DATEPART(dy,dt),

    [WeekOfYear] =DATEPART(wk,dt),

    [MonthName] =DATENAME(mm,dt),

    [MonthOfYear] =DATEPART(mm,dt),

    [CalendarQuarter] =DATEPART(qq,dt),

    CalendarYear =YEAR(dt),

    IsWeekend =

    CASE

    WHEN DATEPART(weekday,dt) IN(1,7)

    THEN 1 ELSE 0 END,

    isLeapYear =

    CASE

    WHEN ((Year(dt) % 4 = 0)

    AND (Year(dt) % 100 <> 0

    OR Year(dt) % 400 = 0))

    THEN 1 ELSE 0 END

    FROM daterange;

    Edit (tiny): code was misaligned.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • steven.ensslen (1/21/2014)


    I agree strongly with peter-757102 and raulggonzalez.

    However the point of a date dimension is not that is stores all of the days of a year. That's trivial. The point is that it pre-calculates things that are less obvious. Is May 27 a holiday? If so, what is the name of the holiday? When does Easter fall this year? How many working days are there in April this year? Perhaps more importantly, was such-and-such critical thing to my application, say the NYSE, closed by unusual circumstances on such and such a day.

    Heavily specialised calendars are even more important: When is the next day when the LSE is closed and the NYSE is open?

    These are the sort of things that motivate date dimensions, not avoiding calls to datepart.

    I completely agrre with your point, but each company will have its own requirements. That's why posting a complete solution it's insane.

    I just tried to post something that could be used as an example of a set based solution. Complete calendar tables scripts are posted through the web but most of them are made to fit a particular need and wouldn't work in my company.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Jeff Moden (1/21/2014)


    I know it's a personal preference but I've worked with "DateDIM" tables that use date and time serial numbers and I've never understood why people would bother. Maybe it was to save on disk space (4 byte INT v.s. 8 byte DATETIME) before extremely large hard disks became so relatively inexpensive but I just don't see the need for it anymore. Add the DATE and TIME datatypes into the mix and I really don't see a need for a surrogate key for any type of calendar or time based table.

    A rich date dimension is often necessary for the "slice and dice" that end-users will do with a SSAS cube. They may want to look at sales on Tuesday vs. Sales on Thursday, or the average of sales on first Saturday of the month vs. all the others. That is why one would refer out to a lush date table. Considering that, in OLTP, DATETIME often is populated with GETDATE() or VB.NET's Now(), using it as the reference key would require the application or ETL process to strip off at minimum the fractional 10000ths/second. At that point, why not shave off four bytes of what is in essence unnecessary storage?

    As for the data space savings, with SSAS cubes, RAM is also an issue as well as disk -- the need to cache rowsets may become critical. an extra 4 bytes per row could bring a performance benefit.

    That said, using a surrogate INT that is an offset of a known unit from a known base date (which could be stored in a one row one column table) is an excellent idea. That should easily populate with no need to go (eww) RBAR.

    Thanks

    John.

Viewing 15 posts - 16 through 30 (of 57 total)

You must be logged in to reply to this topic. Login to reply