SQLServerCentral Article

Mastering Dimensions of Time

,

Recently I had a project to create a Data Warehouse for our small business to use for reporting and analysis.

We needed date and time Dimension tables to store values for any date or time in our datasources.  We had no requirements to support multiple regions or languages, but needed a schema that could be updated easily and expanded on if needed.  There was also a requirement that the date table row id value be based on the actual date stored in each row.

We decided on the following two tables:

CREATE TABLE [dbo].[DimDate](
       [DateKey] [int] NOT NULL,
       [FullDateAlternateKey] [date] NOT NULL,
       [DateString] [varchar](10) NULL,
       [DayOfWeek] [tinyint] NOT NULL,
       [DayOfWeekName] [nvarchar](10) NOT NULL,
       [DayOfMonth] [tinyint] NOT NULL,
       [DayOfYear] [smallint] NOT NULL,
       [WeekOfYear] [tinyint] NOT NULL,
       [MonthName] [nvarchar](10) NOT NULL,
       [MonthOfYear] [tinyint] NOT NULL,
       [CalendarQuarter] [tinyint] NOT NULL,
       [CalendarYear] [smallint] NOT NULL,
       [IsWeekend] [bit] NOT NULL,
       [IsLeapYear] [bit] NOT NULL,
 CONSTRAINT [PK_DimDate] PRIMARY KEY CLUSTERED
(
       [DateKey] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
ALTER TABLE [dbo].[DimDate] ADD  CONSTRAINT [DF_DimDate_IsWeekend]  DEFAULT ((0)) FOR [IsWeekend]
GO
ALTER TABLE [dbo].[DimDate] ADD  CONSTRAINT [DF_DimDate_IsLeapYear]  DEFAULT ((0)) FOR [IsLeapYear]
GO
 CREATE TABLE [dbo].[DimTime](
       [TimeKey] [int] NOT NULL,
       [TimeAltKey] [time](0) NULL,
       [HourOfDay] [tinyint] NULL,
       [MinuteOfHour] [tinyint] NULL,
       [SecondOfMinute] [tinyint] NULL,
       [TimeString] [varchar](8) NULL,
 CONSTRAINT [PK_DimTime] PRIMARY KEY CLUSTERED
(
       [TimeKey] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

Now we need to populate the time and date dimension tables with some data.  We do not need every possible date in the table, but at least all the possible dates that could be in our data set.  Let’s make the scripts re-useable.  When executed with optional parameters they can either fill in gaps, populate entire ranges, or just a specific date. This should allow us to fill and then update the Date dimension table with only the data that is needed.  Since the time table has values for just one 24 hour period, its procedure should fill the entire table at once. 

Here is what we came up with. To populate DimTime:

-- Declare and set variables for loop
Declare
@StartTime time(0),
@EndTime time(0),
@Time time(0),
@Interval int
Set @StartTime = '00:00:00'
Set @EndTime = '23:59:59'
Set @Time = @StartTime
Set @Interval = 1
-- Loop through Times
WHILE @Time <= @EndTime
BEGIN
    -- Insert record in dimension table if Time does not exist
    IF NOT EXISTS ( SELECT 'X' from dbo.DimTime (NOLOCK) where TimeAltKey = @Time )
    BEGIN
  
    INSERT Into DimTime with(rowlock)
    (
    TimeKey,
    TimeAltKey,
    [HourOfDay],
    [MinuteOfHour],
    SecondOfMinute,
    TimeString
    )
    Values
    (
    @Interval,
    @Time,
    datename(hh,@Time),
    datename(minute,@time),
    datename(second,@time),
    CONVERT(varchar(8), @time)
    )
    END
    -- Goto next second
    Select @Time = Dateadd(ss,1, @Time)
    Select @Interval = @Interval + 1
END

Notice that there is a section that checks to see if the value being added already exists.  We also made the script so that if we wanted to add more precision to the time table later, that would be supported.  It is recommended to create a new time dimension table if you are going to increase the precision of your time values.  This will allow you to keep the existing surrogate key values from the initial time dimension table in your Data warehouse, or at the least convert them to the correct values in the new time dimension table.

Now it is time to populate the Date table with all possible dates in our data source.  A quick check through all the date values in the data source shows that we need to support dates between January 2005 and December 2016. Here is our script for populating these values:

-- Declare and set variables for loop
Declare
@StartDate date,
@EndDate date,
@Date date
Set @StartDate = '2005-01-01'
Set @EndDate = '2016-01-01'
Set @Date = @StartDate
-- Loop through dates
WHILE @Date <= @EndDate
BEGIN
    -- Check for leap year
    DECLARE @IsLeapYear BIT
    IF ((Year(@Date) % 4 = 0) AND (Year(@Date) % 100 != 0 OR Year(@Date) % 400 = 0))
    BEGIN
        SELECT @IsLeapYear = 1
    END
    ELSE
    BEGIN
        SELECT @IsLeapYear = 0
    END
    -- Check for weekend
    DECLARE @IsWeekend BIT
    IF (DATEPART(dw, @Date) = 1 OR DATEPART(dw, @Date) = 7)
    BEGIN
        SELECT @IsWeekend = 1
    END
    ELSE
    BEGIN
        SELECT @IsWeekend = 0
    END
    -- Insert record in dimension table if date does not exist
    IF NOT EXISTS ( SELECT 'X' from dbo.DimDate (NOLOCK) where FullDateAlternateKey = @Date )
    BEGIN
   
    INSERT Into DimDate with(rowlock)
    (
    DateKey,
    FullDateAlternateKey,
    DateString,
    [DayofWeek],
    [Dayofweekname],
    [DayOfMonth],
    [DayOfYear],
    [WeekOfYear],
    [MonthName],
    MonthOfYear,
    [CalendarQuarter],
    [CalendarYear],
    [IsWeekend],
    [IsLeapYear]
    )
    Values
    (
    Convert(INT, CONVERT(varchar(10), @Date, 112)),
    @Date,
    CONVERT(varchar(10), @Date, 105),
    DATEPART(dw, @Date),
    DATENAME(dw, @Date),
    Day(@Date),
    DATEPART(dy, @Date),
    DATEPART(wk, @Date),
    DATENAME(mm, @Date),
    DATEPART(mm, @Date),
    DATENAME(qq, @Date),
    Year(@Date),
    @IsWeekend,
    @IsLeapYear
    )
   
    END
    -- Goto next day
    Set @Date = dateadd(day,1,@Date )
END

This script is not only fast and updateable, notice how flexible it is.  We also made the insert loop check for and skip existing values that are in the supplied date range.

There might be several other ways of doing this without using a Row By Row script. We could not find one that was as easy to read and understand, or that would be as modifiable and portable as this one. Also, since we are not talking about millions of rows it was faster in a big picture sort of way to do it like this.

I hope that others find this article and the information in it as useful as we did.  Enjoy!

NOTE: This script creates the Date Dimension table sets the Primary Key and clustered Index of the table to the column called [DateKey].  This is just what worked well for us.  You may find better results by changing this, but that is up to you.

Rate

3.79 (24)

You rated this post out of 5. Change rating

Share

Share

Rate

3.79 (24)

You rated this post out of 5. Change rating