Recently I had a project to create a Data Warehouse for our small business to use for reporting and analysis.
We needed date and time Dimension tables to store values for any date or time in our datasources. We had no requirements to support multiple regions or languages, but needed a schema that could be updated easily and expanded on if needed. There was also a requirement that the date table row id value be based on the actual date stored in each row.
We decided on the following two tables:
CREATE TABLE [dbo].[DimDate]( [DateKey] [int] NOT NULL, [FullDateAlternateKey] [date] NOT NULL, [DateString] [varchar](10) NULL, [DayOfWeek] [tinyint] NOT NULL, [DayOfWeekName] [nvarchar](10) NOT NULL, [DayOfMonth] [tinyint] NOT NULL, [DayOfYear] [smallint] NOT NULL, [WeekOfYear] [tinyint] NOT NULL, [MonthName] [nvarchar](10) NOT NULL, [MonthOfYear] [tinyint] NOT NULL, [CalendarQuarter] [tinyint] NOT NULL, [CalendarYear] [smallint] NOT NULL, [IsWeekend] [bit] NOT NULL, [IsLeapYear] [bit] NOT NULL, CONSTRAINT [PK_DimDate] PRIMARY KEY CLUSTERED ( [DateKey] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO SET ANSI_PADDING OFF GO ALTER TABLE [dbo].[DimDate] ADD CONSTRAINT [DF_DimDate_IsWeekend] DEFAULT ((0)) FOR [IsWeekend] GO ALTER TABLE [dbo].[DimDate] ADD CONSTRAINT [DF_DimDate_IsLeapYear] DEFAULT ((0)) FOR [IsLeapYear] GO CREATE TABLE [dbo].[DimTime]( [TimeKey] [int] NOT NULL, [TimeAltKey] [time](0) NULL, [HourOfDay] [tinyint] NULL, [MinuteOfHour] [tinyint] NULL, [SecondOfMinute] [tinyint] NULL, [TimeString] [varchar](8) NULL, CONSTRAINT [PK_DimTime] PRIMARY KEY CLUSTERED ( [TimeKey] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY]
Now we need to populate the time and date dimension tables with some data. We do not need every possible date in the table, but at least all the possible dates that could be in our data set. Let’s make the scripts re-useable. When executed with optional parameters they can either fill in gaps, populate entire ranges, or just a specific date. This should allow us to fill and then update the Date dimension table with only the data that is needed. Since the time table has values for just one 24 hour period, its procedure should fill the entire table at once.
Here is what we came up with. To populate DimTime:
-- Declare and set variables for loop Declare @StartTime time(0), @EndTime time(0), @Time time(0), @Interval int Set @StartTime = '00:00:00' Set @EndTime = '23:59:59' Set @Time = @StartTime Set @Interval = 1 -- Loop through Times WHILE @Time <= @EndTime BEGIN -- Insert record in dimension table if Time does not exist IF NOT EXISTS ( SELECT 'X' from dbo.DimTime (NOLOCK) where TimeAltKey = @Time ) BEGIN INSERT Into DimTime with(rowlock) ( TimeKey, TimeAltKey, [HourOfDay], [MinuteOfHour], SecondOfMinute, TimeString ) Values ( @Interval, @Time, datename(hh,@Time), datename(minute,@time), datename(second,@time), CONVERT(varchar(8), @time) ) END -- Goto next second Select @Time = Dateadd(ss,1, @Time) Select @Interval = @Interval + 1 END
Notice that there is a section that checks to see if the value being added already exists. We also made the script so that if we wanted to add more precision to the time table later, that would be supported. It is recommended to create a new time dimension table if you are going to increase the precision of your time values. This will allow you to keep the existing surrogate key values from the initial time dimension table in your Data warehouse, or at the least convert them to the correct values in the new time dimension table.
Now it is time to populate the Date table with all possible dates in our data source. A quick check through all the date values in the data source shows that we need to support dates between January 2005 and December 2016. Here is our script for populating these values:
-- Declare and set variables for loop Declare @StartDate date, @EndDate date, @Date date Set @StartDate = '2005-01-01' Set @EndDate = '2016-01-01' Set @Date = @StartDate -- Loop through dates WHILE @Date <= @EndDate BEGIN -- Check for leap year DECLARE @IsLeapYear BIT IF ((Year(@Date) % 4 = 0) AND (Year(@Date) % 100 != 0 OR Year(@Date) % 400 = 0)) BEGIN SELECT @IsLeapYear = 1 END ELSE BEGIN SELECT @IsLeapYear = 0 END -- Check for weekend DECLARE @IsWeekend BIT IF (DATEPART(dw, @Date) = 1 OR DATEPART(dw, @Date) = 7) BEGIN SELECT @IsWeekend = 1 END ELSE BEGIN SELECT @IsWeekend = 0 END -- Insert record in dimension table if date does not exist IF NOT EXISTS ( SELECT 'X' from dbo.DimDate (NOLOCK) where FullDateAlternateKey = @Date ) BEGIN INSERT Into DimDate with(rowlock) ( DateKey, FullDateAlternateKey, DateString, [DayofWeek], [Dayofweekname], [DayOfMonth], [DayOfYear], [WeekOfYear], [MonthName], MonthOfYear, [CalendarQuarter], [CalendarYear], [IsWeekend], [IsLeapYear] ) Values ( Convert(INT, CONVERT(varchar(10), @Date, 112)), @Date, CONVERT(varchar(10), @Date, 105), DATEPART(dw, @Date), DATENAME(dw, @Date), Day(@Date), DATEPART(dy, @Date), DATEPART(wk, @Date), DATENAME(mm, @Date), DATEPART(mm, @Date), DATENAME(qq, @Date), Year(@Date), @IsWeekend, @IsLeapYear ) END -- Goto next day Set @Date = dateadd(day,1,@Date ) END
This script is not only fast and updateable, notice how flexible it is. We also made the insert loop check for and skip existing values that are in the supplied date range.
There might be several other ways of doing this without using a Row By Row script. We could not find one that was as easy to read and understand, or that would be as modifiable and portable as this one. Also, since we are not talking about millions of rows it was faster in a big picture sort of way to do it like this.
I hope that others find this article and the information in it as useful as we did. Enjoy!
NOTE: This script creates the Date Dimension table sets the Primary Key and clustered Index of the table to the column called [DateKey]. This is just what worked well for us. You may find better results by changing this, but that is up to you.