Data Warehousing Tip #7 – How I build a time dimension

,

In this article I’m going to show you how I build my time dimensions.  There are lots of different ways to create time dimensions, and there are lots of different requirements, from the attributes in the dimension to the granularity of the data.  You can use these scripts to build an entire time dimension, and you can use them as a starting point for yours.

I’ve built time dimensions using various methods, but this is the one that I’ve been using for a while.  The big diversion that I take from the norm is that the key for my time dimension is not an integer, it’s a date.  I do this for a few reasons:

You don’t always need to join to the time dimension

The biggest cost associated with querying a well structured data warehouse is the joins.  When your time dimension key is a date and you are simply querying for data within a date range you don’t need to join to the time dimension to perform that query.  One less join.  Maybe more, depending on how many date keys you have in your fact table.

I can happily clear and reload the time dimension and not lose relationships

The way that I add new dates to my time dimension is to clear and reload the table with the new range.  It’s super quick and with a date as the key you’re guaranteed that the value of the key before will match the value of the key afterwards.  If you’re using an incrementing integer then that’s not going to happen.  You can hold your data as an INT in a YYYYMMDD format (e.g. 20190130), but if you’re doing that why not just go for a date?

Dates are smaller than INTs

The DATE data type is 3 bytes.  INTs are 4 bytes.  If you’re going down to intra-day precision such as hourly entries in your time dimension, your data type might be a DATETIME2 which would be 6 bytes in this scenario, compared to a BIGINT which is 8 bytes.

The Code

So enough about why I do it.  Here’s how I go about doing it.

Create and populate a LookupNumbers table

Firstly I create a table to store a lot of numbers.

CREATE TABLE [dbo].[LookupNumbers]
(
    [Number] INT NOT NULL 
)

Then I populate it with a lot of numbers.

-- Generate lots of numbers in sequence to allow us to efficiently generate dates for the time dimension
IF NOT EXISTS (SELECT 1
               FROM   LookupNumbers)
BEGIN
	PRINT 'Loading LookupNumbers'
	INSERT dbo.LookupNumbers (Number) VALUES (1);
	INSERT dbo.LookupNumbers SELECT Number + (SELECT COUNT(*) FROM LookupNumbers) FROM LookupNumbers;
	INSERT dbo.LookupNumbers SELECT Number + (SELECT COUNT(*) FROM LookupNumbers) FROM LookupNumbers;
	INSERT dbo.LookupNumbers SELECT Number + (SELECT COUNT(*) FROM LookupNumbers) FROM LookupNumbers;
	INSERT dbo.LookupNumbers SELECT Number + (SELECT COUNT(*) FROM LookupNumbers) FROM LookupNumbers;
	INSERT dbo.LookupNumbers SELECT Number + (SELECT COUNT(*) FROM LookupNumbers) FROM LookupNumbers;
	INSERT dbo.LookupNumbers SELECT Number + (SELECT COUNT(*) FROM LookupNumbers) FROM LookupNumbers;
	INSERT dbo.LookupNumbers SELECT Number + (SELECT COUNT(*) FROM LookupNumbers) FROM LookupNumbers;
	INSERT dbo.LookupNumbers SELECT Number + (SELECT COUNT(*) FROM LookupNumbers) FROM LookupNumbers;
	INSERT dbo.LookupNumbers SELECT Number + (SELECT COUNT(*) FROM LookupNumbers) FROM LookupNumbers;
	INSERT dbo.LookupNumbers SELECT Number + (SELECT COUNT(*) FROM LookupNumbers) FROM LookupNumbers;
	INSERT dbo.LookupNumbers SELECT Number + (SELECT COUNT(*) FROM LookupNumbers) FROM LookupNumbers;
	INSERT dbo.LookupNumbers SELECT Number + (SELECT COUNT(*) FROM LookupNumbers) FROM LookupNumbers;
	INSERT dbo.LookupNumbers SELECT Number + (SELECT COUNT(*) FROM LookupNumbers) FROM LookupNumbers;
	INSERT dbo.LookupNumbers SELECT Number + (SELECT COUNT(*) FROM LookupNumbers) FROM LookupNumbers;
	INSERT dbo.LookupNumbers SELECT Number + (SELECT COUNT(*) FROM LookupNumbers) FROM LookupNumbers;
	INSERT dbo.LookupNumbers SELECT Number + (SELECT COUNT(*) FROM LookupNumbers) FROM LookupNumbers;
	INSERT dbo.LookupNumbers SELECT Number + (SELECT COUNT(*) FROM LookupNumbers) FROM LookupNumbers;
	INSERT dbo.LookupNumbers SELECT Number + (SELECT COUNT(*) FROM LookupNumbers) FROM LookupNumbers;
	INSERT dbo.LookupNumbers SELECT Number + (SELECT COUNT(*) FROM LookupNumbers) FROM LookupNumbers;
	INSERT dbo.LookupNumbers SELECT Number + (SELECT COUNT(*) FROM LookupNumbers) FROM LookupNumbers;	
END

Create a table valued function to get all of the dates within a range

--=====================================================================
--Created by: Chris Jenkins
--Date: 6th September 2018
--Description: Gets all dates within the passed in range. Used to 
-- populate the time dimension.
--=====================================================================
CREATE FUNCTION [dbo].[udf_GetAllDatesInRange]
(
    @dStartDate DATE
  , @dEndDate DATE
)
RETURNS TABLE
AS
RETURN (SELECT DATEADD(DAY,Number-1,@dStartDate) as TheDate
        FROM LookupNumbers
        WHERE Number <= DATEDIFF(DAY,@dStartDate,@dEndDate) + 1);

Now create the time dimension

CREATE TABLE [dbo].[DimDate]
(
    [DateKey] DATE NOT NULL, 
    [FullDateName] VARCHAR(30) NOT NULL, 
    [CalendarYear] SMALLINT NOT NULL,  
    [CalendarQuarterKey] INT NOT NULL, 
    [CalendarQuarterNumber] TINYINT NOT NULL,  
    [CalendarQuarterShortName] CHAR(2) NOT NULL, 
    [CalendarQuarterFullName] CHAR(9) NOT NULL, 
    [CalendarMonthKey] INT NOT NULL,  
    [CalendarMonthNumber] TINYINT NOT NULL,  
    [CalendarMonthShortName] CHAR(3) NOT NULL, 
    [CalendarMonthFullName] VARCHAR(9) NOT NULL, 
    [CalendarMonthYearShortName] CHAR(8) NOT NULL, 
    [CalendarMonthYearFullName] VARCHAR(14) NOT NULL, 
    [CalendarWeekKey] INT NOT NULL, 
    [CalendarWeekNumber] TINYINT NOT NULL, 
    [CalendarWeekShortName] VARCHAR(4) NOT NULL, 
    [CalendarWeekFullName] VARCHAR(7) NOT NULL, 
    [CalendarWeekYearShortName] VARCHAR(9) NOT NULL, 
    [CalendarWeekYearFullName] VARCHAR(12) NOT NULL, 
    [CalendarDayOfMonthNumber] TINYINT NOT NULL, 
    [CalendarDayOfMonthName] VARCHAR(4) NOT NULL, 
    [CalendarDayOfWeekNumber] TINYINT NOT NULL, 
    [CalendarDayShortName] CHAR(3) NOT NULL, 
    [CalendarDayFullName] VARCHAR(9) NOT NULL
)

Finally populate the time dimension

--=================================================================================
--Created by:  Chris Jenkins
--Date:        6th September 2018
--Description: Regenerates all data for the time dimension. This is easier and 
--             cleaner than checking and maintaining existing data. We just clear 
--             and repopulate the table. It should complete almost instantaneously.
--=================================================================================
CREATE PROCEDURE [dbo].[usp_PopulateDimDate]
AS
SET NOCOUNT ON;
DECLARE @nFututeYearsToPopulate INT = 1  -- This can be passed in or retrieved from lookup table
      , @nPastYearsToRetain     INT = 5; -- This can be passed in or retrieved from lookup table
 
--Get the first day of this year and then take off @nPastYearsToRetain to get the start date
DECLARE @dStartDate DATE = DATEADD(YEAR, -@nPastYearsToRetain, DATEADD(YEAR, DATEDIFF(YEAR,0,SYSDATETIME()), 0));
--Get the last day of this year and then add @nFututeYearsToPopulate to get the end date
DECLARE @dEndDate DATE = DATEADD(YEAR, @nFututeYearsToPopulate, DATEADD(YEAR, DATEDIFF(YEAR,0,SYSDATETIME()) + 1, -1));
BEGIN TRY
BEGIN TRAN
	TRUNCATE TABLE dbo.DimDate;
	INSERT INTO [dbo].[DimDate]
				([DateKey]
				,[FullDateName]
				,[CalendarYear]
				,[CalendarQuarterKey]
				,[CalendarQuarterNumber]
				,[CalendarQuarterShortName]
				,[CalendarQuarterFullName]
				,[CalendarMonthKey]
				,[CalendarMonthNumber]
				,[CalendarMonthShortName]
				,[CalendarMonthFullName]
				,[CalendarMonthYearShortName]
				,[CalendarMonthYearFullName]
				,[CalendarWeekKey]
				,[CalendarWeekNumber]
				,[CalendarWeekShortName]
				,[CalendarWeekFullName]
				,[CalendarWeekYearShortName]
				,[CalendarWeekYearFullName]
				,[CalendarDayOfMonthNumber]
				,[CalendarDayOfMonthName]
				,[CalendarDayOfWeekNumber]
				,[CalendarDayShortName]
				,[CalendarDayFullName])
	SELECT TheDate AS DateKey
		 , DATENAME(DAY, TheDate) 
		 + CASE WHEN RIGHT(DATENAME(DAY, TheDate), 1) = '0'
		 		THEN 'th '
		 		WHEN DATENAME(DAY, TheDate) IN ('11', '12', '13')
		 		THEN 'th '
		 		WHEN RIGHT(DATENAME(DAY, TheDate), 1) = '1'
		 		THEN 'st '
		 		WHEN LEN(TheDate) = 2
		 		THEN 'th '
		 		WHEN RIGHT(DATENAME(DAY, TheDate), 1) = '2'
		 		THEN 'nd '
		 		WHEN RIGHT(DATENAME(DAY, TheDate), 1) = '3'
		 		THEN 'rd '
		 		ELSE 'th '
		 END
		 + DATENAME(MONTH, TheDate) + ' ' + DATENAME(YEAR, TheDate) AS FullDateName
		 , DATEPART(YEAR, TheDate) AS CalendarYear
		 , DATENAME(YEAR, TheDate) + DATENAME(QUARTER, TheDate) AS CalendarQuarterKey
		 , DATEPART(QUARTER, TheDate) AS CalendarQuarterNumber
		 , 'Q' + DATENAME(QUARTER, TheDate) AS CalendarQuarterShortName
		 , 'Quarter ' + DATENAME(QUARTER, TheDate) AS CalendarQuarterFullName
		 , DATENAME(YEAR, TheDate) + CONVERT(VARCHAR, REPLACE(STR(DATEPART(MONTH, TheDate),2),SPACE(1), '0')) AS CalendarMonthKey
		 , DATEPART(MONTH, TheDate) AS CalendarMonthNumber
		 , LEFT(DATENAME(MONTH, TheDate), 3) AS CalendarMonthShortName
		 , DATENAME(MONTH, TheDate) AS CalendarMonthFullName
		 , LEFT(DATENAME(MONTH, TheDate), 3) + ' ' + DATENAME(YEAR, TheDate) AS CalendarMonthYearShortName
		 , DATENAME(MONTH, TheDate) + ' ' + DATENAME(YEAR, TheDate) AS CalendarMonthYearFullName
		 , DATENAME(YEAR, TheDate) + CONVERT(VARCHAR, REPLACE(STR(DATEPART(WEEK, TheDate),2),SPACE(1), '0')) AS CalendarWeekKey
		 , DATEPART(WEEK, TheDate) AS CalendarWeekNumber
		 , 'WK' + DATENAME(WEEK, TheDate) AS CalendarWeekShortName
		 , 'Week ' + DATENAME(WEEK, TheDate) AS CalendarWeekFullName
		 , 'WK' + DATENAME(WEEK, TheDate) + ' ' + DATENAME(YEAR, TheDate) AS CalendarWeekYearShortName
		 , 'Week ' + DATENAME(WEEK, TheDate) + ' ' + DATENAME(YEAR, TheDate) AS CalendarWeekYearFullName
		 , DATEPART(DAY, TheDate) AS CalendarDayOfMonthNumber
		 , DATENAME(DAY, TheDate) 
		 + CASE WHEN RIGHT(DATENAME(DAY, TheDate), 1) = '0'
		 		THEN 'th'
		 		WHEN DATENAME(DAY, TheDate) IN ('11', '12', '13')
		 		THEN 'th'
		 		WHEN RIGHT(DATENAME(DAY, TheDate), 1) = '1'
		 		THEN 'st'
		 		WHEN LEN(TheDate) = 2
		 		THEN 'th'
		 		WHEN RIGHT(DATENAME(DAY, TheDate), 1) = '2'
		 		THEN 'nd'
		 		WHEN RIGHT(DATENAME(DAY, TheDate), 1) = '3'
		 		THEN 'rd'
		 		ELSE 'th'
			END AS CalendarDayOfMonthName
		  , DATEPART(WEEKDAY, TheDate) AS CalendarDayOfWeekNumber
		  , LEFT(DATENAME(WEEKDAY, TheDate), 3) AS CalendarDayShortName
		  , DATENAME(WEEKDAY, TheDate) AS CalendarDayFullName
	FROM    dbo.udf_GetAllDatesInRange (@dStartDate, @dEndDate) as c;
COMMIT TRAN
END TRY
--If there have been any failures we'll jump into the catch block.
--Rollback the transaction to ensure everything is as it was before
--we started and show the error.
BEGIN CATCH
	IF (@@TRANCOUNT > 0)
	BEGIN
		ROLLBACK TRAN;
	END
	SELECT ERROR_MESSAGE() AS ErrorMsg;
END CATCH

Usage

The intended usage is that the usp_PopulateDimDate stored procedure is executed as often as needed.  I’ve tested it in a number of scenarios with a number of date ranges and it usually repopulates the time dimension in less than a second.  I usually have it set up so that you pass in the date range or you fetch it from a global variables table.

You don’t need to do anything other than re-running that procedure.  Just make sure you don’t clear your LookupNumbers table.

This is just one of many ways to do the same or similar things, but I hope you find it useful.  You’ll find all of the scripts used above in a zip file in the resources section of the website here.

The post Data Warehousing Tip #7 – How I build a time dimension appeared first on BI Design.

Rate

Share

Share

Rate