Technical Article

Calendar Tables

,

I've been using Calendar tables for years to create code that is easier to read, performs better and is more flexible than using complicated date type calculations in queries and stored procedures. I recently saw a posting here on SQLServerCentral where the original poster asked how to improve his query's performance. The suggestions did improve performance, but I thought that a very different approach would have served better and would allow further queries of similar type to be written much more easily.

In this article I will be using a month based calendar table. I also use week based and quarter based calendar tables (the quarter based tables make life much easier when doing 4-4-5 accounting periods, which are based on 13 weeks rather than Calendar months).

The theory behind this type of calendar table is that the table itself determines the bucket into which date related facts are dumped or aggregated. Date calculations mostly are not used. The table has an ascending integer (usually an identity), which is the clustered index. It has a start date for the period and either an end date for the period or the start of the next period. These are DATETIME data types. If the dates you use in your data are pure dates (no time) then you would use an end date and it would be the last day of the period. If there is time in your dates then you would use the start of the next period instead of an end date.

More data can be put into a month based calendar table than I'm using here, but this should do for our purposes. Here is the code to create a simple month based calendar table starting at January 2000 going through December 2024:

-- These columns are the most useful. Others can be added.

SET NOCOUNT ON;
CREATE TABLE CalMonth
( MonthID INT IDENTITY(1, 1)
, MonthStart DATETIME
, NextMonth DATETIME
, MonthDescr CHAR(6)
, MonthName CHAR(3)
, YearMonth INT
, MonthNum INT
);
DECLARE
 @MonthStart DATETIME
, @NextMonth DATETIME
, @MonthDescr CHAR(6)
, @MonthName CHAR(3)
, @MonthNum INT
, @Year INT
, @YearMonth INT;
SET @MonthStart = '1/1/2000';
SET @NextMonth = DATEADD(month, 1, @MonthStart);
WHILE @MonthStart < '1/1/2025'
 BEGIN
 SET @MonthNum = DATEPART(month, @MonthStart);
 SET @MonthName = CASE @MonthNum
 WHEN 1 THEN 'JAN'
 WHEN 2 THEN 'FEB'
 WHEN 3 THEN 'MAR'
 WHEN 4 THEN 'APR'
 WHEN 5 THEN 'MAY'
 WHEN 6 THEN 'JUN'
 WHEN 7 THEN 'JUL'
 WHEN 8 THEN 'AUG'
 WHEN 9 THEN 'SEP'
 WHEN 10 THEN 'OCT'
 WHEN 11 THEN 'NOV'
 ELSE 'DEC' END;
 SET @Year = DATEPART(year, @MonthStart);
 SET @YearMonth = (@Year * 100) + @MonthNum;
 SET @MonthDescr = @MonthName + '-' + RIGHT(CONVERT(VARCHAR, @Year), 2);
INSERT INTO CalMonth
 ( MonthStart, NextMonth, MonthDescr, MonthName, YearMonth, MonthNum )
 SELECT
 @MonthStart, @NextMonth, @MonthDescr, @MonthName, @YearMonth, @MonthNum;
SET @MonthStart = @NextMonth;
SET @NextMonth = DATEADD(month, 1, @NextMonth);
END
GO
ALTER TABLE [dbo].[CalMonth] ADD CONSTRAINT [PK_CalMonth] PRIMARY KEY CLUSTERED 
(
 MonthID ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF
, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF
, FILLFACTOR = 100) ON [PRIMARY]
GO
CREATE UNIQUE NONCLUSTERED INDEX [CalMonth_Dates] ON [dbo].[CalMonth] 
(
 [MonthStart] ASC,
 [NextMonth] ASC
)
INCLUDE ( [MonthDescr]) 
WITH (PAD_INDEX = OFF, FILLFACTOR = 100)
GO

This table should fit into 2 pages total. The idea is that the calendar table should be as compact as you can make it to allow it to be cached in memory as much as possible.

Here is a simple example of using the CalMonth table in a query in the AdventureWorks database.( I had to dummy up the date for the query since my SalesOrderHeader table is way out of date). It returns a range of customers with their total sales by month for the month of June 2004 and two months before that. Notice that there is no DATEADD or DATEPART in the query. Since the MonthID is a sequential integer, 2 is subtracted from the latest month to get the first month. The display month is also fetched from CalMonth.

DECLARE
 @LatestMonth INT;
SELECT @LatestMonth = MonthID
FROM CalMonth
WHERE MonthStart = '2004-06-01'
SELECT C.CustomerID, C.AccountNumber, C.TerritoryID
, CM.MonthID, CM.MonthDescr
, COUNT(*) AS NumberOfSales
, SUM(SO.SubTotal) AS MonthlySales
FROM Sales.Customer C
INNER JOIN Sales.SalesOrderHeader SO ON
 C.CustomerID = SO.CustomerID
INNER JOIN CalMonth CM ON
 SO.OrderDate >= CM.MonthStart
 AND SO.OrderDate < CM.NextMonth
WHERE C.CustomerType = 'S'
AND CM.MonthID BETWEEN @LatestMonth - 2 And @LatestMonth
GROUP BY C.CustomerID, C.AccountNumber, C.TerritoryID
, CM.MonthID, CM.MonthDescr
ORDER BY C.TerritoryID, CM.MonthID;

The result set returned looks like:

CustomerID  AccountNumber TerritoryID MonthID     MonthDescr NumberOfSales MonthlySales
----------- ------------- ----------- ----------- ---------- ------------- ------------
110         AW00000110    1           52          APR-04     1             75997.3582
146         AW00000146    1           52          APR-04     1             85828.7208
290         AW00000290    1           52          APR-04     1             66245.5197
253         AW00000253    1           52          APR-04     1             4372.7986
325         AW00000325    1           52          APR-04     1             318.4545
344         AW00000344    1           52          APR-04     1             59.9326
380         AW00000380    1           52          APR-04     1             272.961
398         AW00000398    1           52          APR-04     1             1655.4936
451         AW00000451    1           52          APR-04     1             50000.0044
505         AW00000505    1           52          APR-04     1             60748.8992
506         AW00000506    1           52          APR-04     1             113252.4963
560         AW00000560    1           52          APR-04     1             452.7394
578         AW00000578    1           52          APR-04     1             299.6926
668         AW00000668    1           52          APR-04     1             8687.2228
673         AW00000673    1           52          APR-04     1             2312.6134
2           AW00000002    1           53          MAY-04     1             822.009

CalMonth is used to both select the sales orders to include and for the grouping. The MonthDescr makes it easy to display which month we're grouping.

I did some testing on an alternate version of this query where I just set the start and end dates as variables and then did a GROUP BY (DATEPART(year, OrderDate) * 100) + DATEPART(month, OrderDate) in order to group by Customer, Year and Month and eliminated the CalMonth table. It generally ran 10 ms longer in CPU time. So this method also affords good performance .

Now we get to more complex applications of our CalMonth table. This is from an actual post here on SQL Server Central just a few weeks ago. I slightly modified it to run in the AdventureWorks database.

The original poster (OP) wanted a side by side comparison of transactions from 2 different years summarized and compared month by month. This was for a hotel group. There were close to 2 million records in the Stay (transaction) table.

Here is a script that will create a 2 million record temp table that we can use for comparison:

IF OBJECT_ID('tempdb..#Stay') IS NOT NULL
 DROP TABLE #Stay;
CREATE TABLE #Stay(
 [StayID] [uniqueidentifier] NOT NULL,
 [GuestID] [uniqueidentifier] NOT NULL,
 [ArrivalDate] [datetime] NOT NULL,
 [TotalRevenue] [money] NOT NULL
 CONSTRAINT [PK_Stay] PRIMARY KEY NONCLUSTERED 
(
 [StayID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY];
ALTER TABLE #Stay ADD CONSTRAINT [DF_Stay_StayID] DEFAULT (newsequentialid()) FOR [StayID];
CREATE CLUSTERED INDEX [IX_Stay_ArrivalDateGuestID] ON #Stay
(
 [ArrivalDate] ASC,
 [GuestID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY];
--Populate with one million rows of random data, thanks to Jeff for this (http://www.sqlservercentral.com/articles/Crosstab/65048/)
INSERT INTO #Stay (GuestID, ArrivalDate, TotalRevenue)
SELECT TOP 2000000
 NEWID(),
 CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),
 CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY)
FROM Master.dbo.SysColumns t1
CROSS JOIN Master.dbo.SysColumns t2

I added a covering index on ArrivalDate and Revenue in order to avoid excessive key lookups:

CREATE INDEX #Stay_ArrivalRevenue ON #STAY (ArrivalDate, TotalRevenue)

The technique the OP was using required a Tally or Numbers table. Here is a link to a great article by Jeff Moden on the use of numbers tables: http://www.sqlservercentral.com/articles/T-SQL/62867/

Here is the code to generate a Tally table:

SELECT TOP 10000
N = IDENTITY(INT, 1, 1)
INTO Tally
FROM sys.syscolumns S1
CROSS JOIN sys.syscolumns S2
GO
ALTER TABLE Tally ADD CONSTRAINT PK_Tally PRIMARY KEY CLUSTERED (N)
WITH FILLFACTOR = 100
GO

Here is the query the OP wanted help on. The tally table was used for grouping to get a single month and in the date calculations:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;
DECLARE @StartYear1 DATETIME;
DECLARE @StartYear2 DATETIME;
SET @StartYear2 = DATEADD(YY, -1, DATEADD(MM, DATEDIFF(MM, 0, GETDATE()), 0));
SET @StartYear1 = DATEADD(YY, -1, @StartYear2);
SELECT 
 DATEADD(MM, N - 1, @StartYear1) Year1,
 SUM(CASE WHEN ArrivalDate >= @StartYear1 AND ArrivalDate < @StartYear2 THEN TotalRevenue ELSE 0 END) Year1Revenue,
 DATEADD(MM, N -1, @StartYear2) Year2,
 SUM(CASE WHEN ArrivalDate >= @StartYear2 THEN TotalRevenue ELSE 0 END) Year2Revenue
FROM Tally
LEFT JOIN #Stay 
ON 
 DATEADD(MM, N - 1, @StartYear1) = DATEADD(MM, DATEDIFF(MM, 0, ArrivalDate), 0)
 OR DATEADD(MM, N - 1, @StartYear2) = DATEADD(MM, DATEDIFF(MM, 0, ArrivalDate), 0)
WHERE ArrivalDate >= @StartYear1 AND N <= 12
GROUP BY N
ORDER BY Year1
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;

The result set returned form this query was:

Year1                   Year1Rev  Year2                   Year2Revenue
2008-06-01 00:00:00.000 814652.29 2009-06-01 00:00:00.000 815557.55
2008-07-01 00:00:00.000 850882.41 2009-07-01 00:00:00.000 831792.32
2008-08-01 00:00:00.000 844237.14 2009-08-01 00:00:00.000 837465.42
2008-09-01 00:00:00.000 836915.91 2009-09-01 00:00:00.000 833296.03
2008-10-01 00:00:00.000 838332.34 2009-10-01 00:00:00.000 855292.19
2008-11-01 00:00:00.000 819412.58 2009-11-01 00:00:00.000 827802.45
2008-12-01 00:00:00.000 838095.67 2009-12-01 00:00:00.000 851131.06
2009-01-01 00:00:00.000 858064.48 2010-01-01 00:00:00.000 0.00
2009-02-01 00:00:00.000 771925.00 2010-02-01 00:00:00.000 0.00
2009-03-01 00:00:00.000 855450.64 2010-03-01 00:00:00.000 0.00
2009-04-01 00:00:00.000 821391.97 2010-04-01 00:00:00.000 0.00
2009-05-01 00:00:00.000 853181.60 2010-05-01 00:00:00.000 0.00

As you can see there are numerous calculations to get the dates right for the grouping and selecting. These types of date calculations prevent the optimizer from using the index on ArrivalDate. The average CPU time for this query on my humble box was 3827 ms. It also isn't the easiest query in the world to read and understand.

Here is another solution using the CalMonth table:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;
DECLARE 
 @StartYear2 DATETIME
, @StartMonth INT
-- This returns the first day of the current month.
SET @StartYear2 = DATEADD(YY, -1, DATEADD(MM, DATEDIFF(MM, 0, GETDATE()), 0));
-- Get the MonthID for this month.
SELECT 
 @StartMonth = MonthID
FROM CalMonth
WHERE MonthStart = @StartYear2;
-- Set to same month the year before.
SET @StartMonth = @StartMonth - 12
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
SELECT
 X.MonthName + '-' + MIN(X.YrSuffix) AS FirstYearMonth
, SUM(CASE WHEN YearNo = 1 THEN X.TotalRevenue ELSE 0 END) AS Year1Revenue
, X.MonthName + '-' + MAX(X.YrSuffix) AS SecondYearMonth
, SUM(CASE WHEN YearNo = 2 THEN X.TotalRevenue ELSE 0 END) as Year2Revenue
, X.MonthOrd
FROM
 (SELECT CM.MonthName, S.TotalRevenue AS TotalRevenue, 1 AS YearNo
 , CM.MonthID - @StartMonth AS MonthOrd
 , RIGHT(CONVERT(VARCHAR, DATEPART(year, CM.MonthStart)), 2) AS YrSuffix
 FROM CalMonth CM
 INNER JOIN #Stay S ON
 S.ArrivalDate >= CM.MonthStart
 AND S.ArrivalDate < CM.NextMonth
 WHERE CM.MonthID BETWEEN @StartMonth And @StartMonth + 11
 UNION ALL
 -- This gets the transactions from the year after.
 SELECT CM.MonthName, S.TotalRevenue, 2
 , CM.MonthID - 12 - @StartMonth
 , RIGHT(CONVERT(VARCHAR, DATEPART(year, CM.MonthStart)), 2)
 FROM CalMonth CM
 INNER JOIN #Stay S ON
 S.ArrivalDate >= CM.MonthStart
 AND S.ArrivalDate < CM.NextMonth
 WHERE CM.MonthID BETWEEN @StartMonth + 12 And @StartMonth + 23
 ) AS X
GROUP BY X.MonthName, X.MonthOrd
ORDER BY X.MonthOrd;

The result set returned looks like:

FirstYearMonth Year1Revenue SecondYearMonth Year2Revenue MonthOrd
JUN-08         814652.29    JUN-09          815557.55    0
JUL-08         850882.41    JUL-09          831792.32    1
AUG-08         844237.14    AUG-09          837465.42    2
SEP-08         836915.91    SEP-09          833296.03    3
OCT-08         838332.34    OCT-09          855292.19    4
NOV-08         819412.58    NOV-09          827802.45    5
DEC-08         838095.67    DEC-09          851131.06    6
JAN-09         858064.48    JAN-09          0.00         7
FEB-09         771925.00    FEB-09          0.00         8
MAR-09         855450.64    MAR-09          0.00         9
APR-09         821391.97    APR-09          0.00         10
MAY-09         853181.60    MAY-09          0.00         11

The average CPU time was 360 ms - 10 times faster. The result set isn't quite the same, but serves the same purpose here. I find the code easier to read without all of the date calculations. Instead of DATEADD type calculations simple integer math is used since all months are sequential integers.

I find calendar tables make code much easier to read and maintain, mostly increase performance and they provide an easy way to change code (i.e. just change the table) when business rules change over time. After a few times using them, you can easily get hooked on them.

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating