I've been using Calendar tables for years to create code that is easier to read, performs better and is more flexible than using complicated date type calculations in queries and stored procedures. I recently saw a posting here on SQLServerCentral where the original poster asked how to improve his query's performance. The suggestions did improve performance, but I thought that a very different approach would have served better and would allow further queries of similar type to be written much more easily.
In this article I will be using a month based calendar table. I also use week based and quarter based calendar tables (the quarter based tables make life much easier when doing 4-4-5 accounting periods, which are based on 13 weeks rather than Calendar months).
The theory behind this type of calendar table is that the table itself determines the bucket into which date related facts are dumped or aggregated. Date calculations mostly are not used. The table has an ascending integer (usually an identity), which is the clustered index. It has a start date for the period and either an end date for the period or the start of the next period. These are DATETIME data types. If the dates you use in your data are pure dates (no time) then you would use an end date and it would be the last day of the period. If there is time in your dates then you would use the start of the next period instead of an end date.
More data can be put into a month based calendar table than I'm using here, but this should do for our purposes. Here is the code to create a simple month based calendar table starting at January 2000 going through December 2024:
— These columns are the most useful. Others can be added.
SET NOCOUNT ON; CREATE TABLE CalMonth ( MonthID INT IDENTITY(1, 1) , MonthStart DATETIME , NextMonth DATETIME , MonthDescr CHAR(6) , MonthName CHAR(3) , YearMonth INT , MonthNum INT ); DECLARE @MonthStart DATETIME , @NextMonth DATETIME , @MonthDescr CHAR(6) , @MonthName CHAR(3) , @MonthNum INT , @Year INT , @YearMonth INT; SET @MonthStart = '1/1/2000'; SET @NextMonth = DATEADD(month, 1, @MonthStart); WHILE @MonthStart < '1/1/2025' BEGIN SET @MonthNum = DATEPART(month, @MonthStart); SET @MonthName = CASE @MonthNum WHEN 1 THEN 'JAN' WHEN 2 THEN 'FEB' WHEN 3 THEN 'MAR' WHEN 4 THEN 'APR' WHEN 5 THEN 'MAY' WHEN 6 THEN 'JUN' WHEN 7 THEN 'JUL' WHEN 8 THEN 'AUG' WHEN 9 THEN 'SEP' WHEN 10 THEN 'OCT' WHEN 11 THEN 'NOV' ELSE 'DEC' END; SET @Year = DATEPART(year, @MonthStart); SET @YearMonth = (@Year * 100) + @MonthNum; SET @MonthDescr = @MonthName + '-' + RIGHT(CONVERT(VARCHAR, @Year), 2); INSERT INTO CalMonth ( MonthStart, NextMonth, MonthDescr, MonthName, YearMonth, MonthNum ) SELECT @MonthStart, @NextMonth, @MonthDescr, @MonthName, @YearMonth, @MonthNum; SET @MonthStart = @NextMonth; SET @NextMonth = DATEADD(month, 1, @NextMonth); END GO ALTER TABLE [dbo].[CalMonth] ADD CONSTRAINT [PK_CalMonth] PRIMARY KEY CLUSTERED ( MonthID ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF , SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF , FILLFACTOR = 100) ON [PRIMARY] GO CREATE UNIQUE NONCLUSTERED INDEX [CalMonth_Dates] ON [dbo].[CalMonth] ( [MonthStart] ASC, [NextMonth] ASC ) INCLUDE ( [MonthDescr]) WITH (PAD_INDEX = OFF, FILLFACTOR = 100) GO
This table should fit into 2 pages total. The idea is that the calendar table should be as compact as you can make it to allow it to be cached in memory as much as possible.
Here is a simple example of using the CalMonth table in a query in the AdventureWorks database.( I had to dummy up the date for the query since my SalesOrderHeader table is way out of date). It returns a range of customers with their total sales by month for the month of June 2004 and two months before that. Notice that there is no DATEADD or DATEPART in the query. Since the MonthID is a sequential integer, 2 is subtracted from the latest month to get the first month. The display month is also fetched from CalMonth.
DECLARE @LatestMonth INT; SELECT @LatestMonth = MonthID FROM CalMonth WHERE MonthStart = '2004-06-01' SELECT C.CustomerID, C.AccountNumber, C.TerritoryID , CM.MonthID, CM.MonthDescr , COUNT(*) AS NumberOfSales , SUM(SO.SubTotal) AS MonthlySales FROM Sales.Customer C INNER JOIN Sales.SalesOrderHeader SO ON C.CustomerID = SO.CustomerID INNER JOIN CalMonth CM ON SO.OrderDate >= CM.MonthStart AND SO.OrderDate < CM.NextMonth WHERE C.CustomerType = 'S' AND CM.MonthID BETWEEN @LatestMonth - 2 And @LatestMonth GROUP BY C.CustomerID, C.AccountNumber, C.TerritoryID , CM.MonthID, CM.MonthDescr ORDER BY C.TerritoryID, CM.MonthID;
The result set returned looks like:
CustomerID AccountNumber TerritoryID MonthID MonthDescr NumberOfSales MonthlySales ----------- ------------- ----------- ----------- ---------- ------------- ------------ 110 AW00000110 1 52 APR-04 1 75997.3582 146 AW00000146 1 52 APR-04 1 85828.7208 290 AW00000290 1 52 APR-04 1 66245.5197 253 AW00000253 1 52 APR-04 1 4372.7986 325 AW00000325 1 52 APR-04 1 318.4545 344 AW00000344 1 52 APR-04 1 59.9326 380 AW00000380 1 52 APR-04 1 272.961 398 AW00000398 1 52 APR-04 1 1655.4936 451 AW00000451 1 52 APR-04 1 50000.0044 505 AW00000505 1 52 APR-04 1 60748.8992 506 AW00000506 1 52 APR-04 1 113252.4963 560 AW00000560 1 52 APR-04 1 452.7394 578 AW00000578 1 52 APR-04 1 299.6926 668 AW00000668 1 52 APR-04 1 8687.2228 673 AW00000673 1 52 APR-04 1 2312.6134 2 AW00000002 1 53 MAY-04 1 822.009
CalMonth is used to both select the sales orders to include and for the grouping. The MonthDescr makes it easy to display which month we're grouping.
I did some testing on an alternate version of this query where I just set the start and end dates as variables and then did a GROUP BY (DATEPART(year, OrderDate) * 100) + DATEPART(month, OrderDate) in order to group by Customer, Year and Month and eliminated the CalMonth table. It generally ran 10 ms longer in CPU time. So this method also affords good performance .
Now we get to more complex applications of our CalMonth table. This is from an actual post here on SQL Server Central just a few weeks ago. I slightly modified it to run in the AdventureWorks database.
The original poster (OP) wanted a side by side comparison of transactions from 2 different years summarized and compared month by month. This was for a hotel group. There were close to 2 million records in the Stay (transaction) table.
Here is a script that will create a 2 million record temp table that we can use for comparison:
IF OBJECT_ID('tempdb..#Stay') IS NOT NULL DROP TABLE #Stay; CREATE TABLE #Stay( [StayID] [uniqueidentifier] NOT NULL, [GuestID] [uniqueidentifier] NOT NULL, [ArrivalDate] [datetime] NOT NULL, [TotalRevenue] [money] NOT NULL CONSTRAINT [PK_Stay] PRIMARY KEY NONCLUSTERED ( [StayID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY]; ALTER TABLE #Stay ADD CONSTRAINT [DF_Stay_StayID] DEFAULT (newsequentialid()) FOR [StayID]; CREATE CLUSTERED INDEX [IX_Stay_ArrivalDateGuestID] ON #Stay ( [ArrivalDate] ASC, [GuestID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]; --Populate with one million rows of random data, thanks to Jeff for this (http://www.sqlservercentral.com/articles/Crosstab/65048/) INSERT INTO #Stay (GuestID, ArrivalDate, TotalRevenue) SELECT TOP 2000000 NEWID(), CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME), CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY) FROM Master.dbo.SysColumns t1 CROSS JOIN Master.dbo.SysColumns t2
I added a covering index on ArrivalDate and Revenue in order to avoid excessive key lookups:
CREATE INDEX #Stay_ArrivalRevenue ON #STAY (ArrivalDate, TotalRevenue)
The technique the OP was using required a Tally or Numbers table. Here is a link to a great article by Jeff Moden on the use of numbers tables: http://www.sqlservercentral.com/articles/T-SQL/62867/
Here is the code to generate a Tally table:
SELECT TOP 10000 N = IDENTITY(INT, 1, 1) INTO Tally FROM sys.syscolumns S1 CROSS JOIN sys.syscolumns S2 GO ALTER TABLE Tally ADD CONSTRAINT PK_Tally PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100 GO
Here is the query the OP wanted help on. The tally table was used for grouping to get a single month and in the date calculations:
SET STATISTICS IO ON; SET STATISTICS TIME ON; DECLARE @StartYear1 DATETIME; DECLARE @StartYear2 DATETIME; SET @StartYear2 = DATEADD(YY, -1, DATEADD(MM, DATEDIFF(MM, 0, GETDATE()), 0)); SET @StartYear1 = DATEADD(YY, -1, @StartYear2); SELECT DATEADD(MM, N - 1, @StartYear1) Year1, SUM(CASE WHEN ArrivalDate >= @StartYear1 AND ArrivalDate < @StartYear2 THEN TotalRevenue ELSE 0 END) Year1Revenue, DATEADD(MM, N -1, @StartYear2) Year2, SUM(CASE WHEN ArrivalDate >= @StartYear2 THEN TotalRevenue ELSE 0 END) Year2Revenue FROM Tally LEFT JOIN #Stay ON DATEADD(MM, N - 1, @StartYear1) = DATEADD(MM, DATEDIFF(MM, 0, ArrivalDate), 0) OR DATEADD(MM, N - 1, @StartYear2) = DATEADD(MM, DATEDIFF(MM, 0, ArrivalDate), 0) WHERE ArrivalDate >= @StartYear1 AND N <= 12 GROUP BY N ORDER BY Year1 SET STATISTICS IO OFF; SET STATISTICS TIME OFF;
The result set returned form this query was:
Year1 Year1Rev Year2 Year2Revenue 2008-06-01 00:00:00.000 814652.29 2009-06-01 00:00:00.000 815557.55 2008-07-01 00:00:00.000 850882.41 2009-07-01 00:00:00.000 831792.32 2008-08-01 00:00:00.000 844237.14 2009-08-01 00:00:00.000 837465.42 2008-09-01 00:00:00.000 836915.91 2009-09-01 00:00:00.000 833296.03 2008-10-01 00:00:00.000 838332.34 2009-10-01 00:00:00.000 855292.19 2008-11-01 00:00:00.000 819412.58 2009-11-01 00:00:00.000 827802.45 2008-12-01 00:00:00.000 838095.67 2009-12-01 00:00:00.000 851131.06 2009-01-01 00:00:00.000 858064.48 2010-01-01 00:00:00.000 0.00 2009-02-01 00:00:00.000 771925.00 2010-02-01 00:00:00.000 0.00 2009-03-01 00:00:00.000 855450.64 2010-03-01 00:00:00.000 0.00 2009-04-01 00:00:00.000 821391.97 2010-04-01 00:00:00.000 0.00 2009-05-01 00:00:00.000 853181.60 2010-05-01 00:00:00.000 0.00
As you can see there are numerous calculations to get the dates right for the grouping and selecting. These types of date calculations prevent the optimizer from using the index on ArrivalDate. The average CPU time for this query on my humble box was 3827 ms. It also isn't the easiest query in the world to read and understand.
Here is another solution using the CalMonth table:
SET STATISTICS IO ON; SET STATISTICS TIME ON; DECLARE @StartYear2 DATETIME , @StartMonth INT -- This returns the first day of the current month. SET @StartYear2 = DATEADD(YY, -1, DATEADD(MM, DATEDIFF(MM, 0, GETDATE()), 0)); -- Get the MonthID for this month. SELECT @StartMonth = MonthID FROM CalMonth WHERE MonthStart = @StartYear2; -- Set to same month the year before. SET @StartMonth = @StartMonth - 12 SET STATISTICS IO OFF; SET STATISTICS TIME OFF; SELECT X.MonthName + '-' + MIN(X.YrSuffix) AS FirstYearMonth , SUM(CASE WHEN YearNo = 1 THEN X.TotalRevenue ELSE 0 END) AS Year1Revenue , X.MonthName + '-' + MAX(X.YrSuffix) AS SecondYearMonth , SUM(CASE WHEN YearNo = 2 THEN X.TotalRevenue ELSE 0 END) as Year2Revenue , X.MonthOrd FROM (SELECT CM.MonthName, S.TotalRevenue AS TotalRevenue, 1 AS YearNo , CM.MonthID - @StartMonth AS MonthOrd , RIGHT(CONVERT(VARCHAR, DATEPART(year, CM.MonthStart)), 2) AS YrSuffix FROM CalMonth CM INNER JOIN #Stay S ON S.ArrivalDate >= CM.MonthStart AND S.ArrivalDate < CM.NextMonth WHERE CM.MonthID BETWEEN @StartMonth And @StartMonth + 11 UNION ALL -- This gets the transactions from the year after. SELECT CM.MonthName, S.TotalRevenue, 2 , CM.MonthID - 12 - @StartMonth , RIGHT(CONVERT(VARCHAR, DATEPART(year, CM.MonthStart)), 2) FROM CalMonth CM INNER JOIN #Stay S ON S.ArrivalDate >= CM.MonthStart AND S.ArrivalDate < CM.NextMonth WHERE CM.MonthID BETWEEN @StartMonth + 12 And @StartMonth + 23 ) AS X GROUP BY X.MonthName, X.MonthOrd ORDER BY X.MonthOrd;
The result set returned looks like:
FirstYearMonth Year1Revenue SecondYearMonth Year2Revenue MonthOrd JUN-08 814652.29 JUN-09 815557.55 0 JUL-08 850882.41 JUL-09 831792.32 1 AUG-08 844237.14 AUG-09 837465.42 2 SEP-08 836915.91 SEP-09 833296.03 3 OCT-08 838332.34 OCT-09 855292.19 4 NOV-08 819412.58 NOV-09 827802.45 5 DEC-08 838095.67 DEC-09 851131.06 6 JAN-09 858064.48 JAN-09 0.00 7 FEB-09 771925.00 FEB-09 0.00 8 MAR-09 855450.64 MAR-09 0.00 9 APR-09 821391.97 APR-09 0.00 10 MAY-09 853181.60 MAY-09 0.00 11
The average CPU time was 360 ms – 10 times faster. The result set isn't quite the same, but serves the same purpose here. I find the code easier to read without all of the date calculations. Instead of DATEADD type calculations simple integer math is used since all months are sequential integers.
I find calendar tables make code much easier to read and maintain, mostly increase performance and they provide an easy way to change code (i.e. just change the table) when business rules change over time. After a few times using them, you can easily get hooked on them.