Technical Article

Calendar Tables II - The Anchor

,

In my previous article on calendar tables I demonstrated how a simple month based calendar table could be used to simplify queries by taking out complex date calculations, for grouping results and in most cases actually increase performance. In this article I will cover using a day based calendar table to make it easier to show results for a day even if there are no records for that day. I call this using the calendar as the 'anchor' for the query.

Let's take an example of a sales organization. In this example we'll only be using a maximum of 4 sales reps with a very large Sales table. Here is the Sales Rep table:

SET NOCOUNT ON;
CREATE TABLE SlsReps
( SlsRepID INT NOT NULL PRIMARY KEY CLUSTERED
, SlsRepName VARCHAR(20));
INSERT INTO SlsReps
(SlsRepID, SlsRepName )
SELECT 0, 'Joe' UNION ALL
SELECT 1, 'Sally' UNION ALL
SELECT 2, 'Fred' UNION ALL
SELECT 3, 'Alice';
GO

Now, in order to get a realistic idea of performance we'll create a 1 million record sales table spanning dates from 1 Jan 2001 through 31 Dec 2009:

SET NOCOUNT ON;
CREATE TABLE Sales
( SaleID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED
, SlsRepID INT
, CustomerID INT
, SaleDate SMALLDATETIME
, SaleAmt MONEY
);
-- Creates 1,000,000 Sales records from 1 Jan 2001 to 31 Dec 2009
-- This can take 30 seconds or so.
INSERT INTO Sales
( SlsRepID, CustomerID, SaleDate, SaleAmt )
SELECT TOP 1000000
 ABS(CHECKSUM(NEWID())) & 3
, ABS(CHECKSUM(NEWID())) & 3
, CAST(RAND(CHECKSUM(NEWID()))* 3653.0 + 36524.0 AS DATETIME)
, ABS(CHECKSUM(NEWID())) & 1000
FROM Master.dbo.SysColumns t1
CROSS JOIN Master.dbo.SysColumns t2;

The sales table will have 1 million records with random sales reps from 0 to 3, random customer ID's from 0 to 3 and random sales amounts from 1 to 999. Now create a covering type index on the Sales table so there will be no Key Lookups when accessing the table by date and sales rep.

CREATE NONCLUSTERED INDEX Sales_SaleDateRepAmt ON Sales (SaleDate, SlsRepID)
INCLUDE (SaleAmt)

We are setting up the table to query a specific week in June 2009, from Monday 8 Jun 09 through Sunday 14 Jun 09. To demonstrate the concept of a calendar table used as an achor we have to clear out some sales for specific days and specific sales reps.

-- Alice - SlsRepID = 3 had 8 June 2009 off so no sales
DELETE FROM Sales
WHERE SaleDate >= '2009-06-08'
AND SaleDate < '2009-06-09'
AND SlsRepID = 3
-- Joe - SlsRepID = 0 had off 10 June 2009
DELETE FROM Sales
WHERE SaleDate >= '2009-06-10'
AND SaleDate < '2009-06-11'
AND SlsRepID = 0
-- The whole company was closed on 11 June 2009
DELETE FROM Sales
WHERE SaleDate >= '2009-06-11'
AND SaleDate < '2009-06-12'

The boss wants a query to show the sales each day for week starting 8 Jun 09 and ending 14 Jun 09 for the combined sales of Joe and Alice. You write this query script:

DECLARE
 @StartDate SMALLDATETIME
, @EndDate SMALLDATETIME
SET @StartDate = '2009-06-08'
SET @EndDate = '2009-06-15' -- Since Sales Date has time we have to use the next day.
-- The DATEADD(DATEFIFF functions strips the time out of SalesDate
SELECT DATEADD(day, DATEDIFF(day, 0, S.SaleDate), 0) AS SalesDate
, COUNT(*) AS NumberOfSales, SUM(S.SaleAmt) AS TotalSales
FROM Sales S
WHERE S.SaleDate >= @StartDate 
AND S.SaleDate < @EndDate
AND S.SlsRepID IN (0, 3) -- Joe = 0 and Alice = 3
GROUP BY DATEADD(day, DATEDIFF(day, 0, S.SaleDate), 0)
ORDER BY DATEADD(day, DATEDIFF(day, 0, S.SaleDate), 0)

which gives the boss this result set:

Note that there is no row for 11 Jun 09 since there are no sales records for that date.

The boss comes back and says "What happend to 11 Jun?" You say, "The company was closed and there were no sales for that day." The boss then says that he wants to see each date in all reports even if there was no activity for that day. Here is where you need to have a convenient anchor for queries that group by day so you can show zero results for that day. This is where a calendar table by day comes in handy. You query the calendar table and LEFT OUTER JOIN to other tables so that you have 1 row for each date. If the outer join is null then stick a zero for the result value.

Here is the code to create a 1 year calendar table for 2009. The NextDay is necessary since sales dates have time in them. If the business had only pure dates for sales dates then the table could be set up with only a DayDate and the queries could use '= DayDate' instead of using '>= DayDate' AND '< NextDay'.

CREATE TABLE CalDay
( DayID INT IDENTITY(1, 1)
, DayDate SMALLDATETIME
, NextDay SMALLDATETIME
, DayNum INT
, NameOfDay VARCHAR(10)
)
GO
SET NOCOUNT ON;
DECLARE 
 @Date SMALLDATETIME;
SET @Date = '2009-01-01';
WHILE @Date < '2010-01-01'
 BEGIN
 INSERT INTO CalDay
 ( DayDate, NextDay, DayNum, NameOfDay )
 SELECT @Date, DATEADD(day, 1, @Date)
 , DATEPART(WEEKDAY, @Date)
 , CASE DATEPART(weekday, @Date)
 WHEN 1 THEN 'Sunday'
 WHEN 2 THEN 'Monday'
 WHEN 3 THEN 'Tuesday'
 WHEN 4 THEN 'Wednesday'
 WHEN 5 THEN 'Thursday'
 WHEN 6 THEN 'Friday'
 ELSE 'Saturday' END;
 SET @Date = DATEADD(day, 1, @Date);
 END
 GO

We want the calendar table to be as compact as possible. Sometimes the query optimizer will decide to do a clustered index scan on the table and sometimes an indexed seek will be chosen. Just to cover both possibilitiies we create these indexes:

ALTER TABLE CalDay ADD CONSTRAINT PK_CalDay PRIMARY KEY CLUSTERED (DayID)
WITH (FILLFACTOR = 100)
GO
CREATE UNIQUE INDEX CalDay_Dates ON CalDay (DayDate, NextDay)
WITH (FILLFACTOR = 100)
GO

Here is a query that gives all sales by day for the week for Joe and Alice with zero sales where there were none. Please read the notes at the top as there are some quirks to using this technique:

-- Some notes here. The SlsRepID criteria must be part of the LEFT JOIN.
-- If it's part of the WHERE clause, then it becomes like an INNER JOIN---- and Thursday
-- will be left out since there is no sale for either Joe or Alice on ----- that day.
-- The COUNT has to be on a non-null column or it will be 1 instead of ---- zero for
-- Thursday.
DECLARE
 @StartDate SMALLDATETIME
, @FirstDay INT;
SET @StartDate = '2009-06-08';
SELECT @FirstDay = DayID FROM CalDay
WHERE DayDate = @StartDate; -- Get the integer day value
SELECT
 CD.DayDate, CD.NameOfDay
, COUNT(S.SaleID) AS NumberOfSales
, SUM(ISNULL(S.SaleAmt, 0)) AS DaySales
FROM CalDay CD
LEFT JOIN Sales S ON
 S.SaleDate >= CD.DayDate
 AND S.SaleDate < CD.NextDay
 AND S.SlsRepID IN (0, 3)
WHERE CD.DayID BETWEEN @FirstDay And @FirstDay + 6
GROUP BY CD.DayDate, CD.NameOfDay
ORDER BY CD.DayDate;

Note that the GROUP BY becomes much simpler in this query. Also date handling for the calendar table becomes simple integer math. Another advantage is being able to display a day name, since it is already in the calendar table.

The result set looks like:

Now the boss wants to see all sales by day for Joe and Alice individually and show any days where they didn't have any sales. In order to do this we need another anchor for sales rep. This is achieved below by doing a CROSS JOIN to a derived table - X - that has the 2 sales reps we want. Taking the date calculations out of the query and letting the calendar table handle the selecting and grouping makes this type of query much simpler:

DECLARE
 @StartDate SMALLDATETIME
, @FirstDay INT;
SET @StartDate = '2009-06-08';
SELECT @FirstDay = DayID FROM CalDay
WHERE DayDate = @StartDate; -- Get the integer day value
--
SELECT
 CD.DayDate, CD.NameOfDay, X.SlsRepName
, COUNT(S.SaleID) AS NumberOfSales
, SUM(ISNULL(S.SaleAmt, 0)) AS DaySales
FROM CalDay CD
CROSS JOIN
 (SELECT SlsRepID, SlsRepName
 FROM SlsReps
 WHERE SlsRepID IN (0, 3)
 ) AS X
LEFT JOIN Sales S ON
 S.SaleDate >= CD.DayDate
 AND S.SaleDate < CD.NextDay
 AND S.SlsRepID = X.SlsRepID
 AND S.SlsRepID IN (0, 3)
WHERE CD.DayID BETWEEN @FirstDay And @FirstDay + 6
GROUP BY CD.DayDate, CD.NameOfDay, X.SlsRepName
ORDER BY X.SlsRepName, CD.DayDate;

This produces the result set below:

Finally, the boss wants sales for all sales reps for the same week with 1 row for each sales rep. This query uses a technique described very well in Jeff Moden's article on CrossTabs and Pivots: http://www.sqlservercentral.com/articles/T-SQL/63681/

DECLARE
 @StartDate SMALLDATETIME
, @FirstDay INT
--
SET @StartDate = '2009-06-08'
SELECT @FirstDay = DayID FROM CalDay
WHERE DayDate = @StartDate
--
SELECT SR.SlsRepName AS Rep
, SUM(CASE WHEN X.DayNum = 2 THEN 1 ELSE 0 END) AS [Mon#]
, SUM(CASE WHEN X.DayNum = 2 THEN SaleAmt ELSE 0 END) AS [Mon$]
, SUM(CASE WHEN X.DayNum = 3 THEN 1 ELSE 0 END) AS [Tue#]
, SUM(CASE WHEN X.DayNum = 3 THEN SaleAmt ELSE 0 END) AS [Tue$]
, SUM(CASE WHEN X.DayNum = 4 THEN 1 ELSE 0 END) AS [Wed#]
, SUM(CASE WHEN X.DayNum = 4 THEN SaleAmt ELSE 0 END) AS [Wed$]
, SUM(CASE WHEN X.DayNum = 5 THEN 1 ELSE 0 END) AS [Thu#]
, SUM(CASE WHEN X.DayNum = 5 THEN SaleAmt ELSE 0 END) AS [Thu$]
, SUM(CASE WHEN X.DayNum = 6 THEN 1 ELSE 0 END) AS [Fri#]
, SUM(CASE WHEN X.DayNum = 6 THEN SaleAmt ELSE 0 END) AS [Fri$]
, SUM(CASE WHEN X.DayNum = 7 THEN 1 ELSE 0 END) AS [Sat#]
, SUM(CASE WHEN X.DayNum = 7 THEN SaleAmt ELSE 0 END) AS [Sat$]
, SUM(CASE WHEN X.DayNum = 1 THEN 1 ELSE 0 END) AS [Sun#]
, SUM(CASE WHEN X.DayNum = 1 THEN SaleAmt ELSE 0 END) AS [Sun$]
FROM SlsReps SR
-- Derived table produces at least 1 row per day from the CalDay table.
LEFT JOIN
 (SELECT CD.DayNum, S.SlsRepID, S.SaleAmt -- CalDay LEFT JOIN provides     -- another anchor for days with no sales.
 FROM CalDay CD
 LEFT JOIN Sales S ON
 S.SaleDate >= CD.DayDate
 AND S.SaleDate < CD.NextDay
 WHERE CD.DayID BETWEEN @FirstDay And @FirstDay + 6
 ) AS X ON
 SR.SlsRepID = X.SlsRepID
GROUP BY SR.SlsRepName;

And here is the result set:

Using the calendar table takes out any complex date handling, can be used as an anchor or place holder and greatly simplifies this type of task.

Todd Fifield

Developer/DBA

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating