SQLServerCentral Article

Returning 0's in a SUM (SQL Spackle)

,

Introduction

A common request that many of us have had is our manager comes and wants a simple report on total sales for a day summarized by each hour. If you do a simple SUM and GROUP BY hour, you will only show the hours that have data. But your manager wants to see a 0 for the hours with no data. Instead of seeing this:

He wants to see this:

Creating the Test Data

In order to show how to do this I will need some test data. I will create a #SalesTestData table and populate it with 1,000,000 records with dates between Jan 1, 2007 and Dec 31, 2011. Then I will delete all the records that have a time in the 8 or 17 hours. I am also deleting data from 3 days of each month. This is being done so when we run the demo code, there will be some hours that have no data. This way I can demonstrate showing a 0 when there is no data.

The code that I use to populate the test data was taken from scripts written by Jeff Moden.


IF (SELECT OBJECT_ID('tempdb..#SalesTestData')
) IS NOT NULL
DROP TABLE #SalesTestData
/*Create the table and populate it with random dates from Jan 1, 2007 to Dec 31, 2011*/
SELECT TOP 1000000
CAST(RAND(CHECKSUM(NEWID())) * 1826.0 + 39081.0 AS DATETIME) AS SaleTime
,ABS(CHECKSUM(NEWID())) % 50000 + 1 AS SaleQty
INTO dbo.#SalesTestData
FROM master.sys.all_columns AS ac
CROSS JOIN master.sys.all_columns AS ac1
/*Creating a clustered index for performance*/
CREATE CLUSTERED INDEX IXC_#SalesTestData_SaleTime
ON #SalesTestData (SaleTime)
/*Delete all data where the hour is 8 or 17 and where the day is the 9th, 16th or 23rd. This is to show that 0's will be returned for the missing data.*/
DELETE #SalesTestData
WHERE DATEPART(HOUR,SaleTime) IN (8,17)
DELETE #SalesTestData
WHERE DATEPART(day,SaleTime) IN (9,16,23) GO

Summing the Data by Hour

Now that we have the test data I will sum the data for one day by hour using a table variable. A table variable is like a temp table, but you have to declare it first and it will only exist during the current session. The reason for the table variable is so I can create a basic table with all the hours of the day. This table will just be populated for one day with a start hour and a next hour. I will then left join the #SalesTestDate table to this table. I use a left join so when I use the SUM and GROUP BY hour I will get a 0 for those hours without a sale.

DECLARE @Date DATETIME = '08/13/2009' ; 
/*Create a table variable and populate it with StartHour for one day*/DECLARE @hour TABLE
(StartHour DATETIME
,NextHour DATETIME
) ;
INSERT INTO @hour
(StartHour
,NextHour
)
SELECT DATEADD(hh,Number,@Date) AS StartHour
,DATEADD(hh,Number + 1,@Date) AS NextHour
FROM Master.dbo.spt_Values
WHERE Type = 'P'
AND Number BETWEEN 0 AND 23 ;
SELECT h.StartHour
,COALESCE(SUM(SalesTestData.SaleQty),0) AS Qty
FROM @hour AS h
LEFT OUTER JOIN #SalesTestData AS SalesTestData ON SalesTestData.SaleTime >= h.StartHour
AND SalesTestData.SaleTime < NextHour
AND SalesTestData.SaleTime >= @Date
AND SalesTestData.SaleTime < DATEADD(dd,1,@Date)
GROUP BY h.StartHour GO

Summarizing by Month with a Table Variable

Now I will sum the data by day for one month. I am running for April 2010, but you can change this to run for any month that there is data for. In this example I am creating a table variable and populating it with the days of the month. I then join this table to the #SalesTestData table and do my summary. I am using a left join on the #SalesTestData table. This way I will still get a 0 record for the days that there is no data. If I did an inner join then I would not get a record returned for those days.


/*To get the Sales count by day without a calendar table*/
DECLARE @DataMonth INT
,@StartDate DATETIME
,@NextDate DATETIME
,@LastDay INT ; /*Calculating the start date based on the month */
SET @DataMonth = 201004 ;
SET @StartDate = CONVERT(DATETIME,LEFT(@DataMonth,4) + '-' + RIGHT(@DataMonth,2) + '-01') ;
SET @NextDate = DATEADD(MONTH,1,@StartDate) ;
SET @LastDay = DATEPART(d,DATEADD(d,-1,@NextDate)) -1 ;
DECLARE @date TABLE
(StartDate DATETIME
,NextDate DATETIME
) ;
INSERT INTO @date
(StartDate
,NextDate
)
SELECT DATEADD(d,Number,@StartDate) AS StartDate
,DATEADD(d,Number + 1,@StartDate) AS NextDate
FROM Master.dbo.spt_Values
WHERE Type = 'P'
AND Number BETWEEN 0 AND @LastDay ;
SELECT h.StartDate AS [Date]
,COALESCE(SUM(SalesTestData.SaleQty),0) AS Qty
FROM @date AS h
LEFT OUTER JOIN #SalesTestData AS SalesTestData ON SalesTestData.SaleTime >= h.StartDate AND SalesTestData.SaleTime < h.NextDate
and SalesTestData.SaleTime >= @StartDate AND SalesTestData.SaleTime < @NextDate
GROUP BY h.StartDate ; GO

Creating a Calendar Table

Next let's create a calendar table and populate it. I'm populating the calendar table for the years 2007 to 2011, this way it will match the dates in the #SaleTestData table. It is a simple calendar table with a start date and next date for each day, and a MonthId. If I was going to create calendar table for my production system it would probably be a little more complicated, but for my demonstration this is all I need. I will then use this table to do a sum with it instead of the table variable.


DECLARE @StartDate DATETIME ;
/*Date to start populating the calendar table. Populates for 5 years*/SELECT @StartDate = '01/01/2007' ;
IF (SELECT OBJECT_ID('tempdb..#CalendarTest')
) IS NOT NULL
DROP TABLE #CalendarTest
SELECT CAST (CONVERT(CHAR(8),DATEADD(d,v.Number,@StartDate),112) AS INT) AS CalendarId
,DATEADD(d,v.Number,@StartDate) AS StartDate
,DATEADD(d,v.Number,@StartDate)+1 AS NextDate
,DATEPART(dd,DATEADD(d,v.Number,@StartDate)) AS DateOfMonth
,CONVERT(INT,REPLACE(CONVERT(CHAR(7),DATEADD(d,v.Number,@StartDate),102),'.','')) AS MonthId
INTO #CalendarTest
FROM Master.dbo.spt_Values v
WHERE Type = 'P'
AND v.Number BETWEEN 0 AND 1825 ;
--Creating a clustered index for performance
CREATE CLUSTERED INDEX IXC_#CalendarTest_CalendarId
ON #CalendarTest (CalendarId) GO

Summarizing by Month with a Calendar Table

In this example I am using a calendar table instead of the table variable. Again I am using a left join on the calendar table. This code is a little cleaner than using the table variable plus if you have a calendar table it can be reused in other code, while the table variable is only good for the script that it was written in.

/*To get the Sales count by day with a Calendar table*/
DECLARE @DataMonth INT
,@StartDate DATETIME
,@NextDate DATETIME ;
SET @DataMonth = 201004 ;
SELECT @StartDate = MIN(StartDate)
,@NextDate = MAX(NextDate)
FROM #CalendarTest
WHERE MonthId = @DataMonth ;
SELECT CalendarTest.StartDate AS [Date]
,COALESCE(SUM(SalesTestData.SaleQty),0) AS Qty
FROM #CalendarTest AS CalendarTest
LEFT OUTER JOIN #SalesTestData AS SalesTestData ON DATEPART(DAY,SalesTestData.SaleTime) = CalendarTest.DateOfMonth
AND SalesTestData.SaleTime >= @StartDate AND SalesTestData.SaleTime < @NextDate
WHERE CalendarTest.MonthId = @DataMonth
GROUP BY CalendarTest.StartDate
ORDER BY CalendarTest.StartDate ; GO

I know there are other ways to accomplish this. This is just one example of how to do it.

Rate

3.61 (23)

You rated this post out of 5. Change rating

Share

Share

Rate

3.61 (23)

You rated this post out of 5. Change rating