SQLServerCentral Article

A Stumbling Count

,

Calculating the difference between two dates is a common requirement in the business and manufacturing world. Take for instance delivery time, which can be defined as the difference in days between an order date and a ship date. Managers connect a KPI to delivery time, and they like to know about average delivery times.

The calculation of that difference is not as simple as one might think. To set up the environment, let me start with a very basic example of an order details table. After inserting a few records it looks like this:

select
  Ordernr
, OrderLine
, OrderDate
, Shipdate
 from dbo.Testorderdetails

Ordernr OrderLine OrderDate ShipDate
-------- --------- ---------- ----------
20120 001 2011-04-20 2011-04-27
20120 002 2011-04-20 2011-04-27
20120 003 2011-04-20 2011-05-02
20121 001 2011-04-28 2011-05-03
20122 001 2011-05-01 2011-05-02
20122 002 2011-05-01 NULL
20123 001 2011-05-04 2011-05-04

Notice the NULL value for ShipDate on the sixth record. This means that this particular OrderLine has not been shipped yet. When you see two dates on the same record and you want to know the difference between those two dates, the first thing that comes to mind is the T-SQL datediff() function:

select 
  Ordernr
, Orderline
, datediff(dd, OrderDate, ShipDate) as diff 
 from dbo.Testorderdetails

which produces this result:

Ordernr OrderLine diff
-------- --------- -----------
20120 001 7
20120 002 7
20120 003 12
20121 001 5
20122 001 1
20122 002 NULL
20123 001 0

The problem with this query is that it most likely does not give you what you want. When the value of diff is higher than four, you are certain that at least one Saturday or Sunday is included in the calculation, but you need the result of the query expressed in production days, Saturdays and Sundays not included.

The next thing you can try is typing some terms in your favourite search engine, and it comes up with a T-SQL function that you hope will solve this problem, like the one below:

CREATE FUNCTION dbo.WorkDayDiff (@BegDate datetime ,@EndDate datetime)
RETURNS int
BEGIN
Declare @NumWeek int
set @BegDate = isnull(@BegDate,0)
set @EndDate = IsNull(@EndDate,0)
set @NumWeek = DateDiff(ww, @BegDate, @EndDate)
Return (datediff(dd, @BegDate, @EndDate) - (@numWeek * 2))
END
GO
SET QUOTED_IDENTIFIER OFF
GO
SET ANSI_NULLS ON
GO

Now your result looks like this:

select 
  Ordernr
, Orderline
, dbo.WorkDayDiff(OrderDate, ShipDate) as diff 
 from dbo.Testorderdetails

Ordernr Orderline diff
-------- --------- -----------
20120 001 5
20120 002 5
20120 003 8
20121 001 3
20122 001 1
20122 002 -29044
20123 001 0

This already looks pretty good. It subtracts two days from the datediff() function for every week that lies between the two dates, and that makes sense. On the sixth record, however, there is a strange value for diff. This has to do with the NULL value for ShipDate on that record. Of course you can use the isnull() function to replace this value with something more meaningful. But this solution has another problem.

It does not take holidays into account. Of course, nobody told you about that particular requirement, but it surely will come up after you have implemented the function. Do not forget to consult your manager, though, before you make any changes in the requirements document.

The problem with holidays is that you cannot build them into a generic function like the one I used above, because sometimes they are dependent on a lunar calendar, sometimes they are dependent on a company's policy and sometimes they are calculated according to a certain formula, like Thanksgiving Day in the USA (fourth Thursday in November). It is not so easy to solve this puzzle. It seems impossible to make the calculation by comparing two dates on the same record. As soon as you throw in the holidays things just get too complicated. The solution here is to put al these holidays in a date dimension, or calendar table as some people prefer to call it. A date dimension is rather easy to implement, but it should have some properties and it needs a few additions in order to satisfy your particular needs.

First of all it should have at least one indicator that tells you if a date is a production day. A production day is defined as a day that is not a Saturday, Sunday, or a holiday. As a demo I have built a very basic date dimension and populated it with some unrealistic data (see script1). Hopefully this script is self-explanatory. Note the Numbers table that is used in this script. Please follow the link that I provided in the code to see some interesting applications of this table.

You are going to use the date dimension as a so-called role playing dimension. This means that you will have two aliases of the calendar table and then join each of them to the demo orders table. One join will be on order date and the other on ship date. Now you can calculate the difference between the two dates by traversing the calendar table and totaling the production days. The T-SQL code looks like this:

select 
  a.Ordernr
, a.Orderline
, SUM(case when a.OrderDate=b.Date then 0 
      else case b.ProdDay when 'Y' then 1 
           else 0 end 
      end) as diff 
 from dbo.Testorderdetails a
inner join dbo.TstCalendar b on a.OrderDate <= b.Date and b.date <=GETDATE()
inner join dbo.TstCalendar c on a.ShipDate = c.Date
where b.date <= c.date
group by a.ordernr, a.orderline
order by a.ordernr, a.orderline

Ordernr Orderline diff
-------- --------- -----------
20120 001 5
20120 002 5
20120 003 8
20121 001 3
20122 001 1
20123 001 0

Notice that the record with the NULL value for ShipDate has disappeared. This is exactly what you want, and it happens because the inner join from the date dimension to ShipDate is unsuccessful in case of a NULL value. Of course there is no performance problem on a table as small as this one, but on large tables the query is a bit too slow. It takes over 28 seconds on my test machine over a 300000 records orders table. Also the nested case calculation for the diff column looks like improvised patchwork:

SUM(case when a.OrderDate=b.Date then 0 else case b.ProdDay when 'Y' then 1 else 0 end end) as diff

This is done because you do not want to count the OrderDate itself in the total of production days.

How do you make your calculation simpler and faster? I suggest you should start from the solution rather than from the problem. In the end we are dealing with the simplest of all problems: all you want to do is subtract one number from another. So, why not arrange both numbers in such a way that they are easily subtractable? The first step in the right direction is the idea that the use of a date dimension gives you the opportunity to attach a number to each individual date. In this case that number indicates how many production days have passed between the beginning of time and the current date. The beginning of time is, of course, the first date in your date dimension.

If these numbers correctly reflect the number of production days that have passed, your problem is solved. And how do you make them correctly reflect reality? We assume that every date that has a 'Y' as a value for the ProdDay flag, is a production date. Your first inclination probably would be to filter on that value 'Y', place a running count next to it, and put everything in a view. This will correctly count the production days, leaving out non-production days. But that is a problem. You need the non-production days as well, because thanks to the Internet there are order dates on holidays and on weekends. Furthermore, you want to make things as simple as possible. Therefore you want to avoid the complexities resulting from the use of outer joins on your date dimension. Because you need inner joins, you cannot have missing values in your final solution. That is why you cannot leave the non-production days out, and at the same time you cannot really count them. You need a running count that runs on production days but stops on non-production days. This is how you achieve that goal:

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
create view [dbo].[ProdDayNumbers]
as
with ProdDays
(Sortkey, DateKey,[Date], DayCount)
as
(
select 1 , Datekey , [Date] , ROW_NUMBER() OVER ( Order by [Date])
from dbo.TstCalendar where ProdDay ='Y'
union
select 2 , Datekey , [Date] , ROW_NUMBER() OVER ( Order by [Date])
from dbo.TstCalendar where ProdDay ='N'
)
select DateKey , [Date] , case Sortkey when 1 then DayCount
else ROW_NUMBER() OVER ( Order by [Date]) -DayCount end as ProdDayNumber
from ProdDays
GO

What happens in this code?

First there is a common table expression (CTE) called ProdDays that separates production days from non-production days and counts the records for each group separately, ordered by date. This works like a running count for each group.

In the second part you select the date from this CTE. In the case expression you say that when the date has a 1 for SortKey it should use the counter. When it has a 2 for SortKey it is more complicated: You subtract the counter (which in this case counts the non-production days) from the running count over the whole date dimension. Suppose you are on record five and this is the first non-production day in the date dimension. The calculation is: Running count over all dates - Running count over non-production dates = 5 - 1 = 4.

This all sounds a bit complicated but once you have implemented the code you can see that it turns out to be rather simple.

An example of an implementation

select a.Ordernr, 
a.OrderLine,
c.ProdDayNumber - b.ProdDayNumber as diff
from dbo.Testorderdetails a
join dbo.ProdDayNumbers b on a.OrderDate = b.Date
join dbo.ProdDayNumbers c on a.ShipDate = c.Date
where b.Date<=c.Date
order by a.Ordernr, a.OrderLine

Ordernr OrderLine diff
-------- --------- --------------------
20120 001 5
20120 002 5
20120 003 8
20121 001 3
20122 001 1
20123 001 0

On my simple test machine and using a some 300000 record table, this code runs in about 12 seconds, at least twice as fast as the direct joins to the calendar table. Additionally it is much simpler to implement and easier to understand than the other code. I think together with a numbers table and a calendar table, this view is a useful addition to the toolbox every T-SQL developer should have.

Resources

Rate

4.22 (18)

You rated this post out of 5. Change rating

Share

Share

Rate

4.22 (18)

You rated this post out of 5. Change rating