October 17, 2020 at 6:47 pm
I use Google...
https://weblogs.sqlteam.com/peterl/2009/07/12/how-to-calculate-the-number-of-weekdays-for-any-given/
October 17, 2020 at 7:07 pm
Jeff,
Thanks for taking a look.
The problem is indeed one of the range (as Jeffrey also pointed out while I typed this up). The math trick won't work once negative numbers get involved.
If we push back the anchor Sunday to before the interval you specify for that test, then it returns 20870, as expected, although this can't always be done. I'm not sure I get the business use of figuring out weekends at the beginning of allowed date or datetime data ranges (like the year 0), but it's a fair technical concern for thoroughness' sake, of course :).
I'm normally the paranoid sort that would point out those sorts of caveats when posting, but my diligence was flagging when I posted that; thanks for pointing it out!
Could be addressed in a number of ways, ranging from just putting a caveat on its use to having the query automatically adjust negative numbers appropriately.
I've been rolling this problem around in my head the last couple days, trying to see if I can come up with something even more concise, so I'll post some alternatives once I've given up on the brevity game.
Cheers!
Jeffrey Williams beat me to it (I was working on a proof when he posted). I too was also going to suggest that using -1 instead of 6 would solve the problem for the temporal epoch that starts on 1900-01-01 that most people deal with.
First, here's the test data generation code for a "brute force" test.
/****************************************************************************************
Creates every possible start and end date pair between the given start and end dates
where the StartDate <= EndDate.
-----------------------------------------------------------------------------------------
*** WARNING *** WARNING *** WARNING *** WARNING *** WARNING *** WARNING *** WARNING ***
*** THIS CODE DROPS THE dbo.TestDates TABLE FROM THE CURRENT DATABASE!!!! ***
*** WARNING *** WARNING *** WARNING *** WARNING *** WARNING *** WARNING *** WARNING ***
-----------------------------------------------------------------------------------------
CAUTION: This forms a very rapidly growing "Equality Triangular Join".
The following code will determine the number of rows that will be created.
--===== Given the start and end dates, calculate the number of result rows that will be
-- produced as the result of the "Equality Triangular Join" for the test generation
-- code.
DECLARE @StartDate DATETIME = '19000101' -- Change to suit.
,@EndDate DATETIME = '19041231' -- Change to suit. Must be >= @StartDate
;
SELECT (POWER(d.DayCount,2.0)+d.DayCount)/2 --Classic "Sum of the incremental sequence"
FROM (SELECT CONVERT(FLOAT,DATEDIFF(dd,@StartDate,@EndDate)+1)) d (DayCount)
;
-----------------------------------------------------------------------------------------
References:
1. "Hidden RBAR: Triangular Joins" -- Jeff Moden - 2007-12-06
https://www.sqlservercentral.com/articles/hidden-rbar-triangular-joins
-----------------------------------------------------------------------------------------
Dependencies:
1. This code uses a "Psuedo Cursor" in the form of the dbo.fnTally function as a row
source to generate the dates. It is available at the following URL.
"Create a Tally Function (fnTally)" -- Jeff Moden - 2019-08-04
https://www.sqlservercentral.com/scripts/create-a-tally-function-fntally
-----------------------------------------------------------------------------------------
Revision History:
Rev 00 - 17 Oct 2020 - Initial creation and unit test.
****************************************************************************************/
DROP TABLE IF EXISTS dbo.TestDates
;
DECLARE @StartDate DATETIME = '19000101' -- Change to suit.
,@EndDate DATETIME = '19041231' -- Change to suit. Must be >= @StartDate
;
SELECT StartDate = DATEADD(dd,lo.N,@StartDate)
,EndDate = DATEADD(dd,hi.N,DATEADD(dd,lo.N,@StartDate))
INTO dbo.TestDates
FROM dbo.fnTally(0,DATEDIFF(dd,@StartDate,@EndDate)) lo
CROSS APPLY dbo.fnTally(0,DATEDIFF(dd,DATEADD(dd,lo.N,@StartDate),@EndDate)) hi
OPTION (RECOMPILE)
;
I first used the test data generator above using 1900-01-01 and 1904-12-31 to prove there's a problem using the following test harness...
--===== Test the proposed solution code using the dates from the previously generated
-- dbo.TestDates
WITH cteCalc AS
(
SELECT *
,WeekendDayCount = y/7-x/7+(y+1)/7-(x-1)/7
,StartDoW = DATENAME(dw,StartDate)
,EndDoW = DATENAME(dw,EndDate)
FROM dbo.TestDates
CROSS APPLY (VALUES(DATEDIFF(DAY,6,StartDate),DATEDIFF(DAY,6,EndDate)))x(x,y)
)
SELECT *
FROM cteCalc
WHERE 0 = CASE
--== Test for NO split weekends (should be even number of days)
WHEN StartDOW NOT IN ('Saturday','Sunday')
AND EndDOW NOT IN ('Saturday','Sunday')
AND WeekendDayCount%2 = 0 THEN 1 --PASS
--== Test for BOTH split weekends (should be even number of days)
WHEN StartDOW IN ('Sunday')
AND EndDOW IN ('Saturday')
AND WeekendDayCount%2 = 0 THEN 1 --PASS
--== Test for BOTH whole weekends (should be even number of days)
WHEN StartDOW IN ('Saturday')
AND EndDOW IN ('Sunday')
AND WeekendDayCount%2 = 0 THEN 1 --PASS
--== Test for starting whole weekend (should be even number of days)
WHEN StartDOW IN ('Saturday')
AND EndDOW NOT IN ('Saturday','Sunday')
AND WeekendDayCount%2 = 0 THEN 1 --PASS
--== Test for ending whole weekend (should be even number of days)
WHEN StartDOW NOT IN ('Saturday','Sunday')
AND EndDOW IN ('Sunday')
AND WeekendDayCount%2 = 0 THEN 1 --PASS
--== Everything else should be an odd number of days
WHEN WeekendDayCount%2 = 1 THEN 1 --PASS
--== If it doesn't match any of the conditions above, it's a failure
ELSE 0 --Fail
END
;
It does, indeed, prove the problem with many dates occurring from 1900-01-01 thru 1900-01-07.
Then, I generated the dates from 2000-01-01 thru 2004-12-31, reran the test harness, and it proved there were no issues with that date range.
Last but not least, I did just like Jeffrey Williams did and made the formula in the cross apply use -1 (also a Sunday) instead of 6 and reran both tests with no issues. I also ran the test using the ending edge case range of 9996-01-01 thru 9999-12-31 and it also ran fine.
Although not every date pair between 1900-01-01 and 9999-12-31 has been through the "brute force" test (it would require the generation of almost 4.4 Trillion rows), my sufficiency has been mathematically suffonsified that it will work correctly for any proper date range from 1900-01-01 thru 9999-12-31.
Just to be thorough, I also tested using '17530107' (the first Sunday of the DATETIME epoch and could also use the number ) in place of the 6 and it also worked correctly for dates after that particular date. So, with that modification, I'd be comfortable in saying that it will work properly with any start and end dates where the end date >= start date and both are > 1753-01-07 and <= 9999-12-31 if the 6s in the CROSS APPLY were replace with '17530107' or the equivalent date serial number of -53684 .
Nicely done, Jacob.
--Jeff Moden
October 17, 2020 at 7:35 pm
I use Google...
https://weblogs.sqlteam.com/peterl/2009/07/12/how-to-calculate-the-number-of-weekdays-for-any-given/
That, however, doesn't count the number of weekend days as the OP requested. It wouldn't, however, provide much difficultly to the OP to realize that if you calculate the total number of days in the date range and use your good function to subtract the number of weekdays, that the number of days remaining will be the weekend days. 😀
--Jeff Moden
October 17, 2020 at 11:42 pm
I haven't gotten to spend very much time on this at all today, so I'm not sure if I've given up on the brevity game (although I'm quickly losing confidence that there is a shorter solution I'm capable of finding), but I figured I'd at least talk about the issues with the original version of the query.
First, the logic underlying the query actually handles negative offsets from the anchor date just fine. The trick is that the logic is based on the floor of division by 7, which conveniently matches integer division by 7, as long as the dividend is also positive. Once the dividend is negative, integer division by 7 gives the ceiling instead of the floor, and the solution breaks down.
There are a number of ways around this.
The brute method, and not surprisingly the slowest under every condition I tested (by a factor of more than 2 compared to the original), is to wrap each division in FLOOR and change the divisor to 7.0.
You could also implement floor using only integer math, which as far as I know is most easily done by finding the ceiling of the division (add divisor-1 to the dividend) and subtracting 1 from that result.
This works, and across the conditions I tested, increases CPU time by ~25% compared to the original. It also has the downside of adding quite a few characters to the solution and making the expression even less readable than it already is.
Since integer division already gives the floor of the division for positive divisors and dividends, another idea would be to adjust only the negative numbers, if there are any, so that integer division by 7 gives the floor of division of the original number by 7. I implemented this using a CASE expression to adjust the anchor date appropriately if either start or end date were before the original anchor date.
This at least leaves the original expression intact, but adds a lot of characters, and generally performed very slightly worse than the hand-rolled integer math implementation of floor on all divisions, whether they needed it or not.
Finally, in what I can't decide is either the most elegant or crudest of the solutions, since only the relative positions to the anchor matter, not the absolute size of the offsets, you can just add some sufficiently large multiple of 7 to the DATEDIFF(DAY... expressions in the VALUES constructor to make sure that even if a date is 0001-01-01, the resulting offset value can't be negative.
This added the fewest characters, and generally performed better than the others (typically around ~10-15% increase in CPU compared to the original), although the integer implementation of floor sometimes edged it out, but probably only because of random CPU timing quirks on my laptop.
The second problem is the "can't always push the anchor date further back" problem, which affects the knee-jerk reaction to problem 1 of just moving the anchor date back.
All of the above solutions ignore this completely, since they work regardless of what the anchor date is (as long as it's a Sunday).
There is one other solution logically similar to the original that performs reasonably well (usually similar to the integer floor implementations, but sometimes almost as fast as the original), is pretty concise (but not as concise as either the original or the original with a large positive addition to the offset), arguably easier to understand than the original, but technically susceptible to the "can't always go further back" problem.
This approach is to change the VALUES constructor to compute the day before start_date and the day after end_date. It then just adds DATEDIFF(WEEK,start_date,end_date) and DATEDIFF(WEEK,day_before_start, day_after_end). This works because DATEDIFF(WEEK.... is NOT datefirst dependent, and always uses Sunday.
Obviously if your start date is the very first allowed date or if your end date is the very last allowed date, this won't work, so it doesn't completely solve all the problems. Its decent performance, brevity, and readability probably make it one of the better choices if you don't mind the edge case it has issues with.
All in all, though, it looks like the simplest way to get around the listed problems and maintain most of the performance of the original (and keep the query short, since I'm all about imposing that arbitrary constraint on myself when I do things in my spare time for fun 🙂 ) is to add a sufficiently large multiple of 7 to the DATEDIFFs to eliminate the possibility of a negative offset.
Cheers!
EDIT: I posted this before catching up on the thread (apparently I'm adopting more and more bad habits in my increasing age, which doesn't bode well).
Thanks for all that testing Jeff!
Acknowledging the edge cases and not using the original solution on data for which it is not suited is likely even better than all the fiddling I talked about above to get around some of the known edge cases, but I think my train mostly left the "Is this practical?" station on this problem a while back 🙂 (although I haven't done any rigorous performance testing on other solutions, so it's possible that there's some practical performance benefit to such a solution).
October 18, 2020 at 3:47 pm
Just one more thing - be aware of the implicit conversions that DATEDIFF performs when using the DATE or DATETIME2 data types or string dates. Unless you really need dates before 1753-01-01 or the greater precision - use the DATETIME data type for both parameters and do not use a hard-coded string date for the anchor (use the integer value).
If you are not aware of the implicit conversions - if the first parameter is a DATETIME data type (the numeric values are datetime) and the second parameter is either DATE or DATETIME2 - and the second parameter can be converted to a DATETIME then only the second parameter is implicitly converted to a DATETIMEOFFSET(3). If the first parameter is either a DATE or DATETIME2 then both are converted to DATETIMEOFFSET.
Here is what is generated when the columns are defined as datetime - and the anchor is defined as a date:
[Expr1006] = Scalar Operator(datediff(day,CONVERT_IMPLICIT(datetimeoffset(7),'0001-01-06',0),CONVERT_IMPLICIT(datetimeoffset(3),[end_date],0))/(7)
-datediff(day,CONVERT_IMPLICIT(datetimeoffset(7),'0001-01-06',0),CONVERT_IMPLICIT(datetimeoffset(3),[start_date],0))/(7)
+(datediff(day,CONVERT_IMPLICIT(datetimeoffset(7),'0001-01-06',0),CONVERT_IMPLICIT(datetimeoffset(3),[end_date],0))+(1))/(7)
-(datediff(day,CONVERT_IMPLICIT(datetimeoffset(7),'0001-01-06',0),CONVERT_IMPLICIT(datetimeoffset(3),[start_date],0))
-(1))/(7))
If you notice in the above, the datetime is converted to datetimeoffset(3). If both data types are of date data type - the only difference in the above will be that the second parameter is converted to a datetimeoffset(7).
Compare that to when both data types are datetime and the first parameter is set to -1 (integer date = 1899-12-31):
[Expr1005] = Scalar Operator(datediff(day,'1899-12-31 00:00:00.000',[end_date])/(7)
-datediff(day,'1899-12-31 00:00:00.000',[start_date])/(7)
+(datediff(day,'1899-12-31 00:00:00.000',[end_date])+(1))/(7)
-(datediff(day,'1899-12-31 00:00:00.000',[start_date])
-(1))/(7))
If you use '1899-12-31' as the anchor date - and the second parameter is date or datetime2, both will be converted to datetimeoffset(7). Even worse...if you use '1899-12-31' as the anchor date and the second parameter is a datetime - then both will be converted to a datetimeoffset(3).
Jeffrey Williams
Problems are opportunities brilliantly disguised as insurmountable obstacles.
How to post questions to get better answers faster
Managing Transaction Logs
October 18, 2020 at 5:11 pm
Just to be sure, Jacob... I really like your solution (especially since I've tested the bejeezus out of it in the area of functionality) because of the integer math, lack of CASE statements, and zero dependency on either language or DATEFIRST settings.
I still have my own performance testing to do on it (same data generator is going to help with that) by pitting it against other common methods but my initial "usage" impression is that it's both fast and scalable.
With all that, I'll say it again... Nicely Done, Jacob!
--Jeff Moden
October 18, 2020 at 9:29 pm
Just one more thing - be aware of the implicit conversions that DATEDIFF performs when using the DATE or DATETIME2 data types or string dates....
Yeah, I've noticed this when testing similar queries in the past and saw consistent differences in CPU time based solely on this (how the second and third parameters are passed to DATEDIFF). It's definitely worth being aware of. Thanks!
Just to be sure, Jacob... I really like your solution (especially since I've tested the bejeezus out of it in the area of functionality) because of the integer math, lack of CASE statements, and zero dependency on either language or DATEFIRST settings.
I still have my own performance testing to do on it (same data generator is going to help with that) by pitting it against other common methods but my initial "usage" impression is that it's both fast and scalable.
With all that, I'll say it again... Nicely Done, Jacob!
Thanks Jeff! That definitely means a good deal coming from you. At least my little grey cells, they still work sometimes 🙂
October 20, 2020 at 9:05 pm
I was thinking more about this - realized we can do this:
SELECT weekend_day_count=y/7-x/7+(y+1)/7-(x-1)/7
FROM (VALUES(DATEDIFF(DAY,-53690,'1899-01-01') + 1,DATEDIFF(DAY,-53690,'1900-01-14') + 1))x(x,y)
We can specify the anchor date as 1753-01-01 (-53690) and then add 1 to shift to the Sunday prior.
This also works for date/datetime2 and the anchor date of 0001-01-01 - but we need to specify it as:
SELECT weekend_day_count=y/7-x/7+(y+1)/7-(x-1)/7
FROM (VALUES(DATEDIFF(DAY,'0001-01-01 00:00:00.0000000 +00:00','1899-01-01') + 1,DATEDIFF(DAY,'0001-01-01 00:00:00.0000000 +00:00','1900-01-14') + 1))x(x,y)
The above will still implicitly convert the second parameter to a datetimeoffset - but the first parameter isn't converted. To eliminate the implicit converts you would have to specify the columns as a datetimeoffset instead of date, datetime2 or datetime.
Jeffrey Williams
Problems are opportunities brilliantly disguised as insurmountable obstacles.
How to post questions to get better answers faster
Managing Transaction Logs
October 20, 2020 at 10:13 pm
I was thinking more about this - realized we can do this:
SELECT weekend_day_count=y/7-x/7+(y+1)/7-(x-1)/7
FROM (VALUES(DATEDIFF(DAY,-53690,'1899-01-01') + 1,DATEDIFF(DAY,-53690,'1900-01-14') + 1))x(x,y)We can specify the anchor date as 1753-01-01 (-53690) and then add 1 to shift to the Sunday prior.
This also works for date/datetime2 and the anchor date of 0001-01-01 - but we need to specify it as:
SELECT weekend_day_count=y/7-x/7+(y+1)/7-(x-1)/7
FROM (VALUES(DATEDIFF(DAY,'0001-01-01 00:00:00.0000000 +00:00','1899-01-01') + 1,DATEDIFF(DAY,'0001-01-01 00:00:00.0000000 +00:00','1900-01-14') + 1))x(x,y)The above will still implicitly convert the second parameter to a datetimeoffset - but the first parameter isn't converted. To eliminate the implicit converts you would have to specify the columns as a datetimeoffset instead of date, datetime2 or datetime.
Not sure if this will work but mathematically:
y/7-x/7+(y+1)/7-(x-1)/7
can be simplified into
2*(1 - x + y)/7
October 20, 2020 at 10:40 pm
Jonathan,
That's not the same because of the first expression's using integer math.
The second expression is just giving you 2/7 of the number of days in the specified interval, which of course is a reasonable guess at the number of weekend days in an interval if you don't know anything about the day of the week on which the interval started.
Over very long intervals, it will seem very close, but if you use that formula on, say 20201024-20201025, where the correct answer is 2, that formula will give 0.57xxxxx instead (or 0 in your new formulation after the edit).
I tried several simplifications early on; the calculation exploits things about integer division that make the expression difficult to simplify via normal means.
Cheers!
October 21, 2020 at 7:40 am
I love the use of math to solve this. However, I am stumped as to why/how the math works. An explanation for Dummies would be greatly appreciated.
y /7 -- Full weeks to end_date
- x /7 -- Full weeks to start_date
+ (y+1) /7 -- Full weeks to day after end_date
- (x-1) /7 -- Full weeks to day before start_date
October 21, 2020 at 1:17 pm
I love the use of math to solve this. However, I am stumped as to why/how the math works. An explanation for Dummies would be greatly appreciated.
y /7 -- Full weeks to end_date
- x /7 -- Full weeks to start_date
+ (y+1) /7 -- Full weeks to day after end_date
- (x-1) /7 -- Full weeks to day before start_date
Those are some quantities those expressions represent; the trick is that with a Sunday anchor date they also represent some other quantities that turn out to be more obviously useful:
x/7 --Number of Saturdays after anchor date, before start_date
(x-1)/7 --Number of Sundays after anchor date, before start_date
y/7 --Number of Sundays after anchor date, before AND including end_date
(y+1)/7 --Number of Saturdays after anchor date, before AND including end_date
Once you realize that, the trick is less tricky 🙂
Cheers!
October 22, 2020 at 8:25 am
DesNorton wrote:I love the use of math to solve this. However, I am stumped as to why/how the math works. An explanation for Dummies would be greatly appreciated.
y /7 -- Full weeks to end_date
- x /7 -- Full weeks to start_date
+ (y+1) /7 -- Full weeks to day after end_date
- (x-1) /7 -- Full weeks to day before start_date
Those are some quantities those expressions represent; the trick is that with a Sunday anchor date they also represent some other quantities that turn out to be more obviously useful:
x/7 --Number of Saturdays after anchor date, before start_date
(x-1)/7 --Number of Sundays after anchor date, before start_date
y/7 --Number of Sundays after anchor date, before AND including end_date
(y+1)/7 --Number of Saturdays after anchor date, before AND including end_dateOnce you realize that, the trick is less tricky 🙂
Cheers!
Thank You. Now it makes sense.
Viewing 13 posts - 46 through 58 (of 58 total)
You must be logged in to reply to this topic. Login to reply