With every new year I think a little bit about time and dates. This posts looks a little more at that in TSQL.
Rounding Dates: Which way is best?
Sometimes in TSQL you need to round a datetime value to the precision of either a day, hour, minute, or second.
I realized recently that I have a few ways I know how to do this, but I wasn’t sure which was the most efficient.
I did a little searching and didn’t find anything super conclusive. I had a little chat with Jeremiah Peschka (blog | twitter) and he told me which way he thought was fastest and why.
And so I decided to run some tests. Jeremiah has a way of being right about these things, but I had to see for myself.
I’ll go ahead and tell you: He was totally right, and I’ll show you why. But I learned a couple things along the way.
To get started, let’s review some of our friends, the SQL Server datatypes. Hi friends!
Rounding to the Day
The most frequent case in which I need to round dates is to the day level. So instead of ’1/4/2011 6:15:03.393921′, I want just ’1/4/2011′.
SQL 2008′s date type made this a lot easier for everyone– now we can just cast a datetime or datetime2 value as a date, and we’ve got what we need. PLUS, our new value is nice and small, weighing in at 3 bytes.
I think most everyone agrees, we like this!
SELECT CAST('1/1/2010 23:59:59.000' AS DATE) AS [I'm a date!]
This is still a bit more complicated. When you start thinking about these and different datatypes, you need to make sure you understand what you mean by rounding.
In SQL Server, our datatypes actually have some different opinions about what rounding means. Check this out:
SELECT CAST('1/1/2010 23:59:59.000' AS DATETIME) AS [I'm a DATETIME!], CAST('1/1/2010 23:59:59.000' AS DATETIME2(0)) AS [I'm a DATETIME2(0)!'], CAST('1/1/2010 23:59:59.000' AS SMALLDATETIME) AS [I'm a SMALLDATETIME, AND I'm very confused.], CAST('1/1/2010 23:59:59.000' AS DATE) AS [I'm a DATE!]
The SMALLDATETIME value rounds this up to January 2nd, instead of January 1. The Date datatype does not.
In considering whether or not to use SMALLDATETIME, you need to learn and establish whether or not to round up for minutes and date values. With a different example, if something occurred at 12:30:31 AM, would that be represented as having happened in the 12:30 minute, or at 12:31?
Most of us actually want to round down. We want the largest minute number which is less than or equal to the datetime value. This is similar to what FLOOR does for integers. You could also call this truncating the portion of the datetime value you don’t want. This is not, however, what SMALLDATETIME gives you, so use it with care.
So this is what I’m saying:
Like, seriously, SMALLDATETIME: you are SO messed up.
So given that warning, let’s actually round some date values, and let’s compare the efficiency of each method.
To start out with, let’s create a table and toss in a bunch of date values. We’ll run queries against these dates and measure SQL Server’s abilities to work with it.
To make up a bunch of datetime data, I’m using my trusty recursive CTE from my prior post.
--Populate a table with some data CREATE TABLE dbo.Donuts ( DonutTime DATETIME2(7) ) DECLARE @startDate DATETIME2(7)= '2010-12-01 00:00:00' , @endDate DATETIME2(7)= '2010-12-11 01:30:00' ; WITH MyCTE AS ( SELECT @startDate AS [Makin' the Donuts] UNION ALL SELECT DATEADD(ms, 1225, [Makin' the Donuts]) FROM MyCTE WHERE [Makin' the Donuts] < @endDate ) INSERT dbo.Donuts SELECT [Makin' the Donuts] FROM MyCTE OPTION ( MAXRECURSION 0 ) ; SELECT @@ROWCOUNT --We now have 709716 rows of DonutTime
Now let’s look at different methods to manipulate datevalues. For our examples I’ll be rounding to the minute.
As we learn TSQL, this is the first method that occurs to us. We know DATEPART will return part of a date (great name!), so we can chop apart the bits. However, to get them back together properly we have to turn each part into a string to clue them back together. And then if we want to treat it like a date (which we pretty much always do), we have to cast it back.
Just look at this baby. It’s pretty ugly.
SELECT CAST(CAST(DATEPART(YY, DonutTime) AS CHAR(4)) + '-' + CAST(DATEPART(MM, DonutTime) AS NVARCHAR(2)) + '-' + CAST(DATEPART(DD, DonutTime) AS NVARCHAR(2)) + ' ' + CAST(DATEPART(hh, DonutTime) AS NVARCHAR(2)) + ':' + CAST(DATEPART(mi, DonutTime) AS NVARCHAR(2)) + ':00.000' AS DATETIME2(0)) AS [Wow, that was a lot of typing.] FROM dbo.Donuts
Running this (after cleaning out buffers), I got these results:
There’s a couple of variations on contestant #2. I’ll take the one I like best, which is casting to a smaller byte size by using DATETIME2(0), which is 6 bytes rather than 8 and effectively truncates to the second. Then I’ll subtract the seconds.
SELECT DATEADD(ss, -DATEPART(ss, DonutTime), CAST (DonutTime AS DATETIME2(0))) FROM dbo.Donuts
Running this one (yes, I cleaned out the buffers), I got these results:
Well now, that’s much lower CPU time there.
NB: I did test, and in all my trials it was lower CPU time to cast into DATETIME2 rather than using a nested DATEADD function to subtract milliseconds.
This contestant is near and dear to my heart. I like it because it’s easy for me to remember. You take a short trip into CHAR() with the 121 date format and set the length to chop off the parts of the date you don’t want. Then you cast or convert back to a DATETIME2(0).
I think I like this one because it feels just a little bit violent. But not in a bad way. It’s like roller derby.
SELECT CAST(CONVERT(CHAR(16), DonutTime, 121) AS DATETIME2(0)) FROM dbo.Donuts
Oh, sad. This one didn’t do very well. It’s definitely better than Contestant #1, at least.
Here’s the method Jeremiah suggested to me. The way he described it was “Just figure out the number of minutes since the beginning of time, and use that.”
Being a philosophy major, I of course asked “So, when was the beginning of time?”
Being a developer, he answered, “Just call it zero.”
SELECT DATEADD(mi, DATEDIFF(mi, 0, CAST(DonutTime AS DATETIME2(0))), 0) FROM dbo.Donuts
Here are the results (clean buffers, as usual):
Ooo, check out the CPU time on that one.
Note: I ran a few trials and this is faster on the CPU when you cast as DATETIME2(0) before doing your maths. I did that to make all things equal with the other contestants, who had the same benefit.
Here’s a recap of how everyone performed:
Why did contestants 2 and 4 do so well?
Jeremiah pointed out that datetime values are stored internally as two four byte integers. (BOL reference: see “Remarks”) Performing mathematic functions on an integer value is a nice fast activity on the CPU.
Performing conversions back and forth to character based datatypes, however, is not so natural, nor so fast.
What’s the internal storage format of DateTime2? Well, I’m not sure about that one. BOL isn’t so up-front about these things anymore. If you happen to know, please tell me in the comments. I can tell, however, that it’s something that enjoys mathematics.