Comparison of Dates in SQL

  • Manish Sinha

    Ten Centuries

    Points: 1023

    Comments posted to this topic are about the item Comparison of Dates in SQL

  • Henrik Staun Poulsen

    SSCertifiable

    Points: 6218

    Is SQL Server able to use indexes when using DataDiff(Day, D1, D2) = 0 ?

    Best regards

    Henrik Staun Poulsen

  • mandu

    SSC Enthusiast

    Points: 105

    I think that this articole is very academic and it is not useful for work, because it works, it's true, but the indexes? Do the indexes works?

    Bye

    mandu

  • Tao Klerks

    SSCarpal Tunnel

    Points: 4157

    Hi,

    The following statement is untrue: "it is always a good practice to use DATEDIFF method for date comparison instead of making comparison using logical operators (,=)".

    As the previous two posters noted, using a function call on an indexed date column (or any indexed column, for that matter) prevents SQL Server from doing an efficient "seek" on the values in that column for that index - all the values need to be retrieved first, then the function applied. If the date column has the highest selectivity, then this can be a severe performance killer.

    A safer way to ensure that you get all dates in a certain range, ignoring the "time" component, without applying a function to the source column, is using DateAdd as follows:

    --select the correct data type here, DateTime vs SmallDateTime,

    -- as you might otherwise incurr another conversion function call

    -- on the source column!

    DECLARE @FromDate DateTime

    DECLARE @ToAndIncludingDate DateTime

    SET @FromDate = '2009-04-01'

    SET @ToAndIncludingDate = '2009-04-15'

    SELECT *

    FROM SomeTable

    WHERE (SomeColumn >= @FromDate AND SomeColumn < DateAdd(Day, 1, @ToAndIncludingDate) )

    HTH,

    Tao

    http://poorsql.com for T-SQL formatting: free as in speech, free as in beer, free to run in SSMS or on your version control server - free however you want it.

  • benr-758674

    Valued Member

    Points: 64

    I am pretty sure that use of datediff won't work well with indexes, since it requires a calculation to be performed on every row that is being compared.

    When I am working with dates, I always say >= the begin date, and then = @startdate and somedate < @enddate -- Note I am selecting all records PRIOR to the next day

    END

    [/code]

    Since the @enddate is effectively the beginning of the day AFTER the date required, saying < will then get all records up to and including the end of the day requested...

    Here are the date functions I use. Unfortunately I cannot remember if I wrote them myself or found them somewhere... My dbdate function used to use the convert function to get a string representation and then truncate and convert the result back into a date, but I ran into trouble with different regional settings (where dates and times are expressed differently). The method below is a bit slower and but seems to work for any regional setting.

    -- Remove the time portion from a date.

    CREATE FUNCTION [dbo].[DBDate] (@DATE datetime)

    RETURNS datetime

    AS

    BEGIN

    DECLARE @Date1 DateTime

    SET @Date1 = @Date

    SET @Date1 = DateAdd(ms, -1 * DatePart(ms, @Date1), @Date1)

    SET @Date1 = DateAdd(ss, -1 * DatePart(ss, @Date1), @Date1)

    SET @Date1 = DateAdd(mi, -1 * DatePart(mi, @Date1), @Date1)

    SET @Date1 = DateAdd(hh, -1 * DatePart(hh, @Date1), @Date1)

    RETURN @Date1

    END

    -- Get 12AM of the next day

    CREATE FUNCTION [dbo].[DBDateNext] (@DATE datetime)

    RETURNS datetime

    AS

    BEGIN

    RETURN DateAdd(dd, 1, dbo.DBDate(@Date))

    END

  • Tao Klerks

    SSCarpal Tunnel

    Points: 4157

    Interesting approach - we use Convert, relying on the fact that DateTime values are internally stored as floating-point values, with every whole number representing a day:

    SET @DateWithoutTime = Convert(DateTime, Floor(Convert(Float, @OriginalDate)))

    Any idea which is faster? I don't have time to test right now, but I'm quite curious 🙂

    http://poorsql.com for T-SQL formatting: free as in speech, free as in beer, free to run in SSMS or on your version control server - free however you want it.

  • Paul White

    SSC Guru

    Points: 150341

    henrik staun poulsen (4/28/2009)


    Is SQL Server able to use indexes when using DateDiff(Day, D1, D2) = 0 ?

    Best regards

    Henrik Staun Poulsen

    FTFY 🙂

    The answer is yes.

    But only if:

    • There is an index on a persisted column in the table that is defined as datediff(day,d1,d2); or
    • There is an index on an indexed view column which resolves to datediff(day,d1,d2)

    Otherwise, no 😉

    Cheers,

    Paul

  • benr-758674

    Valued Member

    Points: 64

    Hi Tao - guess we could always just run both methods over a largish table and get stats out of the query analyser (like cpu usage, etc) ... In terms of finding the start of a day - the method I proposed has a lot of steps in it than yours - so its probably slower, also, your method is compact enough just to use in the original sproc rather than doing a function call.

  • craig.lovegren

    SSC Veteran

    Points: 229

    guess we could always just run both methods over a largish table and get stats out of the query analyser (like cpu usage, etc)

    I did exactly that, using the small example, you find that the datediff and dateadd are nearly identical, indexes or not. So I grabbed a large table (a few million rows) from our systems, indexed them appropriately and ran a few tests. When comparing my three tests, datediff ranked at 90%, dateadd, although a complex plan, took only 9%, and the final test took the remaining 1%. What we typically do for tables which are queried on a datetime field religiously is to actually create another column with the date portion only. It does require an extra 4 bytes (smalldatetime) per row, but disk space is cheap. Just remember to update the column after the data load in a batch update statement, don't use triggers, computed columns, or calcs in your inserts, as this will slow your data loads down drastically.

  • Paul White

    SSC Guru

    Points: 150341

    Tao Klerks (4/28/2009)


    The following statement is untrue: "it is always a good practice to use DATEDIFF method for date comparison instead of making comparison using logical operators (,=)".

    Agreed.

    Tao Klerks (4/28/2009)


    As the previous two posters noted, using a function call on an indexed date column (or any indexed column, for that matter) prevents SQL Server from doing an efficient "seek" on the values in that column for that index - all the values need to be retrieved first, then the function applied. If the date column has the highest selectivity, then this can be a severe performance killer.

    Not always.

    As mentioned previously, the QO will consider using a useful index containing a function on a persisted column or as part of any indexed view (enterprise only). Note that the QO may consider using an index from a view if that index covers a guaranteed superset of the target rows. See http://msdn.microsoft.com/en-us/library/ms187864(SQL.90).aspx or Books Online for details.

    There are some trivial examples where a function will not prevent a seek - one is adding ISNULL around a non-nullable column. This doesn't work with COALESCE.

    Tao Klerks (4/28/2009)


    A safer way to ensure that you get all dates in a certain range, ignoring the "time" component, without applying a function to the source column, is using DateAdd...

    True. Though the example given may not result in a good plan either, because SQL Server cannot sniff the local variable parameters. The usual solutions apply: create a nested procedure, use dynamic SQL, add OPTION (RECOMPILE) or (OPTIMIZE FOR(@var=value)).

    Cheers,

    Paul

  • Atif-ullah Sheikh

    SSChampion

    Points: 12495

    From the following code, it is verified that the Index is used with or without function on the column...

    Create Table #t1 (mid int,gdate datetime)

    CREATE CLUSTERED INDEX [IX_ALIEN_MISSILE] ON [#t1]

    (

    [gdate] ASC

    ) ON [PRIMARY]

    Insert into #t1 Values ( 1,'01-01-2009 06:00:00' )

    Insert into #t1 Values ( 2,'01-01-2009 07:00:00' )

    Insert into #t1 Values ( 3,'01-01-2009 04:00:00' )

    Insert into #t1 Values ( 4,'01-01-2009 03:00:00' )

    Insert into #t1 Values ( 5,'01-02-2009 04:00:00' )

    Insert into #t1 Values ( 6,'01-02-2009 05:00:00' )

    Insert into #t1 Values ( 7,'01-02-2009 06:00:00' )

    Insert into #t1 Values ( 8,'01-02-2009 06:00:00' )

    Insert into #t1 Values ( 9,'01-03-2009 04:00:00' )

    Insert into #t1 Values ( 10,'01-03-2009 04:00:00' )

    Insert into #t1 Values ( 11,'01-03-2009 05:00:00' )

    Insert into #t1 Values ( 12,'01-03-2009 06:00:00' )

    Insert into #t1 Values ( 13,'01-03-2009 07:00:00' )

    Insert into #t1 Values ( 14,'01-04-2009 08:00:00' )

    Insert into #t1 Values ( 15,'01-04-2009 09:00:00' )

    Insert into #t1 Values ( 16,'01-04-2009 10:00:00' )

    Insert into #t1 Values ( 17,'01-04-2009 10:00:00' )

    Select * from #t1 where Datediff(day,gdate,Convert(datetime,'2009-01-01 00:00:00.000')) = 0

    Select * from #t1 where Convert(datetime,Convert(varchar(20),gdate,110)) = Convert(datetime,'2009-01-01 00:00:00.000')

    Select * from #t1 where gdate = Convert(datetime,'2009-01-01 00:00:00.000')

    drop table #t1

    On checking the execution plan, it is clear that the Inde Scan is Used on the Convert of DateDiff Function and Index Seek is Used if used without function on the column. In either case, Index is Used...

    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Sometimes, winning is not an issue but trying.
    You can check my BLOG
    [font="Arial Black"]here[/font][/url][/right]

  • Tao Klerks

    SSCarpal Tunnel

    Points: 4157

    This is not something I had ever heard of:

    Paul White (4/28/2009)


    Though the example given may not result in a good plan either, because SQL Server cannot sniff the local variable parameters.

    Are you saying that the following statement would result in a better query plan? (or more reliably result in a good query plan?)

    SELECT *

    FROM SomeTable

    WHERE (SomeColumn >= '2009-04-01' AND SomeColumn = '2009-04-01' AND SomeColumn < DateAdd(Day, 1, '2009-04-15') )

    Is there any reference to this issue / mechanism anywhere online that I can look at?

    http://poorsql.com for T-SQL formatting: free as in speech, free as in beer, free to run in SSMS or on your version control server - free however you want it.

  • Paul White

    SSC Guru

    Points: 150341

    benr (4/28/2009)


    When I am working with dates, I always say >= the begin date, and then < the next day. Eg. If you want all records up to and including the end of 31st March 2009 your query should ask for all records prior to the 1st of April. This method should then be able to leverage indexes.

    This is true, and a valid approach, unless either date column can be NULL.

    Adding ISNULL or COALESCE on a NULLable column will prevent index usage (subject to the caveats I already posted). Using OR to cope with the NULLs will almost always prevent index usage too. Using a number of UNION ALLs to cope with the NULLs will usually be able to use indexes, but n seeks and n - 1 UNION ALLs will not always be cheaper than one index scan.

    The functions posted can be improved because they:

    1. Use multiple SETs where a single SELECT would suffice

    2. Does not specify SCHEMABINDING to mark it as deterministic

    3. Does not specify RETURNS NULL ON NULL INPUT to avoid running for NULL inputs

    The function body can be rewritten in one line as below

    RETURN CONVERT(DATETIME, CONVERT(CHAR(8), @Date, 112), 112)

    If the new function is marked with SCHEMABINDING, it is deterministic and can be used in a persisted column. It should also be marked with RETURNS NULL ON NULL INPUT.

    Cheers,

    Paul

  • Paul White

    SSC Guru

    Points: 150341

    Tao Klerks (4/28/2009)


    Interesting approach - we use Convert, relying on the fact that DateTime values are internally stored as floating-point values, with every whole number representing a day

    The FLOAT method is imprecise in the sense required for functions to be used in persisted columns. For that reason, I wouldn't use this code as the basis for a scalar UDF.

    I don't suppose Microsoft are about to change the relationship between FLOATs and DATETIMEs, but it seems an unnecessary hidden dependency to add to your code.

    Some folks also view this trick as a bit of a hack, when a better solution exists.

    Paul

  • Paul White

    SSC Guru

    Points: 150341

    craig.lovegren (4/28/2009)


    I did exactly that, using the small example, you find that the datediff and dateadd are nearly identical, indexes or not. So I grabbed a large table (a few million rows) from our systems, indexed them appropriately and ran a few tests. When comparing my three tests, datediff ranked at 90%, dateadd, although a complex plan, took only 9%, and the final test took the remaining 1%.

    It sounds as if you are quoting percentage batch costs from the execution plan. If so, please be aware that these percentages are based on the optimizers cost estimates. To accurately report performance results, you need a reproducible test set up (with code) and output from SET STATISTICS IO, TIME ON as a minimum. It is also common to start with a cold cache. A posted actual execution plan is often helpful too.

    craig.lovegren (4/28/2009)


    What we typically do for tables which are queried on a datetime field religiously is to actually create another column with the date portion only. It does require an extra 4 bytes (smalldatetime) per row, but disk space is cheap. Just remember to update the column after the data load in a batch update statement, don't use triggers, computed columns, or calcs in your inserts, as this will slow your data loads down drastically.

    Well that all rather depends on your priorities. Computed columns have the distinct advantage of being guaranteed to match the source data at all times. Someone could update one of your date columns and not the other.

    I am confused as to how a computed column slows down your data loads 'drastically'. Also, simple scalar calculations in the SELECT part of an INSERT are typically so much cheaper than the INSERT operation that it makes no odds.

Viewing 15 posts - 1 through 15 (of 111 total)

You must be logged in to reply to this topic. Login to reply