Why is CAST using an Index Seek??

  • SoCal_DBD

    Hall of Fame

    Points: 3051

    I’m a new member on a team and came across the following query (which I’ve obfuscated) in a stored procedure and immediately thought I would be able to add value by explaining that casting a column used in the criteria would be a performance problem. I took the original query and my re-worked version and got the estimated execution plan. Much to my dismay, both queries are doing an index seek! I have nothing to back up my suggestion now. Why on earth is it doing a seek on what I thought was the poorly written query? The table itself is very small since it’s a dev environment – approximately 800 rows. There is a nonclustered index on CreatedDate.

    -- What is being used

    SELECT SomeID,

    CreatedDate

    FROM SomeHistory

    WHERE CAST(CreatedDate AS DATE) BETWEEN '2/1/2016' AND '2/8/2016'

    ORDER BY CreatedDate desc

    -- What I want to suggest

    SELECT SomeID,

    CreatedDate

    FROM SomeHitory

    WHERE CreatedDate BETWEEN '2/1/2016' AND '2/9/2016'

    ORDER BY CreatedDate desc

    I'm also attaching images of the execution plans as well as the properties of the Index Seek items from the plan.

    Thank you!

  • TheSQLGuru

    SSC Guru

    Points: 134017

    LSAdvantage (2/8/2016)


    I’m a new member on a team and came across the following query (which I’ve obfuscated) in a stored procedure and immediately thought I would be able to add value by explaining that casting a column used in the criteria would be a performance problem. I took the original query and my re-worked version and got the estimated execution plan. Much to my dismay, both queries are doing an index seek! I have nothing to back up my suggestion now. Why on earth is it doing a seek on what I thought was the poorly written query? The table itself is very small since it’s a dev environment – approximately 800 rows. There is a nonclustered index on CreatedDate.

    -- What is being used

    SELECT SomeID,

    CreatedDate

    FROM SomeHistory

    WHERE CAST(CreatedDate AS DATE) BETWEEN '2/1/2016' AND '2/8/2016'

    ORDER BY CreatedDate desc

    -- What I want to suggest

    SELECT SomeID,

    CreatedDate

    FROM SomeHitory

    WHERE CreatedDate BETWEEN '2/1/2016' AND '2/9/2016'

    ORDER BY CreatedDate desc

    I'm also attaching images of the execution plans as well as the properties of the Index Seek items from the plan.

    Thank you!

    To my recollection, the SQL Server product team knew that this exact CAST was going to be done SO MUCH that they actually baked it into the engine so that it would be "seekable". Do note that I have just returned from a short trip to Dubai so my brain is not necessarily trust-worthy right now. :w00t:

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Orlando Colamatteo

    SSC Guru

    Points: 182269

    I get different plans but indeed see an index seek on both. One item to be careful of with your recommendation regarding the logic:

    -- What you should suggest

    SELECT SomeID,

    CreatedDate

    FROM SomeHistory

    WHERE CreatedDate >= '2/1/2016'

    AND CreatedDate < '2/9/2016' -- up to, but not including 2/9

    ORDER BY CreatedDate desc;

    Testbed:

    USE tempdb;

    DROP TABLE SomeHistory

    CREATE TABLE SomeHistory (SomeID int, CreatedDate datetime);

    INSERT INTO SomeHistory select top 100000 c1.object_id, DATEADD(DAY,CHECKSUM(NEWID()) % (15* 365),c2.create_date) from sys.all_columns c1 cross join sys.tables c2 cross join sys.all_columns c3;

    CREATE NONCLUSTERED INDEX ix1 ON SomeHistory(CreatedDate)

    __________________________________________________________________________________________________
    There are no special teachers of virtue, because virtue is taught by the whole community. --Plato

  • Jacob Wilkins

    One Orange Chip

    Points: 27856

    While it's quite cool that there are CASTs that still allow seeks, in this case you would still want to look into avoiding the CAST.

    As discussed at http://dba.stackexchange.com/questions/34047/cast-to-date-is-sargable-but-is-it-a-good-idea, there are still things about that plan that are suboptimal.

    For one thing, the estimate for the number of rows coming out of the seek can be quite a bit more inaccurate when you do the CAST and force the dynamic seek pattern.

    The performance of the version with the CAST is also not quite as good, so I avoid it if at all possible.

    To illustrate:

    CREATE TABLE test_dates (some_datetime datetime);

    WITH n1(x) AS (SELECT 1 UNION ALL SELECT 1), -- 2 rows

    n2(x) AS (SELECT 1 FROM n1 a, n1 b), -- 4 rows

    n3(x) AS (SELECT 1 FROM n2 a, n2 b), -- 16 rows

    n4(x) AS (SELECT 1 FROM n3 a, n3 b), -- 256 rows

    n5(x) AS (SELECT 1 FROM n4 a, n4 b), -- 65536 rows

    tally(x) AS (SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM n5)

    INSERT INTO test_dates

    SELECT DATEADD(minute,x,'20130101')

    FROM tally;

    CREATE CLUSTERED INDEX CI_some_datetime ON test_dates (some_datetime);

    GO

    CREATE PROCEDURE test_no_cast

    AS

    DECLARE @datetime_bucket as datetime

    SELECT @datetime_bucket=some_datetime

    FROM test_dates

    WHERE some_datetime BETWEEN '20130101' AND '20130103';

    GO

    CREATE PROCEDURE test_sargable_cast

    AS

    DECLARE @datetime_bucket as datetime

    SELECT @datetime_bucket=some_datetime

    FROM test_dates

    WHERE CAST(some_datetime AS DATE) BETWEEN '20130101' AND '20130102';

    GO

    EXEC test_no_cast

    GO 1000

    EXEC test_sargable_cast

    GO 1000

    SELECT proc_name=object_name(object_id, database_id),

    total_logical_reads,

    total_worker_time,

    total_elapsed_time

    FROM sys.dm_exec_procedure_stats ps

    WHERE object_name(object_id, database_id) LIKE 'test%';

    DROP PROCEDURE test_no_cast, test_sargable_cast;

    DROP TABLE test_dates;

    On my instance the actual number of rows returned is 2880 for the non-CAST version and 2879 for the SARGable CAST (the two are not logically the same because the non-CAST version will return midnight of the upper boundary).

    The estimated rows for the non-CAST version is 2879.71, acceptably close 🙂

    The estimated rows for the SARGable CAST is 1439.86, rather off, especially if that under-estimate gets propagated through the rest of a more complex plan.

    Here are the performance results on my lab machine:

    proc_name total_logical_reads total_worker_time total_elapsed_time

    ------------------------------ -------------------- -------------------- --------------------

    test_sargable_cast 9000 994330 995330

    test_no_cast 9000 491606 492603

    While in absolute terms that's not a crushing difference for 1000 executions, it's a pretty significant percentage difference, and there's really no point conceding that performance just to use a CAST that happens to be able to use a seek, however nifty that might seem 🙂

    Especially when combined with the potential cardinality estimation problems, I'd avoid utilizing these SARGable CASTs unless there's a really good reason to use them.

    Cheers!

  • MadAdmin

    SSChampion

    Points: 11260

    Apparently there is a compatibility matrix which lets the engine know if the cast preserves the order of the data between the 2 data types.

    So Cast ( intcol as small int) and cast( datecol as datetime) will use an index, but I dont think it will log a missing index if one does not exist.(Could be mistaken).

    Only cast uses it as far as I know, and not convert.

    I would use it because it is more readable to do the below

    cast(startdatetime as date) = cast(getdate() as date)

    than the whole startdatetime between (dateadd(datepart(())) and dateadd(datepart blah blah -3 millesecond+1day) contrivance to get all events for today.

    Catch-all queries done right [/url]
    Gail Shaw's Performance Blog[/url]

  • Orlando Colamatteo

    SSC Guru

    Points: 182269

    MadAdmin (2/9/2016)


    Apparently there is a compatibility matrix which lets the engine know if the cast preserves the order of the data between the 2 data types.

    So Cast ( intcol as small int) and cast( datecol as datetime) will use an index, but I dont think it will log a missing index if one does not exist.(Could be mistaken).

    Only cast uses it as far as I know, and not convert.

    Yet another reason to use the ANSI-SQL CAST instead of the proprietary CONVERT by default.

    I would use it because it is more readable to do the below

    cast(startdatetime as date) = cast(getdate() as date)

    than the whole startdatetime between (dateadd(datepart(())) and dateadd(datepart blah blah -3 millesecond+1day) contrivance to get all events for today.

    For performance reasons this query form would be preferable:

    startdatetime >= cast(cast(getdate() as date) as datetime)

    and startdatetime < cast(cast(dateadd(day,1,getdate()) as date) as datetime)

    __________________________________________________________________________________________________
    There are no special teachers of virtue, because virtue is taught by the whole community. --Plato

  • Jacob Wilkins

    One Orange Chip

    Points: 27856

    To back up what Orlando said, here are some performance metrics from my machine using the test harness from above (just modified to have datetimes today, and with the procedures modified to use the new code patterns):

    proc_name total_logical_reads total_worker_time total_elapsed_time

    ---------------- -------------------- -------------------- --------------------

    test_MadAdmin 12000 637796 639004

    test_Orlando 9000 329322 330552

    We see the same cardinality estimation issue I mentioned in my previous post as well (only much worse in this particular case), with the version that doesn't CAST the column correctly estimating 1440 rows, while the version that CASTs the column estimates 1 row, and returns the same 1440 (but at least it uses a seek! :hehe:).

    For me, the very slight increase in simplicity of the code is not worth all those performance worries, but if the goal is to make the code as easily readable as possible, and you know for sure that the code will never, ever be run very frequently or as part of a more complex query, then maybe I would understand.

    Still, the increase in readability is oh so slight, and I have never yet felt confident of assurances that the code pattern would never be used frequently or in more complex queries 🙂

    For those interested, the modified test script for this case is attached as a text file (the site's being finicky about letting me put the code in the post again).

    Cheers!

  • SoCal_DBD

    Hall of Fame

    Points: 3051

    Wow! Thank you or the great information everyone! I will read up on the link Jacob Wilkins provided as well.

Viewing 8 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply