The Cost of Function Use In A Where Clause

  • George Heinrich (2/28/2008)


    Great article! I'm glad I read it, but now i'm having trouble writing one of my queries...

    I need to join 2 tables on smalldatetime columns. One of the tables stores the date and time and the other table doesn't. So table A has '2008-02-27 10:32:31' and table B has '2008-02-27 00:00:00' and I need to join them on table A date = table B date (ignoring the time info).

    Normally I use:

    ...FROM

    TableA JOIN TableB

    ON Cast(Cast(TableA.DateColumn AS INT) AS Smalldatetime) =

    Cast(Cast(TableB.DateColumn AS INT) AS SmallDateTime)

    ...

    Will wrapping these columns in the JOIN clause also cause a degradation of performance like wrapping them in a where clause would?

    Is there a more optimized way of writing this?

    Thanks,

    George

    Yes, functions in the Join clause have the same effect as functions in the Where clause.

    What I generally recommend, if possible, in this case, is add calculated columns to the tables, like so:

    alter table TableA

    add DateColumnYear as datepart(year, datecolumn),

    DateColumnMonth as datepart(month, datecolumn),

    DateColumnDay as datepart(day, datecolumn)

    go

    create index IDX_TableA_DateParts on dbo.TableA (DateColumnDay, DateColumnMonth, DateColumnYear)

    Then do your join/where clauses on the calculated columns. Like:

    select ...column list...

    from dbo.TableA

    inner join dbo.TableB

    on TableA.DateColumnYear = TableB.DateColumnYear

    and TableA.DateColumnMonth = TableB.DateColumnMonth

    and TableA.DateColumnDay = TableB.DateColumnB

    Alternately, if you don't often select by year, month, day, but do often join on the whole date, you can add a calculated column:

    alter dbo.TableA

    add DateColumnDate as cast(convert(varchar(25), DateColumn, 102) as datetime)

    go

    create index IDX_TableA_DateColumnDate on dbo.TableA (DateColumnDate)

    Do that to both tables, then join on the calculated column.

    In my experience, this is very fast and efficient. Of course, it only applies if you are in a position to add calculated columns to tables.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • If you convert your @varDate local to a smalldatetime local before using it in the SELECT statement you can do a straight comparison of the local variable and the table column.

    INSTEAD OF:

    select * from table_name where dateDiff(d,@varDate,Created_Date) =0

    TRY THIS:

    DECLARE @sdtDate smalldatetime

    SELECT @sdtDate = cast(@varDate AS smalldatetime)

    SELECT * FROM table_name WHERE Created_Date = @sdtDate

  • select * from table_name where dateDiff(d,@varDate,Created_Date) =0

    It seems to me that this test simply wants to compare an element to a static variable to determine if they match (which is the case if the dateDiff result is 0). Why not convert the variable to the appropriate type and format to match Created_Date (this avoids having CONVERT called explicitly in the WHERE clause by the parser---very expensive) and compare the fields directly. No function wrapped around the real element, so no hiding of the index.

    Or is it too early for the caffeine to kick in and I'm missing something very obvious?

    Back to the original subject, very good article. I spent a some time myself testing the performance of function calls (intrinsic and UDF) and arrived at the same conclusions. And that is also where I ran into the explicit CONVERT function call in tracing the performance of a WHERE clause.

    ------------
    Buy the ticket, take the ride. -- Hunter S. Thompson

  • GSquared (2/28/2008)


    Good article.

    One small quibble with it. In one of the examples, you use "((FullName1 = 'Ed Jones') OR (FullName1 IS NULL))", as an example of how to get Index Seeks, instead of Scans.

    In many cases, the moment you put an "or" in a Where clause, you get a scan instead of a seek. Not always, but it is something to watch out for. The same applies for "in ()" (since that's just a way to shorthand multiple "or" statements).

    If you really HAVE to do the OR clause, I've found an approach like this works best, to avoid scans as much as possible. For example, if you have one of those stinking reports where the users have demanded 15 different parameters, but they don't know which one they'll use so you have to stack up NULL checks as well as column comparisons:

    SELECT...

    FROM....

    WHERE

    CASE WHEN @MyParam IS NULL THEN 1

    WHEN @MyParam = table1.Column1 THEN 1

    ELSE 0

    END = 1

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I'm curious to know if the 'ON' predicate will behave the same as the 'WHERE' statement in all cases described in the article?

  • Does the case not cause very similar performance problems, I remember having to rewrite an SP a little while ago to improve performance by removing the case from a where clause

  • SSC-Addicted,

    Thanks for the good idea. There are LOTS of issues with this database. I could gripe all day about inconsistencies and poor design, but I just keep on smiling every single day!! (Anyone have any prozac?)

    George

  • nice article - clean, simple and informative.

  • leea (2/28/2008)


    Does the case not cause very similar performance problems, I remember having to rewrite an SP a little while ago to improve performance by removing the case from a where clause

    Ah, but check out this case statement. It's actually not a function against columns. Instead it's a series of checks that result in a 1 or a 0. It uses indexes very well. I was shown this by a Microsoft consultant named Bill Sulcius. We called him the Magic Conch because we did whatever he told us to. It looks like it should be a RBAR process, but it actually performs these checks in a set-based manner.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Hi Gregory,

    What is the comparative performance of the ISNULL() ... against the OR columnname is NULL

    Thanks for a great article!

  • Nicely done, Gregory Jackson. Outstanding intro to the pitfalls of doing calculations in the WHERE clause.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Good Article. One comment that perhaps others want to expand upon. Null in an index should be avoided. -Thanks

  • David Jackson (2/28/2008)


    This looks complex but will use your index.

    where CreatedDate between dateadd(dd, datediff(dd,0,Getdate()),0) --midnight TODAY

    and dateAdd(ss,-1,dateAdd(dd,1,dateadd(dd, datediff(dd,0,Getdate()),0))) --23:59:59 TODAY

    HTH

    Dave J

    Edit: Oops! I see w.lengenfelder posted this already! :w00t:

    Even then, there is the possibility of something slipping in during that 1 second you've ignored. And, with the new date types coming out for 2k8, it becomes even more important to not ignore that 1 second. With that in mind, I recommend NOT using BETWEEN because of the inclusive nature of both endpoints... instead, allow me to suggest something like the following...

    SELECT SomeColList

    FROM dbo.SomeTable

    WHERE SomeDateCol >= DATEADD(dd,DATEDIFF(dd,0,GETDATE()),0)

    AND SomeDateCol < DATEADD(dd,DATEDIFF(dd,0,GETDATE()),0) +1

    ... and it allows an index to be correctly used (SEEKs) if it is available and the rest of the query allows it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • GSquared (2/28/2008)


    Alternately, if you don't often select by year, month, day, but do often join on the whole date, you can add a calculated column:

    alter dbo.TableA

    add DateColumnDate as cast(convert(varchar(25), DateColumn, 102) as datetime)

    go

    create index IDX_TableA_DateColumnDate on dbo.TableA (DateColumnDate)

    Do that to both tables, then join on the calculated column.

    In my experience, this is very fast and efficient. Of course, it only applies if you are in a position to add calculated columns to tables.

    I second the vote for that method! It works very well for whole day joins and other whole day calculations including aggregation...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Why not create an index computed column where you can provide this capability for routinely used functions.

Viewing 15 posts - 16 through 30 (of 98 total)

You must be logged in to reply to this topic. Login to reply