The Cost of Function Use In A Where Clause

  • I did some tests on the use of IsNull in the Where clause and found the following:

    If the column being tested allows nulls, it will use an index or table scan (depending on whether there's an index available, of course), even if there are no nulls in the column. If the column does not allow nulls, SQL server ignores the isnull function and uses an index seek.

    This made me wonder if a check constraint on a column would have similar behavior.

    create table dbo.CKTest (

    Date datetime primary key)

    go

    insert into dbo.CKTest (date)

    select distinct date

    from common.dbo.dates

    where datepart(year, date) = 2007

    Then I ran:

    select date

    from dbo.CKTest

    where datepart(year, date) = 2007

    Got a clustered index scan (of course).

    Then added:

    alter table dbo.CKTest

    add constraint CK_Date_2007 check(datepart(year, date) = 2007)

    Ran the same select. Still a clustered index scan. Not a seek.

    So there's a slightly different mechanism in place for making SQL ignore IsNull vs DatePart, even when it could know before-hand what the result is going to be.

    Good to know on the IsNull thing. Would never have thought of checking that, since I'd never run IsNull on a non-nullable column. Not sure what practical use it is, but definitely interesting to know.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • ruben ruvalcaba (2/29/2008)


    Hi,

    First of all thank you, it's a good point to be considered. In fact, I use functions on the WHERE clause very often, specially when filtering dates ... how do you optimize a query like this:

    SELECT SUM(Amount) FROM myEntries WHERE YEAR(myEntries.myDate) = @Year AND MONTH(myEntries.myDate) <= @Month

    Any clue? thank you

    regards

    Not positive nobody's replied yet... and just doing this off the cuff,...

    Assuming @year and @month are parms...

    Instead of @year-- declare @gedate and init as jan 1 of whatever @year is...

    Instead of @month, declare @ltdate and init as the first of the month AFTER @month...

    SO let's say you had @year=2008 and @month = 3

    Set @gedate=1/1/2008 and @ltdate=4/1/2008. (easily derived from old parms)

    Your WHERE turns to:

    ...WHERE myentries.mydate >= @gtdate AND myentries.mydate < @ltdate

    EDIT

    Ahh... as it turns out, I am third or fourth on the "response" list to this one. One thing I do prefer (and I *thought* I'd benchmarked it to confirm it causes no additional overhead) is to use

    ...WHERE foo >= @gedate and foo < @ltdate

    rather than

    ...WHERE foo BETWEEN (@d1 and @d2) -- or is that "IS BETWEEN"; I never use this construct anyway...

    I like to use the appropriately named GEdate (greater or equal) and LTdate (lessthan) so I don't get messed up based on the time portion of a (small)datetime. In environments I've worked in, sometimes we will zero out the time portion of a (small)datetime-- (Expiration Dates); in others (Phone Call Taken) we do not. If I write a routine I can use the logic above without worrying about whether a date is 3/3/2008 00:00:00 or 3/3/2008 23:59:59.890


    Cursors are useful if you don't know SQL

  • Greg, a very nice and useful article.I understand the demerit of using functions in 'Where' clause.

    Do/does you/anybody have any idea/s on usage of function in 'Select' clause OR in 'JOINS' (User defined tabular functions)on the performance of a query?

    Thanks in advance

    Rishi

  • It's certainly no Panacea, but the following article has some insight as to why you can have performance problems with some UDF's in the SELECT list...

    http://www.sqlservercentral.com/articles/Test+Data/61572/

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • function use in the SELECT List and or on the JOIN can also cause severe problems (maybe even worse then in the WHERE Clause).

    by placing a Function in the select, that function will execute 1x per row.

    We had these at my work also when I first got there. They were causing terrible performance system wide.

    Removing them all helped immensely. typically you can replace the functions with some other mechanism of retrieving the data needed (Derived tables, correlated subQueries, etc)

    you really need to pay close attention to the Query Plan and specifically to the SubTree cost of the plan. If the SubTree cost approaches 10.0 (in my experience), you need to consider modifying the query.

    Cheers,

    GAJ

    Gregory A Jackson MBA, CSM

  • GregoryAJackson (3/4/2008)


    function use in the SELECT List and or on the JOIN can also cause severe problems (maybe even worse then in the WHERE Clause).

    by placing a Function in the select, that function will execute 1x per row.

    We had these at my work also when I first got there. They were causing terrible performance system wide.

    Removing them all helped immensely. typically you can replace the functions with some other mechanism of retrieving the data needed (Derived tables, correlated subQueries, etc)

    you really need to pay close attention to the Query Plan and specifically to the SubTree cost of the plan. If the SubTree cost approaches 10.0 (in my experience), you need to consider modifying the query.

    Cheers,

    GAJ

    Actually - there was a thread recently on this - and that is by no means true in general. Meaning - there are lots of exceptions. Simple scalar functions, especially deterministic ones, tend to be treated as scalar values that simply replicate on down the line.

    It's largely a matter of what goes in (inputs), whether the inputs are based on the row, whether the function is in effect deterministic AND is marked correctly as such (as I came to find out).

    Now - if you were using functions to mask a CSQ - then yes - that can be a terrible drag on your performance, but still - it depends a LOT on what the function is doing.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Carl Federl and I did some pretty good testing a couple of years back and it very much supports what Matt says... it depends on the function, how it's written, how it's used, where it's used, etc. Carl and I found out that some functions actually operated quicker than the equivalent inline code while others cause huge drags... and everything in between. It also depends on whether or not parallelism is spawned and for what reason.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ahh....good catch.

    I was Assuming that the function used in the SELECT depended on the Row (as in a correlated subQuery).

    Yes it very much depends on the function, what it's doing, how it's used, etc.

    I needed to be more clear.

    GAJ

    Gregory A Jackson MBA, CSM

  • If you want to check out the thread I was referring to - it's over here:

    http://www.sqlservercentral.com/Forums/Topic454846-371-2.aspx#bm455633

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Thanks a bunch Matt,

    I definately will

    GAJ

    Gregory A Jackson MBA, CSM

  • Thanks GAJ, Matt and Jeff for the valuable information on functions usage.

    Rishi

  • You bet, Rishi... thank you for the feedback.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Rishi,

    You're very welcome.

    I never imagined this trivial little article would have generated so much interest and such a vigorous discussion thread.

    I'm very glad that SQLServerCentral published the article now.

    cheers,

    Gregory A Jackson MBA, CSM

  • Rishi -

    Glad it's been useful. I learned a lot from it for myself...

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • GregoryAJackson (3/5/2008)


    Rishi,

    You're very welcome.

    I never imagined this trivial little article would have generated so much interest and such a vigorous discussion thread.

    I'm very glad that SQLServerCentral published the article now.

    cheers,

    Apprently, it wasn't so trivial... Thanks for taking the time to write it, Greg.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 76 through 90 (of 98 total)

You must be logged in to reply to this topic. Login to reply