SQL Clone
SQLServerCentral is supported by Redgate
Log in  ::  Register  ::  Not logged in

Function Calls are Evil Incarnate

The following queries look extremely innocent. However, they are all application killers once the target table grows to more than a couple thousand rows:

Select Count(*) From Person_Data Where Upper(Name_Last) = ‘SMITH’

Select Avg(Collar_Size) From Person_Data Where dbo.MyFunc(Person_Data_ID) > 500

For each of these statements, a table scan will be used to try and fetch data. A table scan is simply an operation where the database looks at each and every record in the table to see which ones meet its criteria in the Where clause. Since any unneeded I/O is wicked bad, forcing a table scan is wicked bad multiplied by the number of times it does it. A table scan, therefore, is equivalent to evil incarnate.

We would much rather the database use an index and only fetch the 20 records it needs rather than reading 200,000 records in a table scan fto locate the 20 records it needs. However, the use of the functions (Upper(), MyFunc(), MyOtherFunc(), etc) keep it from doing that.

Why do functions in the WHERE clause keep the database from using indexes? Simply put, it is because the database does not know the result of the function. To abuse my standard analogy, this would be like being the super-secret agent in the warehouse using a magic decoder digital watch. The agent would have to open each individual file, key in some information found in the file into the watch and see if the resulting answer meets the criteria.

Each time MyFunc() (a.k.a. your slick little digital watch) is used it could come back with a different result since the function could potentially be based on the input value, the time of day, or data that is being modified elsewhere in the database. The only way the database can know for certain the answer to the function call is to make it and see what the answer is. Since it doesn’t know ahead of time what the answer is, it cannot use any existing indexes (er, indices) to go to the corresponding entry.

This isn’t really a problem if the function call is in the Select, only when it’s in the WHERE or other key clauses (see below). The Select is safe since the function call is only made for records where the criteria is already met, hence it’s only called for the 20 matches and NOT for the 2.78 billion records that don’t match.

So, how do we get around using functions in a WHERE clause?

1) In the case of UPPER(), I suggest you don’t even bother. In SQL Server, by default, string handling is case-insensitive anyway. Thus, WHERE Name_Last = ‘Smith’ will match regardless of the casing in the database.
2) There are times where pre-calculating the result of the function call and placing it is a variable and the using the variable in the WHERE clause will save you time. This technique comes into play when you are using a function that SQL Server views as NONDETERMINISTIC, however, we being rather smarter know that it can be considered DETERMINISTIC (see below for a little discussion on Deterministic vs. Non-deterministic) A common example of this is a function call that accepts a Product Code and returns a Product ID. The same ID would always come back for a given code, but since there is a table call involved, SQL Server has to assume the function to be Non-deterministic. Pre-fetching this result into a variable solves the problem entirely.
3) If it’s a common request, or part of a “must have fast results” search screen, etc., then consider trading off some disk space to increase speed. You can store the raw data in the table and create a second column that contains the results of the function call. Simply index the column with the results of the function call. For an overly simplified example, if you have Price and Quantity in the table, and your function calculates Total Price, you could simply store the Total Proc in the record and index it. There are other techniques here, such as computed columns, indexed views, and so on, but you’ll definitely want to get a seasoned DBA in the loop before tackling such methods.
4) Tell the user to wait! If you just can’t get around using a function, then so be it. However, be courteous enough to let the user know that the search could take a while. This gives them the opportunity to flirt with co-workers, take a nap, or other productive uses of company time.

A couple of disclaimers:
1) There are some circumstances where using functions is OK. These have to do with whether or not a function is considered DETERMINISTIC. In other words, if it’s a guaranteed that a function will return the same answer given the same input, then the function is DETERMINITIC. In such a case, SQL Server can anticipate what the answer is without actually making the call, thus indexes can be used. For further discussion on DETERMINISTIC functions, see http://msdn.microsoft.com/en-us/library/aa214775(SQL.80).aspx.
2) So why have functions if they are so bad? Functions in a Select clause can be very useful! There is nothing evil about this! 3) It's not just the WHERE clause you have to watch out for. FROM (think Join clauses), GROUP BY, ORDER BY, and HAVING clauses are equally evil places to toss in a function. Buyer beware!

Want more? Please see my other articles in this Performance Tuning Series.


Posted by Wesley Brown on 30 June 2010

"SQL Server, by default, string handling is case-insensitive"

While true, can be misleading. For instance, if you install some third party applications they may use a case sensitive collation and require you to do fun stuff like UPPER(). If it isn't needed in the query but is consumed by an application I try and move stuff like that to the application layer.

Posted by Phil Factor on 3 July 2010

C'mon. I realise it's only an example, but instead of the UPPER, you'd surely just specify the collation in the string comparison and all is sweetness and light. Never, ever assume the collation. There is always some loony who sets the collation for the database to case-sensitive. (Incidentally, am I right in assuming that accent-sensitivity in European languages causes problems?)

Posted by lrutkowski on 3 July 2010

I have to disagree a little.  I know there is a myth that table scans are bad, but not always.  i.e.  If you have to hit each row in a table, for whatever reason, you don't want it hitting an index, and then the table.  1 scan through the table is much better than an index io, and then a table io.  But your right about functions, especially when the functions are doing table lookups.

Posted by peter.town on 4 July 2010

Table scans are sometimes OK with very small tables/databases, but an absolute killer in large DB's.

Personally I would avoid them if at all possible, and will never not have an index on any table even if the index wizard reckons an index is not needed.

Posted by michael-blackburn on 6 July 2010

Phil, I think the point was that he had a function in the WHERE clause. And, in a way, the UPPER() function was upholding your dictum not to assume the collation. It is a classic example of doing the right thing the wrong way, and a good illustration of why it's wrong: it forces a table scan.

Posted by michael-blackburn on 6 July 2010

Rereading the text, Phil, your point is well taken. Instead of "not bothering" you should specify the collation.

Posted by Ben Vegiard on 7 July 2010

Agreed.  The “Don’t Bother” solution is only viable if you double-check that your collation is case-insensitive.  I used that example as it’s a common error I see programmer turned databasers make and it’s easy to get your head around.  As with anything, you still have to do due diligence before calling this a solution.

Leave a Comment

Please register or log in to leave a comment.