The Cost of Function Use In A Where Clause

A common mistake made in writing SQL statements is to wrap filtering columns of a WHERE clause inside of a function. SQL server performance is very dependant on SQL's ability to properly use indexes when it retrieves records. The lifeblood of proper query plan creation is using very deterministic filtering logic on indexed columns (Some refer to this as Sargable Where clauses).

The biggest problem we see is when a filtering column is wrapped inside of a function, the query optimizer does not see the column and if an index exists on that column, the index likely will not be used (of course, just like anything else in our hi-tech world, this is not always the case).

Take this example

In the CadencedEventCustomer table, we have a nonclustered index on FullName1. The simple query below should use this index and should perform an index seek to find the qualifying records instantly (Seek = FAST).

Select
   *
FROM
   CadencedEventCustomer
WHERE
   isNull(FullName1,'') = 'Ed Jones'

Our intention here (albeit incorrect) is to correctly handle records in which the FullName1 field is null. Our concern is that Null records will be missed (more on this later).

Since we are wrapping the FullName1 column in the IsNull function, the query optimizer doesn't see it and the index is not used.

Here is the query plan produced by SQL Server's query engine:

|--Index Scan(OBJECT:([CIPv3].[dbo].[CadencedEventCustomer].[nci_CadencedEventCustomer_FullName1]), WHERE:(isnull([CIPv3].[dbo].[CadencedEventCustomer].[FullName1],'bob')='Ed Jones'))

We see an Index Scan on the FullName1 index (the index exists, but a seek cannot be performed due to the function wrapping the indexed column). A Scan means the entire table is being searched from beginning to end until SQL happens to find all the records satisfying the condition. On a table with millions of records, this obviously takes a long time.

The subtree cost of this query is 10.58. Subtree cost is a relative cost to SQL Server indicating how difficult this query is for SQL Server to retrieve the data. The higher the relative cost, the more of a hit on performance (CPU, IO, RAM, etc). From my experience, anything in the double digit range is getting pretty spendy and will not perform well in a production environment. In this case, the cost is attributed to the inability of the optimizer to use an index which means SQL Server is having to do a full table scan to return all the data for the query result...OUCH.

If we remove the IsNull() function our query looks like this:

Select
   *
FROM
   CadencedEventCustomer
WHERE
  FullName1 = 'Ed Jones'

This is a much cleaner WHERE clause and SQL Server will now see the indexed column and will use the index correctly:

|--Index Seek(OBJECT:([CIPv3].[dbo].[CadencedEventCustomer].[nci_CadencedEventCustomer_FullName1]), SEEK:([CIPv3].[dbo].[CadencedEventCustomer].[FullName1]='Ed Jones') ORDERED FORWARD)

Notice we now have a SEEK (Fast). The query subtree cost is now at 0.0093 !

Keep in mind, this behavior is true for ALL Functions (Not just IsNull). We should try very hard to not wrap filtering WHERE clause columns within a function.

Now....as for IsNull. The reason we are using IsNull() in the first place is to return records where the value is null in the column (I recommend not allowing nulls in the database, but that topic is for another article).

However, if we look closely at this WHERE clause, we can immediately see that the net result of our query (in this case) is identical whether we use the IsNull function or not.

Select
   *
FROM
   CadencedEventCustomer
WHERE
   isNull(FullName1,'') = 'Ed Jones'

What this query says is return all records where the FullName1 = 'Ed Jones' and IF the FullName1 column in the database holds a Null Value assume that value is '' (an empty string instead of a Null value).

So....Null values will be replaced with '' which still does NOT Equal 'Ed Jones'. The net effect is that records with NULL will NOT be returned.

So if we change our query to this:

Select
  *
FROM
  CadencedEventCustomer
WHERE
   FullName1 = 'Ed Jones'

We would get exactly the same result.

So the question is, when DO we need IsNull?

Well....if the IsNull function is replacing nulls with a value that satisfies the comparison and returns TRUE, then the IsNull function is more useful (still not needed as we'll see below).

Take this example:

Select
   *
FROM
   CadencedEventCustomer
WHERE
   isNull(FullName1,'Ed Jones') = 'Ed Jones'

In this case, if we find Null values in the FullName1 column, we will replace them with 'Ed Jones'. Notice that 'Ed Jones' satisfies our comparison and will return true. So in a sense, this logic is forcing the Query to return Null values instead of filtering them out. In this case, we can argue that the IsNull function is useful.

Now, having said that, although the IsNull() logic is now useful, the problem is that the column is being hidden from the optimizer and we lose the index relief provided by the index.

We can rewrite this SQL statement to get the same result AND to still get index relief for the optimizer.

Essentially what we want is this:

Select All CadencedEventCustomers Where the FullName1 is 'Ed Jones' OR where the FullName1 is a null value.

So the correct rewrite to still obtain index relief is as follows:

Select
   *
FROM
   CadencedEventCustomer
WHERE
   ((FullName1 = 'Ed Jones') OR (FullName1 IS NULL))

Now we get the best of both worlds. We get the correct logic producing the results we want and SQL servers query engine can see the column so we get index relief.

A few examples other than IsNull()

I often see the use of Substring() and other string functions used in WHERE Clauses.

Take the following

WHERE SUBSTRING(MasterDealer.Name,4) = 'Ford'

This is effectively hiding the MasterDealer.Name from the optimizer and is negating the use of any index on the Name column.

The fix

WHERE MasterDealerName Like 'Ford%'

This produces the exact same result, yet also provides index relief.

Another example working with dates

WHERE DateDiff(mm,PlacedOnQueue,GetDate()) >= 30

Again, this is a common technique used when working with DateTime data. This is, again, hiding the PlacedOnQueue column from the optimizer.

The fix

WHERE PlacedOnQueue < DateAdd(mm,-30,GetDate())

The exact same results, yet this time we get index relief.

Keep in mind, this behavior holds true for all functions wrapping columns in your WHERE clause. So remember anytime you feel a need to wrap a WHERE clause filtering column within a function, try really hard of a way to rewrite the statement without function use. If you need help rewriting these types of queries, please feel free to email me.

The Cost of Function Use In A Where Clause