Improving Query Performance by using correct Search Arguments

(Be sure to checkout the FREE SQLpassion Performance Tuning Training Plan – you get a weekly email packed with all the essential knowledge you need to know about performance tuning on SQL Server.)

In today’s blog posting I want to talk about a very specific performance problem related to indexing in SQL Server.

The Problem

Imagine the following simple query, which you have already seen hundreds of times in your SQL Server life:

-- Results in an Index Scan
SELECT * FROM Sales.SalesOrderHeader
WHERE YEAR(OrderDate) = 2005 AND MONTH(OrderDate) = 7
GO

With that simple query, we request sales information for a specific month in a specific year. Not very complicated. Unfortunately that query doesn’t perform very well – even with a Non-Clustered Index on the column OrderDate. When you look at the execution plan, you can see that the Query Optimizer has chosen the Non-Clustered Index that is defined on the column OrderDate, but unfortunately SQL Server performs a complete Scan of the index, instead of performing an efficient Seek operation.

Non Clustered Index Scan

This isn’t really a limitation of SQL Server, this is the way how relational databases are working and thinking . As soon as you are applying an expression (function calls, calculations) on an indexed column (a so-called Search Argument), the database engine HAS TO SCAN that index, instead of performing a seek operation.

The Solution

To get a scalable seek operation in the execution plan, you have to rewrite your query in a way to avoid the call of the function DATEPART:

-- Results in an Index Seek
SELECT * FROM Sales.SalesOrderHeader
WHERE OrderDate >= '20050701' AND OrderDate < '20050801'
GO

As you can see from the rewritten query, the query returns the same result, but we have just eliminated the function call of DATEPART. When you look at the execution plan, you can see that SQL Server performs a seek operation – in that specific case it is a so-called Partial Range Scan: SQL Server seeks to the first value, and scan until he hits the last value of the requested range. If you have to call functions in the context of indexed columns, you should always make sure that these function calls are performed on the right hand side of your column in the query. Let’s have a look at a concrete example. The following query casts the indexed column CreditCardID to the data type CHAR(4):

-- Results in an Index Scan
SELECT * FROM Sales.SalesOrderHeader
WHERE CAST(CreditCardID AS CHAR(4)) = '1347'
GO

When you have a more detailed look on the execution, you can see that SQL Server scans again the whole Non-Clustered Index. Not really scalable if your table gets larger and larger. If you are performing that conversation on the right hand side of your indexed column in the query, you can again eliminate the function call on the indexed column, and SQL Server is able to perform a seek operation:

-- Results in an Index Seek
SELECT * FROM Sales.SalesOrderHeader
WHERE CreditCardID = CAST('1347' AS INT)
GO

Another nice example where you can run into the same problems is the use of the CASE expression in SQL Server. Let’s have a look at the following query, where the column PersonType is indexed through a Non-Clustered Index:

-- Results in an Index Scan
SELECT
CASE PersonType
WHEN 'IN' THEN 1
WHEN 'EM' THEN 2
WHEN 'SP' THEN 3
WHEN 'SC' THEN 4
WHEN 'VC' THEN 5
WHEN 'GC' THEN 6
END AS ConvertedPersonType
FROM Person.Person
GO

You are calling here again a function on an indexed column, in our case the CASE expression. You can’t directly see that function call in the T-SQL query, but internally it’s nothing more than a function call. You are getting again a Scan of the Non-Clustered Index instead of a Seek operation. How can you avoid that problem? You can rewrite the query to get rid of the CASE expression. One example is a join against a Common Table Expression, which stores the needed lookup values. Let’s have a look on the rewritten query:

-- Results in an Index Seek
WITH LookupCTE AS
(
SELECT * FROM
(
VALUES
(N'IN', 1),
(N'EM', 2),
(N'SP', 3),
(N'SC', 4),
(N'VC', 5),
(N'GC', 6)
) tbl (PersonType, Value)
)
SELECT cte.Value FROM Person.Person p
INNER JOIN LookupCTE cte ON cte.PersonType = p.PersonType
GO

With that approach, SQL Server is able to perform a seek operation on the indexed column PersonType.

Summary

As you have seen in this blog posting, it is very important that you are NOT calling any function directly or indirectly on your indexed columns. Otherwise SQL Server has to scan your index, instead of performing an efficient seek operation. And scans will never ever scale, when your table gets more and more rows.

Please feel free to leave a comment, if you want to share other good examples where you have encountered this specific behavior.

Thanks for reading!

-Klaus

Improving Query Performance by using correct Search Arguments

The Problem

The Solution

Summary

Rate

Share

Share

Rate

Improving Query Performance by using correct Search Arguments

The Problem

The Solution

Summary

Rate

Share

Share

Rate

Related content

Book Review: Big Red - Voyage of a Trident Submarine

Database Mirroring FAQ: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup?

Inserting Markup into a String with SQL

Networking - Part 4

Speaking at Community Events - More Thoughts