Natural Born Killers (The series) – Day 6 Sargability

To those that don’t know me “Natural Born Killers (The Series)” sounds like a very strange choice for a SQL Server blog title and to be fair it is. Those who do know me, will know that I have been presenting a session this year titled “Natural Born Killers, performance issues to avoid” around the UK. The “Series” element of the title is there because I have decided to serialise the session for the benefit of those who have been unable to see it thus far.

The plan is to release one topic from the presentation each day of SQL Relay 2013. Today is the sixth day of SQL Relay, this leg is being held in Southampton, unfortunately I won’t be able to make this leg, but there are some great sessions lined up.

What is Sargability?

As a SQL Server professional you are probably aware of the concept of Sargability but may not be aware of the term. That was certainly my experience anyway.

I first heard the term coined by Rob Farley (Blog | Twitter) when he did a presentation called “Understanding SARGability” for 24hours of PASS back in 2010. If you didn’t see it then you can find a recording of the same presentation that he did for SQLBits.

Sargability is a covering phrase for the abbreviation SARGable which in turn came from “Search Argument Able”

What this means is that you can satisfy the query by using an index, unfortunately there are times when you would expect the optimizer to choose an index only for it not to. The reason is that “SARGability” has been broken.

Demo

It’s quite common to want to check for something at the beginning of a string and as consummate professionals you have probably tried to index that attribute. This is a quick demo you can try on a test instance, remember to turn on show actual execution plan and then run the following code:

USE AdventureWorks2012;
go
/*
SARGability demo
*/SET STATISTICS IO ON;
SET STATISTICS TIME ON;
GO

IF EXISTS
(SELECT * FROM sys.indexes SI
WHERE SI.object_id=OBJECT_ID(‘Person.Person’) And SI.name=’IX_WebDemo_LastName’)
DROP INDEX Person.Person.IX_WebDemo_LastName

SELECT
BusinessEntityID, LastName
FROM Person.Person — WITH (INDEX = 1)
WHERE LEFT(LastName,2) = ‘De’;
GO

When we look at the information returned from SET STATISTICS IO in the messages tab we see the following:

From this we see that there were 117 rows returned yet 108 pages were touched. This seems a lot as the rows are not that wide. Let’s have a look at the plan to see what was happening.

The plan shows that it thought that the index “IX_Person_LastName_FirstName_MiddleName” would be best, however it is performing a scan not a seek. Let’s take a quick look at the properties of the index to see if it’s a strange naming convention because it looks like LastName is the leading attribute and that “should” result in a seek in this scenario. Right?

Hmm, LastName is marked as the leading attribute, that should be a seek. Something is definitely not right here.Let’s take a closer look at the properties of the scan in the execution plan.

Just so you don’t miss it, I’ve put a big red box around the interesting part. Instead of seeing the LEFT function in the predicate you will note that the engine is actually using the SUBSTRING function. Interesting eh! I’m making a bit of an assumption here and am going to say that because it is using a SUBSTRING it is deemed not to be deterministic as it could start anywhere within the string, not what you were expecting I’m sure.

So how do we get around this? We are after all using the left most values of an attribute which just happens to be the leading attribute of an index, we should be seeing seeks, not scans here. Unfortunately, we are going to have to refactor our code. In this example there are actually two ways that we can around this.

SELECT
BusinessEntityID, LastName
FROM Person.Person
WHERE LastName &gt;= 'De'
AND LastName &lt; 'Df';
GO

SELECT
BusinessEntityID, LastName
FROM Person.Person
WHERE LastName like 'De%';
GO

If you run the above code you will see the following STATISTICS IO information:

This seems much better, in both versions of the code we are now seeing only 4 logical reads instead of the 108 from our original example. If this query were to run millions of times a day we would be saving a bucketload of I/O. I’m pretty confident that we are now using a seek rather than a scan, stranger things have happened though so we should check the query plan…

Please note I have edited this to show properties of the index seek operators side by side. We can see from the above image that both are using a similar methodology for the predicate which is allowing SARGability. What’s really interesting is that the estimated number of rows for each operator is different.

There are of course many other occurrences of this kind of issue which would make a single blog post far too long, so I do urge you to check out the video I mentioned earlier in the post as you can stream or download it for free.