Case Sensitivity in Selects - Part 1

,

If you don't have to deal with case sensitivity, count yourself lucky! SQL

2000 is much better than SQL 7 with regard for collations, supporting collations

at database, table, and column level. In this article I'm going to delve into a

interesting (or obvious?) item I noticed recently when I did some performance

tuning.

Basically we were doing a join to a table where we had to treat one column as

case sensitive. The table in question had about 130k rows and had an index (not

unique) on the column we were joining to. Not a huge table, a join to it should

be reasonably fast, yet I saw a high number of reads. Digging into the code -

which I sadly admit to writing - I found that we were handling the issue of case

sensitivity by converting the column to varbinary, a technique that works in

SQL7 or SQL 2000. An example written using a different table and simplified

syntax is shown here:

select * from instantforum_members where convert(varbinary(50), fullname) = 'andy warren'

If you're using SQL2K, you can accomplish the same thing or even better

depending on your needs by using a collation instead, like this:

select * from instantforum_members where fullname='andy warren' collate Latin1_General_CS_AI

Before we go on, let's start with what a simpler non case sensitive query

would look like and do:

select * from instantforum_members where fullname='andy warren'

You can see that just doing a simple select for my name results in a plan

that starts with an index seek, which is good, and has a cost of .003283. The

cost is relative, but in practical terms this is a very fast efficient query.

Now let's try the method I was using to handle sensitivity:

Here you can see the percentages shift as it does an index scan rather than

an index seek, no doubt caused by the convert statement which requires it to

touch every row. What's far more interesting is that the query cost jumps to

.508702, a huge jump! I ran the statement that used collate, not really

expecting a difference but you have to try, resulting in the exact same query

plan. That seems to make sense, it has to do the same work. Being curious, I

tried one more query specifying the collation as case insensitive and wound up

with the same results. Double checking the database found that I wasn't

specifying the correct & current collation which was

SQL_Latin1_General_CP1_CI_AS. Running one more time resulted in the same results

that I got in the first query with no collation specified.

Sorry if that is confusing. The net is that if you specify any collation

except the one in use on that column or you use a convert you're going to force

SQL to check each row, most likely resulting in either an index scan or a table

scan.

That got me thinking about how I could improve the performance without

changing the collation (an article for another day!). My thought process was

that the query itself was quick, the case sensitive compare slow only because it

had to look at every row, how to limit the query first and then do the compare.

For once it turned out to be exactly that easy, resulting in the following:

select * from instantforum_members where fullname = 'andy warren' and convert(varbinary(50), fullname) = 'andy warren'

Just that simple change gave me the results I needed at basically zero cost.

I've only tested on two tables so I can't say with certainty that the results I

show here are always true, but so far it seems to be consistent. It also shows

how even the simplest change can have an impact on query efficiency. In a follow

up article I'll look into others options including making the column case

sensitive.

Rate

5 (1)

Share

Share

Rate

5 (1)