SQL Clone
SQLServerCentral is supported by Redgate
Log in  ::  Register  ::  Not logged in

Case Sensitivity in Selects - Part 1

By Andy Warren,

If you don't have to deal with case sensitivity, count yourself lucky! SQL 2000 is much better than SQL 7 with regard for collations, supporting collations at database, table, and column level. In this article I'm going to delve into a interesting (or obvious?) item I noticed recently when I did some performance tuning.

Basically we were doing a join to a table where we had to treat one column as case sensitive. The table in question had about 130k rows and had an index (not unique) on the column we were joining to. Not a huge table, a join to it should be reasonably fast, yet I saw a high number of reads. Digging into the code - which I sadly admit to writing - I found that we were handling the issue of case sensitivity by converting the column to varbinary, a technique that works in SQL7 or SQL 2000. An example written using a different table and simplified syntax is shown here:

select * from instantforum_members where convert(varbinary(50), fullname) = 'andy warren'

If you're using SQL2K, you can accomplish the same thing or even better depending on your needs by using a collation instead, like this:

select * from instantforum_members where fullname='andy warren' collate Latin1_General_CS_AI

Before we go on, let's start with what a simpler non case sensitive query would look like and do:

select * from instantforum_members where fullname='andy warren'

You can see that just doing a simple select for my name results in a plan that starts with an index seek, which is good, and has a cost of .003283. The cost is relative, but in practical terms this is a very fast efficient query. Now let's try the method I was using to handle sensitivity:

Here you can see the percentages shift as it does an index scan rather than an index seek, no doubt caused by the convert statement which requires it to touch every row. What's far more interesting is that the query cost jumps to .508702, a huge jump! I ran the statement that used collate, not really expecting a difference but you have to try, resulting in the exact same query plan. That seems to make sense, it has to do the same work. Being curious, I tried one more query specifying the collation as case insensitive and wound up with the same results. Double checking the database found that I wasn't specifying the correct & current collation which was SQL_Latin1_General_CP1_CI_AS. Running one more time resulted in the same results that I got in the first query with no collation specified.

Sorry if that is confusing. The net is that if you specify any collation except the one in use on that column or you use a convert you're going to force SQL to check each row, most likely resulting in either an index scan or a table scan.

That got me thinking about how I could improve the performance without changing the collation (an article for another day!). My thought process was that the query itself was quick, the case sensitive compare slow only because it had to look at every row, how to limit the query first and then do the compare. For once it turned out to be exactly that easy, resulting in the following:

select * from instantforum_members where fullname = 'andy warren' and convert(varbinary(50), fullname) = 'andy warren'

Just that simple change gave me the results I needed at basically zero cost. I've only tested on two tables so I can't say with certainty that the results I show here are always true, but so far it seems to be consistent. It also shows how even the simplest change can have an impact on query efficiency. In a follow up article I'll look into others options including making the column case sensitive.

Total article views: 10052 | Views in the last 30 days: 2
Related Articles

Case sensitive columns - Collation

Hi, I am implementing a products webpage. The products table has a discount code column. I want thi...



using collate in query union with other query without collate


“Distinct” column in SQL query results

“Distinct” column in SQL query results


Collation for case sensitive being converted from DB2

Case sensitive in SQL Server 2008 is not the same as DB2


Saving query results in Excel with column names

Saving query results in Excel with column names