Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Case Sensitivity in Selects - Part 2

By Andy Warren,

In a recent article I discussed some potential performance issues that result from doing a case sensitive join on a column that is not case sensitive and offered one alternative. This time I'm going to look at whether it would make sense to change the collation on the column to case sensitive. You definitely should read the earlier article before continuing.

I've started by setting the column to case insensitive. Running the following two queries confirms that it is working correctly. The first query returns one row, the second query returns zero rows.

select * from instantforum_members where fullname='andy warren' 
select * from instantforum_members where fullname='Andy warren'

Good. I made a new table that contained just an identity column as a primary key and the fullname column, with the collation for the fullname column set to case insensitive. I then ran a simple query to see what would happen.

select top 10 * from instantforum_members m inner join andytemp a on m.fullname = a.fullname

This results in an error:


 

This is a big problem. If you change the collation on the column and you have existing queries that already join to it you may be breaking quite a few things. Changing the query to include a collation (sensitive or insensitive as you prefer) allows the query to run normally.

select top 10 * from instantforum_members m inner join andytemp a on m.fullname = a.fullname
collate sql_latin1_general_cp1_ci_as
select top 10 * from instantforum_members m inner join andytemp a on m.fullname = a.fullname 
collate sql_latin1_general_cp1_cs_as

Or does it? I set one row to have the value 'andy warren', the other to have 'Andy Warren', both changes made to the andytemp table, verified that I had the same configuration in members (I was experimenting while writing). I ended up testing four versions of the query:

--#1, sensitive, where on the sensitive column
select top 10 a.fullname as afullname, m.fullname as mfullname
 from instantforum_members m 
	inner join andytemp a 
		on m.fullname = a.fullname
 collate sql_latin1_general_cp1_cs_as 
 where m.fullname='andy warren'

--#2, sensitive, where on the insensitive column
select top 10 a.fullname as afullname, m.fullname as mfullname 
 from instantforum_members m 
	inner join andytemp a 
		on m.fullname = a.fullname
 collate sql_latin1_general_cp1_cs_as 
 where a.fullname='andy warren'

--#3, insensitive, where on the sensitive column
select top 10 a.fullname as afullname, m.fullname as mfullname 
 from instantforum_members m 
	inner join andytemp a 
		on m.fullname = a.fullname
 collate sql_latin1_general_cp1_ci_as 
 where m.fullname='andy warren'

--#4, insensitive, where on the insensitive column
select top 10 a.fullname as afullname, m.fullname as mfullname 
 from instantforum_members m 
	inner join andytemp a 
		on m.fullname = a.fullname
 collate sql_latin1_general_cp1_ci_as 
 where a.fullname='andy warren'

Care to guess how many rows each query returns?

  • #1    1 row
  • #2    2 rows
  • #3    2 rows
  • #4    4 rows

Column level collations give me a headache. We haven't even gotten back to my original focus which was making the query perform better, what I see here is that if I did change the collation I could get one of 3 possible result sets back depending on how I modified the queries.

I've looked at it three times, the headache is getting worse. How can I finish the article if I don't know why I get three different answers! Ever notice that growing is painful?

How to figure out what is going on? I go back to look at the query plans as a starting point. Here is what the cost of each was:

  • #1    1.15
  • #2    .00994
  • #3    .00994
  • #4    1.18

Now let's dive into each.

Query #1 only returns one row because there is only one row in members that matches the where clause and it has only one match in andytest. It's generating an index scan against andytest.

Query #2 matches two rows (I think) because andy warren = andy warren and Andy Warren = Andy Warren and the where clause is applied to a case insensitive column. The results of the query are shown below.

Query #3 is a case insensitive join that I expect to return 4 rows - the product - but only two are returned, apparently because the where clause restricts the member table to one row after the join is applied.

Query #4 is an insensitive join and the where applied on the other side, and it does return the expected four rows. It also results in an index scan against members, a result of having to convert the case sensitive column for the join.

Am I deciphering that correctly? I'm open to other interpretations.

As far as performance, the combination of the collation and where you apply the where filter make all the difference. If the optimizer can restrict the results up front by filtering against a column that doesn't require applying a different collation it's fast. If it has to apply the collation and then filter, it's slow.

To wrap up, applying the collation to the column solves one problem but it potentially creates another (note that this is only when you are mixing collations) in that where and how you filter makes a huge difference in how many rows you get back. I'm not sure I'd want to try to explaining this or be responsible for testing all the results. Maybe I'm just used to coding it and it works out to be the same. Definitely we know that you can't just go changing the collation without being prepared to fix everything that joins to it.

I've still got an idea or two to pursue on this, we'll take that up next time.

Part 3 is available.

Total article views: 8321 | Views in the last 30 days: 2
 
Related Articles
FORUM

collate

using collate in query union with other query without collate

SCRIPT

Change the collation of Columns

This script changes the collation of columns to default database collation in all the tables

FORUM

Different collation

Different collation

FORUM

how to write collation agnostic queries?

collation agnostic queries and best practices for writing applications targeting multiple countries

FORUM

Collation problem! dont know what to do :(

Silly collation...

Tags
administration    
advanced querying    
sql server 7    
t-sql    
 
Contribute

Join the most active online SQL Server Community

SQL knowledge, delivered daily, free:

Email address:  

You make SSC a better place

As a member of SQLServerCentral, you get free access to loads of fresh content: thousands of articles and SQL scripts, a library of free eBooks, a weekly database news roundup, a great Q & A platform… And it’s our huge, buzzing community of SQL Server Professionals that makes it such a success.

Join us!

Steve Jones
Editor, SQLServerCentral.com

Already a member? Jump in:

Email address:   Password:   Remember me: Forgotten your password?
Steve Jones