Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Case Sensitivity in Selects - Part 3

By Andy Warren,

I've published two prior articles (Case Sensitivity in Selects - Part 1 and Case Sensitivity in Selects - Part 2), you'll benefit from reading them before continuing with this one. I'd like to start by looking at part of a great email I received from reader John Sands (check out his blog!). As John explained, he read the first article and wanted to make sure he understood the code AND it worked before passing it around. Bravo! Anyway, he wrote because he was unable to achieve the results I mentioned using the convert to binary option. I'll start by including his text, then look at what happened. After that I'll discuss the underlying problem that started this journey and see if I can come up with any alternatives.

I did this update in Northwind to test it:

update Orders
set CustomerID = 'frans', ShipCity = 'torino'
where orderid = 10753

I did two columns to try nchar and nvarchar columns. CustomerID is
char(5) and ShipCity is nvarchar(15).


These queries return six rows:

select OrderID, CustomerID, ShipCity
from Orders
where CustomerID = 'frans'

select OrderID, CustomerID, ShipCity
from Orders
where ShipCity = 'torino'

OrderID CustomerID ShipCity
----------- ---------- ---------------
10422 FRANS Torino
10710 FRANS Torino
10753 frans torino
10807 FRANS Torino
11026 FRANS Torino
11060 FRANS Torino

(6 row(s) affected)

OrderID CustomerID ShipCity
----------- ---------- ---------------
10422 FRANS Torino
10710 FRANS Torino
10753 frans torino
10807 FRANS Torino
11026 FRANS Torino
11060 FRANS Torino

(6 row(s) affected)


I want to return just the lowercase row but NONE of these return any
results:

select OrderID, CustomerID, ShipCity
from Orders
where convert(varbinary(50), CustomerID) = 'frans'

select OrderID, CustomerID, ShipCity
from Orders
where convert(varbinary(15), ShipCity) = 'torino'

select OrderID, CustomerID, ShipCity
from Orders
where convert(varbinary(5), CustomerID) = 'frans'

select OrderID, CustomerID, ShipCity
from Orders
where convert(binary(5), CustomerID) = 'frans'

select OrderID, CustomerID, ShipCity
from Orders
where convert(varbinary(5), CustomerID) = convert(varbinary(5), 'frans')

select OrderID, CustomerID, ShipCity
from Orders
where cast(CustomerID as varbinary) = cast('frans' as varbinary)

select OrderID, CustomerID, ShipCity
from Orders
where cast(ShipCity as varbinary) = cast('torino' as varbinary)

select OrderID, CustomerID, ShipCity
from Orders
where cast(CustomerID as binary) = cast('frans' as binary)

select OrderID, CustomerID, ShipCity
from Orders
where convert(binary(5), CustomerID) = convert(binary(5), 'frans')
 

He's exactly right, none of those work! Took a few minutes to figure out, but the culprit was that the customerid column was an nchar. Because it's a binary, you have to make sure to compare apples to apples. To help me figure out what was going on, I included the converts in the select so I could see exactly what was occurring, like this:

select OrderID, CustomerID, ShipCity, convert(varbinary(5), convert(varchar(5), customerid)),
convert(varbinary(5), 'frans') from Orders
where convert(varbinary(5), convert(varchar(5), CustomerID)) = convert(varbinary(5), 'frans')



Just another 'gotcha' when dealing with mixed case sensitivity.

As I mentioned way back in part one this all started with me tracking down a performance issue. That got me curious and now that I've explored this ugly part of SQL, it's time to see if I can't solve the problem more elegantly. Here is basically the table I'm dealing with at work:

A pretty standard lookup table right? Well, where we got off track is that the product that implements this was meant to run on multiple database platforms and because (as I understand it) not all databases supported case insensitivity, they went with the lowest common denominator. All that means is that they always did a case sensitive comparison. We could have stopped the madness by just making the column unique! Instead, we have junk like I've shown above where we have a case insensitive column that has duplicate values that aren't duplicates. That means we have to account for them in every join we do. What's worse, we can't even easily stop users from adding more since we can't set a unique index (case insensitive).

The first option is to identify the duplicates and remove them. To do that you have to find every place where it might have been used and replace it with the new code. Potentially in any of 40 or so columns across 10 tables in each of about 200 databases. Doable, just takes a little time to set up. Once done I could add a unique index to make sure the problem doesn't recur.

Now that I think about it, what's wrong with that? Do it and be done!

One thought I had was to add a new column called perhaps CodeInsensitive and store a binary representation of 'Code'. That way I could still use the technique I offered in article #1 about doing the join plus the compare, but now I'd have the big part of the work already done. Why convert each time if you can just do it once? Easy enough to maintain via a computed column or triggers, and it would be indexable. Even uniquely indexable. Worth doing? Questionable I think, doesn't seem like I'd gain much in performance over what I've already done.

Tried to think of a way to use a view to help, but it doesn't take me anywhere.

The only remaining option I can see is to modify everything that does the old style case sensitive join to use the new method I described. I'll have to check to see how many stored procs this might touch. Changing them is probably faster than doing all the updates to remove the duplicates, about even in risk maybe.

At this point I haven't decided. Removing the duplicates is clearly the right thing to do, just changing the code is probably faster. Tradeoffs to be made.

I imagine this only scratches the surface of issues you can have using multiple collations. I think I'll just try to avoid them. In closing, you might think about how case sensitivity affects other tools/languages. XML is case sensitive. VB6 and VB.Net are not case sensitive, but the string comparisons they do are unless you've set Option Compare Text on (please don't). C# and many others the language is case sensitive.

Total article views: 7339 | Views in the last 30 days: 4
 
Related Articles
FORUM

Converting a date within the WHERE or JOIN areas

Why convert/cast a date within the WHERE or JOIN?

FORUM

convertion

convertion

FORUM

For development/Application DBA's- Pls suggest , I need to filter data according to customer group(CustomerID) for each login

making changes in joins (to include customerId,wherever misisng) in various SP's may not be feasible...

FORUM

common Question about CustomerID search

can it search 2 tables, one after another

FORUM

Converting from Varchar to bigint

Varchar is sent where it is split and should bge converted to bigint

Tags
administration    
advanced querying    
sql server 7    
t-sql    
 
Contribute

Join the most active online SQL Server Community

SQL knowledge, delivered daily, free:

Email address:  

You make SSC a better place

As a member of SQLServerCentral, you get free access to loads of fresh content: thousands of articles and SQL scripts, a library of free eBooks, a weekly database news roundup, a great Q & A platform… And it’s our huge, buzzing community of SQL Server Professionals that makes it such a success.

Join us!

Steve Jones
Editor, SQLServerCentral.com

Already a member? Jump in:

Email address:   Password:   Remember me: Forgotten your password?
Steve Jones