The other day I was working with some data transfers on the SQLServerCentral.com systems and ran into an error that I'd never encountered.
As expected with any sort of upgrade, there will be issues that crop up. It's inevitable that you'll forget to test something, that some data will be messed up, something will cause a glitch. Hopefully they're small glitches, or at least ones you can fix quickly.
However I recently ran into an interesting glitch when comparing data between the old and new schemas and decided to share my adventure. I doubt many of you will run into it, but you never know. The story rambles on a bit, so if you came here to solve your 468 error, jump down to the Quick Fix section.
The Collation Error
We had a problem with some data that didn't convert properly, or at least I thought it didn't convert, so I went to compare a couple of rows from the old schema and the new schema.
One of the things that we process at SQLServerCentral every month is the payments to authors. I ran a report to see which authors needed to be paid and I found that the payment details were missing for some authors. I was sure they'd been in the old tables, so I ran a quick cross database query to check:
from old.dbo.authors a
inner join dbo.authorlist b
on a.email = b.email
where authorname = 'Steve Jones'
A simple query just designed to compare two different sets of data and let me see what might be missing. I was surprised to all of a sudden get this error:
Server: Msg 468, Level 16, State 9, Line 1
Cannot resolve the collation conflict between "Latin1_General_CI_AS"
and "SQL_Latin1_General_CP1_CI_AS" in the equal to operation.
The lines have been reformatted slightly to fit the new layout, but this was surprising to me as I've had no issues to date with the server, which was built in the UK. I connect remotely from the US, but I was surprised to see a collation issue.
A quick search on the error brought me to a few different explanations, but the best one is Tony Rogerson's.
Basically the server does not know how to make a comparison between the two sets of data. The data is stored in the same format, in this case the bits for 'Steve Jones' are the same in both cases, same data types, sizes, etc. The problem is that the interpretation of these bits is something the server is not sure about.
The problem is that the collation settings between the two tables are different. This used to be set at the server for all databases, but in SQL Server 2000 you could set a default for the server, a separate default for the database, and even specific collations for tables or column. While most DBAs in the US never deal with any collations other than the default, there are people all around the world that are more familiar with this.
The collation setting also specifies the sort order and how comparisons are made. We could be case sensitive, i.e. "Steve Jones" != "steve jones", or case insensitive where "Steve Jones" = "steve jones" = "STEVE JONES". Most people I know go with case insensitive, but you should be aware of other options.
Ordering also matters because in the English language, we typically expect that "A" comes before "B", but we often see "A" and "a" as coming equally before "B". However each of these characters has a binary value and you could expect the "A" comes before "B" on up to "Z", and then "a" would come next.
So there are any number of settings here. The default for most English speaking areas is code page 1252, Latin1_General_CI_AS. However the US decided to be different and we have "SQL_Latin1_General_CP1_CI_AS", same code page, as our default.
I'm not completely sure what the difference is between these two code pages, but I'm checking.
The Quick Fix
The quick fix is to choose one collation for your query. You can tell SQL Server which collation to use for the final results. You do this by adding the COLLATE clause to your SELECT statement. In my case, I would do this:
FROM old.dbo.authors a
INNER JOIN dbo.authorlist b
ON a.email = b.email
WHERE authorname = 'Steve Jones'
Note that it doesn't matter in this case whether I use "COLLATE Latin1_General_CI_AS" or "SQL_Latin1_General_CP1_CI_AS" as the collation setting. I'm explicitly telling SQL Server how to interpret both sets of data here. The white paper references below gives some good explanations on how this works.
Also note that this clause comes before the WHERE clause. Most of the examples I saw didn't have a WHERE clause and put COLLATE at the end. That won't solve the error; it needs to be before the WHERE clause.
This was a fairly simple fix, one that I actually solved in about 15 minutes, but it was something I'd never encountered before and I'm guessing many of you haven't seen it either. I hope this article sticks in the back of your mind in case you ever run into this.
And I'm certainly not an expert multi-language or collation DBA, so feel free to add to my explanation or point out any shortcomings in the discussion for this article.
Steve Jones © 2007, dkranch.net
A few references: