RE: character ordering – SQLServerCentral

SSC Guru

Points: 104773

April 17, 2014 at 8:21 pm

TomThomson (11/5/2012)
The CHAR function depends on the effective default collation, not on any other collation however specified. It's fairly easy to get round it, although somewhat verbose: instad of "CHAR(n) collate Latin1_General_CI_AS" we can use "cast(NCHAR(n) COLLATE Latin1_General_CI_AS as CHAR(1))", resulting in
...
...
which delivers 132 rows (and displays the Latin1 symbols).

I must have been on a bad day when I wrote that line without noticing :blush: that 132 and 139 were different.

Recently I noticed a similar problem (it wasn't with that Japanese collation, but the same misbehaviour of the CHAR and NCHAR functions) and did enough work to find out why. So I remembered that I had written something that claimed the problem could be avoided without switching to a context where the default collation was OK and thought I'd better post a correction, as in fact I now know it can't be done. When I saw that I had claimed code which poducse 132 instead of 139 solved the problem I was rather surprised (more that no-one picked me up on it than that I made a mistake - I do plenty of those).

Why (with vk-kirov's example Japanes collation) was it missing 7 characters, with all those collate clauses in there? The answer is simple - cast(NCHAR(n) COLLATE Latin1_General_CI_AS as CHAR(1)) might as well (given the range of n is 0 to 255) just be NCHAR(n), the cast is pointless and the collate clause is pointless too, because it's applied to the result of the call on the NCHAR funcion, and the NCHAR(1) value doesn't change just because a COLLATE clause is applied to it; so vk-kirov was right, and to get the correct result you have to run the query in an environment where the default code set doesn't cause NCHAR to map a code to NULL or to NCHAR(63) (the latter is a question mark, used for "I don't know what this code means" as well as for "?") that wouldn't be so mapped with the default collation. The 7 missing characters are all mapped to NCHAR(63) or CHAR(63) by the NCHAR or CHAR function when that Japanese collation is the database default collation, that's why they go missing. This of course means that this collation doesn't allow unicode conformant T-SQL programming. Most collations don't, I believe.

CHAR of course messes up a lot more stuff than NCHAR does, losing 77 of the "alphnumeric" characters, rather than just 7, by generating a mix of 63s and NULLs, but then CHAR doesn't pretend to do unicode while NCHAR does.

Tom