RE: character ordering – SQLServerCentral

SSC Guru

Points: 104773

November 5, 2012 at 8:26 am

vk-kirov (11/2/2012)
Incorrect. Try the following script and you'll get 62 characters:
CREATE DATABASE qod_collation_db COLLATE Japanese_CI_AS_KS_WS;
GO
USE qod_collation_db;
GO
WITH Numbers AS
( SELECT 0 AS n
UNION ALL
SELECT n + 1
FROM Numbers
WHERE n < 255
)
SELECT n AS code, CHAR(n) AS symbol
FROM Numbers AS n
WHERE CHAR(n) COLLATE Latin1_General_CI_AS BETWEEN '0' COLLATE Latin1_General_CI_AS AND 'Z' COLLATE Latin1_General_CI_AS
OPTION(MAXRECURSION 255);
GO
USE master;
GO
DROP DATABASE qod_collation_db;
GO
What's interesting if we execute 'SELECT CHAR(n)' on a Japanese_CI_AS_KS_WS database we'll get NULLs for the values of 'n' from 129 to 159 and from 224 to 252.

Yes, of course, you are completely right. The CHAR function depends on the effective default collation, not on any other collation however specified. It's fairly easy to get round it, although somewhat verbose: instad of "CHAR(n) collate Latin1_General_CI_AS" we can use "cast(NCHAR(n) COLLATE Latin1_General_CI_AS as CHAR(1))", resulting in

CREATE DATABASE qod_collation_db COLLATE Japanese_CI_AS_KS_WS;

GO

USE qod_collation_db;

GO

WITH Numbers AS

( SELECT 0 AS n

UNION ALL

SELECT n + 1

FROM Numbers

WHERE n < 255

)

SELECT n AS code, cast(NCHAR(n) COLLATE Latin1_General_CI_AS as CHAR(1)) AS symbol

FROM Numbers

WHERE cast(NCHAR(n) COLLATE Latin1_General_CI_AS as CHAR(1))

BETWEEN '0' COLLATE Latin1_General_CI_AS AND 'Z' COLLATE Latin1_General_CI_AS

OPTION(MAXRECURSION 255);

GO

USE master;

GO

DROP DATABASE qod_collation_db;

GO

which delivers 132 rows (and displays the Latin1 symbols).

I ought to have got that right first time, as I had plenty involvement over the years on internationalisation of text in SQL databases involving Latin, Cyrillic, Arabix, and both traditional and simplified Chinese writing. Perhaps I never noticed the behaviousr of the CHAR function because after I decreed that we would use Unicode for all text I didn't bother to look at CHAR - if so that was careless of me, if not then forgetting it was careless of me. :blush:

Another example (sorry for using a non-Latin character here; I don't know how you will see that character – in its original form, as a rectangle or something else; actually I'm using a cyrillic letter http://en.wikipedia.org/wiki/Tse_(Cyrillic)):

original form for me, although of course I normally use nchar(1094) rather than char(246) for that character. My default display font is Lucida Sans Unicode so I tend to see most characters (Arial Unicode MS has slightly better coverage, but it makes an ugly mess of quite a lot of symbols - perhaps has bad spacing hints).

Interesting that the upper-case tse shows as '?' both in Latin and in Japanese, while the lower case tse shows as white space in Japanese.

Tom