Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««1234

character ordering Expand / Collapse
Author
Message
Posted Sunday, November 04, 2012 8:00 AM
Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Monday, December 10, 2012 12:11 PM
Points: 66, Visits: 101
Interesting remarks form vv-kirov, though l'eomot's querie is not all incorrect.

The point is that the function char() will give an output based on the character set of the code page in the collation of the database (at least that is the behaviour vv-kirov's example shows).

So l'eomot's querie will only give the right answer if the collation has already a latin1 codepage base, because when running it on a cyrillic or japanese based collation its output can give totally other characters.

A "collate" function afterwards (in the "between" statement) cannot change the character itself, only its position in the order.

Post #1380796
Posted Monday, November 05, 2012 8:26 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 8:32 PM
Points: 7,080, Visits: 7,125
vk-kirov (11/2/2012)

Incorrect. Try the following script and you'll get 62 characters:

CREATE DATABASE qod_collation_db COLLATE Japanese_CI_AS_KS_WS;
GO
USE qod_collation_db;
GO
WITH Numbers AS
( SELECT 0 AS n
UNION ALL
SELECT n + 1
FROM Numbers
WHERE n < 255
)
SELECT n AS code, CHAR(n) AS symbol
FROM Numbers AS n
WHERE CHAR(n) COLLATE Latin1_General_CI_AS BETWEEN '0' COLLATE Latin1_General_CI_AS AND 'Z' COLLATE Latin1_General_CI_AS
OPTION(MAXRECURSION 255);
GO
USE master;
GO
DROP DATABASE qod_collation_db;
GO

What's interesting if we execute 'SELECT CHAR(n)' on a Japanese_CI_AS_KS_WS database we'll get NULLs for the values of 'n' from 129 to 159 and from 224 to 252.

Yes, of course, you are completely right. The CHAR function depends on the effective default collation, not on any other collation however specified. It's fairly easy to get round it, although somewhat verbose: instad of "CHAR(n) collate Latin1_General_CI_AS" we can use "cast(NCHAR(n) COLLATE Latin1_General_CI_AS as CHAR(1))", resulting in
CREATE DATABASE qod_collation_db COLLATE Japanese_CI_AS_KS_WS;
GO
USE qod_collation_db;
GO
WITH Numbers AS
( SELECT 0 AS n
UNION ALL
SELECT n + 1
FROM Numbers
WHERE n < 255
)
SELECT n AS code, cast(NCHAR(n) COLLATE Latin1_General_CI_AS as CHAR(1)) AS symbol
FROM Numbers
WHERE cast(NCHAR(n) COLLATE Latin1_General_CI_AS as CHAR(1))
BETWEEN '0' COLLATE Latin1_General_CI_AS AND 'Z' COLLATE Latin1_General_CI_AS
OPTION(MAXRECURSION 255);
GO
USE master;
GO
DROP DATABASE qod_collation_db;
GO

which delivers 132 rows (and displays the Latin1 symbols).

I ought to have got that right first time, as I had plenty involvement over the years on internationalisation of text in SQL databases involving Latin, Cyrillic, Arabix, and both traditional and simplified Chinese writing. Perhaps I never noticed the behaviousr of the CHAR function because after I decreed that we would use Unicode for all text I didn't bother to look at CHAR - if so that was careless of me, if not then forgetting it was careless of me.

Another example (sorry for using a non-Latin character here; I don't know how you will see that character – in its original form, as a rectangle or something else; actually I'm using a cyrillic letter http://en.wikipedia.org/wiki/Tse_(Cyrillic)):
original form for me, although of course I normally use nchar(1094) rather than char(246) for that character. My default display font is Lucida Sans Unicode so I tend to see most characters (Arial Unicode MS has slightly better coverage, but it makes an ugly mess of quite a lot of symbols - perhaps has bad spacing hints).
Interesting that the upper-case tse shows as '?' both in Latin and in Japanese, while the lower case tse shows as white space in Japanese.


Tom
Que conclure à la fin de tous mes longs propos? C'est que les préjugés sont la raison des sots. (Voltaire, 1756)
Post #1381114
Posted Monday, November 05, 2012 12:15 PM


UDP Broadcaster

UDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP Broadcaster

Group: General Forum Members
Last Login: Wednesday, April 17, 2013 10:57 PM
Points: 1,491, Visits: 3,008
Thanks for the QOD, Tom.

I appreciate the heads-up that a deceptively simple condition (like "where myLetter between 'a' and 'Z'") could result in a surprise if one expects it to filter just the 26 English letters in upper or lower case. The answer of exactly how many it would include is, of course not nearly so important as the possibility that it's greater than 52 (or 62 in your original poser which starts with the numeric digits).
Post #1381233
Posted Wednesday, November 07, 2012 7:20 PM
SSC Journeyman

SSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC Journeyman

Group: General Forum Members
Last Login: Wednesday, May 15, 2013 3:35 AM
Points: 75, Visits: 62
I'm supprised no-one has mentioned another implication of collation ordering and special characters that can be quite important; the impact on LIKE clauses that use set's of characters e.g.

DECLARE @var CHAR(1) = '¾'

SELECT @var
WHERE @var LIKE '%[0-9]%'

Similarly, this applies to the accented characters, so trying to use [a-z0-9] will not yield the standard set of alphanumeric characters that might be expected.
Which given the low success rate on this question (including my own error), I would suspect that this is a common oversight.

So thank you for what is actually an excellent question.
Post #1382233
« Prev Topic | Next Topic »

Add to briefcase «««1234

Permissions Expand / Collapse