Why ''Æ'' equals ''AE'' ?

  • Hello Everyone,

    Each and every day we learn something new...

    We just noted that issuing those statement return the same record:

    select dary_id from dary where dary_id = 'SAET'

    select dary_id from dary where dary_id = 'SÆT'

    The record that contains 'SAET'.

    Our database collation is: Latin1_General_CS_AS

    For me its a little bit surprising... why this behavior since Æ is a distinc character in the 1252 codepage?

    Also is this behavior avoidable?

    Thank's a lot,

    Carl

  • hi carl,

    >> Also is this behavior avoidable?

    yes, it all has to do with Collation, you must be aware

    of your data and what type of restrictions you want

    performed when using the Sequel statement.

    here is an example:

    --DROP TABLE dary

    CREATE TABLE [dary] (

    [dary_id] [char] (10) COLLATE Latin1_General_CS_AS NULL

    ) ON [PRIMARY]

    GO

    INSERT INTO [dary] VALUES('SAET')

    INSERT INTO [dary] VALUES('SÆT')

    GO

    --without using collation

    select dary_id from dary where dary_id = 'SAET'

    select dary_id from dary where dary_id = 'SÆT'

    --resultset will return 2 rows for both query

    --with using collation that focuses on kanatype-sensitive

    select dary_id from dary where dary_id COLLATE Albanian_CS_AS_KS = 'SAET'

    select dary_id from dary where dary_id COLLATE Albanian_CS_AS_KS = 'SÆT'

    select dary_id from dary where dary_id COLLATE Latin1_General_BIN='SAET'

    select dary_id from dary where dary_id COLLATE Latin1_General_BIN= 'SÆT'

    --resultset will return only 1 row for each query

     

    hope this helps

    below is the collation name and description on how it treats

    data while performing sql comparision:

    --------------------------------

    /*

    SELECT *

    FROM ::fn_helpcollations()

    WHERE NAME = 'Albanian_CS_AS_KS'

    --Albanian_CS_AS_KS

    --Albanian,

    --case-sensitive,

    --accent-sensitive,

    --kanatype-sensitive,

    --width-insensitive

    SELECT *

    FROM ::fn_helpcollations()

    WHERE NAME = 'Latin1_General_CS_AS'

    --Latin1-General,

    --case-sensitive,

    --accent-sensitive,

    --kanatype-insensitive,

    --width-insensitive

    */

    -------------------------------------------------

  • Hello Bersileus,

    Thanks for that info.

    I never tougth kantype sensitiveness would have something to do with this behavior.

    Look at what it is said in BOL about that:

    "Specifies that SQL Server distinguish between the two types of Japanese kana characters: Hiragana and Katakana.

    If not selected, SQL Server considers Hiragana and Katakana characters to be equal."

    Far away from what is my problem now...

    Ok starting from that why using the collation LATIN_GENERAL_CS_AS_KS does not works (does not distinguihs 'Æ' from 'AE')?

    Regards,

    Carl

  • Hello Bersileus,

    think it miss me something. There is something I don't understand.

    How do you know which collation are "focussed" on kanatype sensitiveness?

    I would have first tougth it is just related to the fact of using KS or not in the collation designator but it is clearly not the case.

    DANISH_NORWEGIAN_CS_AS distinguish 'Æ' from 'AE' the same way as DANISH_NORWEGIAN_CS_AS_KS does.

    Can you shed some ligth on this for me?

    Thank's a lot,

    Carl

  • hi carl,

    references:

    (nice reference)

    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsqldev/html/sqldev_06112004.asp

    (examples using collation - is an MVP thanks to umachandar)

    http://www.umachandar.com/technical/SQL2000Scripts/Main10.htm 

    yes, you are right about the Danish Collation, collation (in a highlevel view) is kind of a language vocabulary (speech modex comparision), for each region/country. I have never been to denmark but i guessing that in their language Æ is a common letter or word to them. Each region treats the alphabet & characters differently (sounds the same, spelled different, etc..) - hence the use of case,accent,kanatype and width, using collation you can "Write Language-Portable Transact-SQL". Try the french it does not matter what collation you pick except for the French_BIN it always returns 2 resultset. (as a footnote the <country>_BIN always does comparative weight on each individual characters). (btw you should also have some knowledge on how the country treats its characters, to really understand things better - since SQL Server does this for us and is transparent - totally cool mechanism on characters weights). Collation is a rule on how characters are to be weighted against each other. See below definition, this extract was taken from BOL(using the reference above)

    NOTE:

    Whether SQL Server 2000 considers a character less than or greater than another character is collation-dependent: a collation defines the bit patterns that represent each character in a character string and the rules controlling

    how characters are sorted and compared.

    hope that sheds some light on the matter

  • Thank's Barsileus,

    The concept of collation is not the problem here the problem is that since SQL Server 2000 was deliver it was written in BOL:

    About Kana-sensitivenes:

    "Specifies that SQL Server distinguish between the two types of Japanese kana characters: Hiragana and Katakana.

    If not selected, SQL Server considers Hiragana and Katakana characters to be equal."

    And in this definition, there is nothing to do with the problem we just got .

    Now what you said help me learn a little bit more about this "strange" behavior .

    Best regards,

    Carl

  • The collation sets the equality of characters.

    If you must absolute distinguish between characters that SQL considers equal:

    in SQL  where A = a is true,

    but

    where convert(varbinary,'A') = select convert(varbinary,'a') is false

    compare their binary values.

    select convert(varbinary,'A')

    select convert(varbinary,'a')

    This gives different results.

     

     

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply