CS Collation

  • Alex Fekken (9/12/2012)


    I don't get it either: so 'feet' would come after 'foot'?

    That puzzled me as well, so I did some more testing and found that

    "final" comes before "initial"

    "eleven" comes before "ten"

    "lots and lots" comes before "not many at all"

    and

    "bags" comes before "nowt"

    Should this be reported to Microsft as a bug?

  • vk-kirov (9/12/2012)

    Well, I should note that nouns ending with the letter S are considered plural in English. But here we deal with a Latin collation, and the Romans had their Latin plural nouns ending with 'ae', 'i', 'ia', 'es' and other stuff :hehe:

    If we substitute 'peppers' by 'pepperl', that 'pepperl' still will be in the end of the list... So I don't think that plural is the explanation.

    Latin1_General_CS_AS is Windows collation so it is follows the same word sort rules as a normal Windows string compare.

    Check the property of this collation by executing the below sql

    SELECT

    description

    ,COLLATIONPROPERTY(name, 'CodePage') AS CodePage

    ,COLLATIONPROPERTY(name, 'LCID') AS LCID

    FROM fn_helpcollations()

    WHERE name = N'Latin1_General_CS_AS';

    This query shows that the collation is case-sensitive, accent-sensitive, and the code page 1252 and LCID 1033, if you have locale as 1033, then default Windows string comparision rules will provide the same behavior as the database engine like any other in plain english.

    The interpretation of the word "latin" does not indicated the real latin language here. 🙂 and the code page 1252 is like most used common code page in general everywhere. So I also added the soundex code to show how the value changes when you add "s" at the end where the consideration of the sort will change.

    ww; Raghu
    --
    The first and the hardest SQL statement I have wrote- "select * from customers" - and I was happy and felt smart.

  • Toreador (9/12/2012)


    Alex Fekken (9/12/2012)


    I don't get it either: so 'feet' would come after 'foot'?

    That puzzled me as well, so I did some more testing and found that

    "final" comes before "initial"

    "eleven" comes before "ten"

    "lots and lots" comes before "not many at all"

    and

    "bags" comes before "nowt"

    Should this be reported to Microsft as a bug?

    SELECT SOUNDEX ('feet')

    RESULTS: F300

    SELECT SOUNDEX ('foot')

    RESULTS: F300

    SELECT SOUNDEX ('ball')

    RESULTS: B400

    SELECT SOUNDEX ('balls')

    RESULTS: B420

    for the foot and feet even though it is plural the soundex is same, but check on the ball and balls, where the soundex differs so the the plural takes the position after ball.

    Like I earlier said "to my knowledge...." , sorting mainly happens on the word and its weight, so may be it takes other stuff also in to the consideration.

    (hang on, some expert will come and will answer us with correctly )

    EDIT: my saying refers to the char "s" which was added in the end, but not sure of the sentence sorting like what you have mentioned.

    ww; Raghu
    --
    The first and the hardest SQL statement I have wrote- "select * from customers" - and I was happy and felt smart.

  • SQL Kiwi (9/11/2012)


    Avocado 😀

    Doh! Don't tell my kids, I've been tormenting them with the word for months. (Thank you!)

    Toreador (9/12/2012)


    None of the options gave the expected order, so it was obviously a question about the Latin1_General collation rather than case sensitivity.

    The family of Latin1_General collation tend to be the ones I see as default. (Yes, a US bias, and I didn't play with the collations from other languages because the languages I know beyond English tend only to be used for programming.) Honestly I started playing with them a little after reading "Stairway to T-SQL DML Level 6: The Basics of Sorting Data Using the ORDER BY Clause[/url]". There were a couple surprises I found, apparently most critical to me was that what I always assumed was a 'correct' sort order was what SQL calls the binary sort order. After poking this bear for a while, I thought, N'I wonder if everyone else already knows this. I could imagine someone writing this up as a QotD, I wonder if I could.' 😉

    vk-kirov (9/12/2012)


    I don't understand why peppers are in the end of the list. Shouldn't they be between pepper and Pepper? 🙂

    My thought exactly. My interpretation was that it considers peppers as a different word for purposes of dictionary sort order. I wish I could have found a better reference to include in the explanation. I think Raghavendra Mudugal is now providing a better explanation than I could possibly hope for. THANKS!

  • Thanks for the question... I never understand these collation quirks.

    According to the documentation:

    _CS_AS = Case-sensitive, accent-sensitive, kana-insensitive, width-insensitive

    If it's case-sensitive but NOT width-sensitive, why is "peppers" at the end??

  • For my understanding:

    1. It sorts the words regardless of their case one letter at a time. A shorter word will come before a longer one because spaces come before letters and numbers (and numbers before letters). eg.

    p = p

    e = e

    p = p

    p = p

    e = e

    r = r

    s = ' '

    2. It then sorts same words by case, sorting lower case first and then upper case.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • sestell1 (9/12/2012)


    Thanks for the question... I never understand these collation quirks.

    According to the documentation:

    _CS_AS = Case-sensitive, accent-sensitive, kana-insensitive, width-insensitive

    If it's case-sensitive but NOT width-sensitive, why is "peppers" at the end??

    Width-sensitive will compare characters that are "the same" but will use one or two bytes depending on their definition.

    It's not width sensitive of the strings but for the characters.

    From BOL:

    Width-sensitive (_WS)

    Distinguishes between a single-byte character and the same character when represented as a double-byte character.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Luis Cazares (9/12/2012)Width-sensitive will compare characters that are "the same" but will use one or two bytes depending on their definition.

    It's not width sensitive of the strings but for the characters.

    Doh! Thanks Luis!!

  • Good question, though I'm still having a hard time trying to understand collations.

    "El" Jerry.

    "A watt of Ottawa" - Gerardo Galvan

    To better understand your help request, please follow these best practices.[/url]

  • Guacamole

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Nice question. I almost never deal with case sensitivity but this is good to know anyways.

    This talk of avocados and guacamole is making me hungry. 🙂

  • Really good one - had me scratch my head. Thanks!

  • Taking a step back, why should the sort order depend on either SOUNDEX (especially when case sensitive) or gramnatical interpretation at all?

    That sounds like total stupidity to me.

    // Corrected "atupid" typos

  • I'm also a bit confused why peppers is at the end. If lowercase p is before uppercase p, which I knew it would be, why was it at the end? And the, I hope joking, about soundex and plurals is not the reason.

    try this:

    VALUES ('Asdfgh'),('ASDFGH'),('asdfgh'),('a3df3r'),('asdfghi');

    which essentially takes the meaning of the word out of the picture. "asdfghi" still sorts at the end... even after the uppercase "a" words.

  • emiddlebrooks (9/12/2012)


    I'm also a bit confused why peppers is at the end. If lowercase p is before uppercase p, which I knew it would be, why was it at the end? And the, I hope joking, about soundex and plurals is not the reason.

    try this:

    VALUES ('Asdfgh'),('ASDFGH'),('asdfgh'),('a3df3r'),('asdfghi');

    which essentially takes the meaning of the word out of the picture. "asdfghi" still sorts at the end... even after the uppercase "a" words.

    It is as if a case-insensitive sort is carried out first and then the collation order is used only to break the ties in the case-insensitive sort order.

    Certainly not what I would have expected. Brilliant question....

Viewing 15 posts - 16 through 30 (of 45 total)

You must be logged in to reply to this topic. Login to reply