CS Collation

Question

CS Collation

Viewing 15 posts - 16 through 30 (of 45 total)

You must be logged in to reply to this topic. Login to reply

Raghavendra Mudugal SSChampion Points: 10658 More actions · Answer 1

vk-kirov (9/12/2012)
Well, I should note that nouns ending with the letter S are considered plural in English. But here we deal with a Latin collation, and the Romans had their Latin plural nouns ending with 'ae', 'i', 'ia', 'es' and other stuff :hehe:
If we substitute 'peppers' by 'pepperl', that 'pepperl' still will be in the end of the list... So I don't think that plural is the explanation.

Latin1_General_CS_AS is Windows collation so it is follows the same word sort rules as a normal Windows string compare.

Check the property of this collation by executing the below sql

SELECT

description

,COLLATIONPROPERTY(name, 'CodePage') AS CodePage

,COLLATIONPROPERTY(name, 'LCID') AS LCID

FROM fn_helpcollations()

WHERE name = N'Latin1_General_CS_AS';

This query shows that the collation is case-sensitive, accent-sensitive, and the code page 1252 and LCID 1033, if you have locale as 1033, then default Windows string comparision rules will provide the same behavior as the database engine like any other in plain english.

The interpretation of the word "latin" does not indicated the real latin language here. 🙂 and the code page 1252 is like most used common code page in general everywhere. So I also added the soundex code to show how the value changes when you add "s" at the end where the consideration of the sort will change.

ww; Raghu
--
The first and the hardest SQL statement I have wrote- "select * from customers" - and I was happy and felt smart.

Raghavendra Mudugal SSChampion Points: 10658 More actions · Answer 2

Toreador (9/12/2012)
Alex Fekken (9/12/2012)
I don't get it either: so 'feet' would come after 'foot'?
That puzzled me as well, so I did some more testing and found that
"final" comes before "initial"
"eleven" comes before "ten"
"lots and lots" comes before "not many at all"
and
"bags" comes before "nowt"
Should this be reported to Microsft as a bug?

SELECT SOUNDEX ('feet')

RESULTS: F300

SELECT SOUNDEX ('foot')

RESULTS: F300

SELECT SOUNDEX ('ball')

RESULTS: B400

SELECT SOUNDEX ('balls')

RESULTS: B420

for the foot and feet even though it is plural the soundex is same, but check on the ball and balls, where the soundex differs so the the plural takes the position after ball.

Like I earlier said "to my knowledge...." , sorting mainly happens on the word and its weight, so may be it takes other stuff also in to the consideration.

(hang on, some expert will come and will answer us with correctly )

EDIT: my saying refers to the char "s" which was added in the end, but not sure of the sentence sorting like what you have mentioned.

ww; Raghu
--
The first and the hardest SQL statement I have wrote- "select * from customers" - and I was happy and felt smart.

Rich Weissler Hall of Fame Points: 3286 More actions · Answer 3

SQL Kiwi (9/11/2012)
Avocado 😀

Doh! Don't tell my kids, I've been tormenting them with the word for months. (Thank you!)

Toreador (9/12/2012)
None of the options gave the expected order, so it was obviously a question about the Latin1_General collation rather than case sensitivity.

The family of Latin1_General collation tend to be the ones I see as default. (Yes, a US bias, and I didn't play with the collations from other languages because the languages I know beyond English tend only to be used for programming.) Honestly I started playing with them a little after reading "Stairway to T-SQL DML Level 6: The Basics of Sorting Data Using the ORDER BY Clause[/url]". There were a couple surprises I found, apparently most critical to me was that what I always assumed was a 'correct' sort order was what SQL calls the binary sort order. After poking this bear for a while, I thought, N'I wonder if everyone else already knows this. I could imagine someone writing this up as a QotD, I wonder if I could.' 😉

vk-kirov (9/12/2012)
I don't understand why peppers are in the end of the list. Shouldn't they be between pepper and Pepper? 🙂

My thought exactly. My interpretation was that it considers peppers as a different word for purposes of dictionary sort order. I wish I could have found a better reference to include in the explanation. I think Raghavendra Mudugal is now providing a better explanation than I could possibly hope for. THANKS!

sestell1 SSChampion Points: 10230 More actions · Answer 4

Thanks for the question... I never understand these collation quirks.

According to the documentation:

_CS_AS = Case-sensitive, accent-sensitive, kana-insensitive, width-insensitive

If it's case-sensitive but NOT width-sensitive, why is "peppers" at the end??

Luis Cazares SSC Guru Points: 183706 More actions · Answer 5

For my understanding:

1. It sorts the words regardless of their case one letter at a time. A shorter word will come before a longer one because spaces come before letters and numbers (and numbers before letters). eg.

p = p

e = e

p = p

e = e

r = r

s = ' '

2. It then sorts same words by case, sorting lower case first and then upper case.

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

Luis Cazares SSC Guru Points: 183706 More actions · Answer 6

sestell1 (9/12/2012)
Thanks for the question... I never understand these collation quirks.
According to the documentation:
_CS_AS = Case-sensitive, accent-sensitive, kana-insensitive, width-insensitive
If it's case-sensitive but NOT width-sensitive, why is "peppers" at the end??

Width-sensitive will compare characters that are "the same" but will use one or two bytes depending on their definition.

It's not width sensitive of the strings but for the characters.

From BOL:

Width-sensitive (_WS)
Distinguishes between a single-byte character and the same character when represented as a double-byte character.

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

sestell1 SSChampion Points: 10230 More actions · Answer 7

Luis Cazares (9/12/2012)Width-sensitive will compare characters that are "the same" but will use one or two bytes depending on their definition.
It's not width sensitive of the strings but for the characters.

Doh! Thanks Luis!!

EL Jerry SSCertifiable Points: 7065 More actions · Answer 8

Good question, though I'm still having a hard time trying to understand collations.

"El" Jerry.

"A watt of Ottawa" - Gerardo Galvan

To better understand your help request, please follow these best practices.[/url]

SQLRNNR SSC Guru Points: 281344 More actions · Answer 9

Guacamole

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events

Ken SSCoach Points: 17191 More actions · Answer 10

Nice question. I almost never deal with case sensitivity but this is good to know anyways.

This talk of avocados and guacamole is making me hungry. 🙂

Revenant SSC-Forever Points: 42467 More actions · Answer 11

Revenant

SSC-Forever

Points: 42467

September 12, 2012 at 12:30 pm

#1536870

Really good one - had me scratch my head. Thanks!

Alex Fekken Ten Centuries Points: 1109 More actions · Answer 12

Taking a step back, why should the sort order depend on either SOUNDEX (especially when case sensitive) or gramnatical interpretation at all?

That sounds like total stupidity to me.

// Corrected "atupid" typos

emiddlebrooks Hall of Fame Points: 3473 More actions · Answer 13

I'm also a bit confused why peppers is at the end. If lowercase p is before uppercase p, which I knew it would be, why was it at the end? And the, I hope joking, about soundex and plurals is not the reason.

try this:

VALUES ('Asdfgh'),('ASDFGH'),('asdfgh'),('a3df3r'),('asdfghi');

which essentially takes the meaning of the word out of the picture. "asdfghi" still sorts at the end... even after the uppercase "a" words.

Alex Fekken Ten Centuries Points: 1109 More actions · Answer 14

emiddlebrooks (9/12/2012)
I'm also a bit confused why peppers is at the end. If lowercase p is before uppercase p, which I knew it would be, why was it at the end? And the, I hope joking, about soundex and plurals is not the reason.
try this:
VALUES ('Asdfgh'),('ASDFGH'),('asdfgh'),('a3df3r'),('asdfghi');
which essentially takes the meaning of the word out of the picture. "asdfghi" still sorts at the end... even after the uppercase "a" words.

It is as if a case-insensitive sort is carried out first and then the collation order is used only to break the ties in the case-insensitive sort order.

Certainly not what I would have expected. Brilliant question....

Alex Fekken Ten Centuries Points: 1109 More actions · Answer 15

Ah: that's basically what Luis already said: sorry Luis!