Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

SQLStudies

My name is Kenneth Fisher and I am Senior DBA for a large (multi-national) insurance company. I have been working with databases for over 20 years starting with Clarion and Foxpro. I’ve been working with SQL Server for 12 years but have only really started “studying” the subject for the last 3. I don’t have any real "specialities" but I enjoy trouble shooting and teaching. Thus far I’ve earned by MCITP Database Administrator 2008, MCTS Database Administrator 2005, and MCTS Database Developer 2008. I’m currently studying for my MCITP Database Developer 2008 and should start in on the 2012 exams next year. My blog is at www.sqlstudies.com.

Collation: Definition

I had a recent run in with collation problems and it got me started reading. As I read I started to realize that for a seemingly simple subject there was a lot to it. So I thought I would go over some of what I found out. First things first of course.

What is a collation? Collation is a property assigned to any character field that defines the sort and comparison rules for the field. The information I’m going over is mostly found in BOL under Collation and Unicode Support. It’s something like 10 pages long which just goes to show I wasn’t exaggerating about the depth of the subject.

Options

Prior to SQL Server 2012 there were four options associated with collation with a 5th added in SQL Server 2012. They are specified by appending them to the collation name along with a _. So Collation_CS_AS_KS_WS_SC is case sensitive, accent sensitive, kana sensitive, width sensitive and supplementary character sensitive.

  • Case sensitivity (CS/CI) – Will the sort/comparison take case (upper/lower) of the characters into account. If CS (case sensitive) it will and if CI (case insensitive) it will not.
  • Accent sensitivity (AS/AI) – Will the sort/comparison take into account accented characters (a/?). If AS (accent sensitive) then it will. If AI (accent insensitive) it will not.
  • Kana sensitivity (KS) – This option is specific to Japanese kana characters. There are two types of kana characters, Hiragana and Katakana. If KS is specified then the sort/comparison will be treat the kana characters differently. If it is not specified then they are treated the same. There is no KI.
  • Width sensitivity (WS) – I find this an interesting option. This determines if the storage (one byte or two) is taken into account in the sort/comparison. There is no WI. If WS is included then the collation is width sensitive. If it isn’t included then the collation is not width sensitive
  • UTF-16 Supplementary Characters (SC) – Thanks to Bob Beauchemin for pointing this one out to me. But basically this is new as of SQL 2012 and it supports UTF16 characters that require more than a 16bit (2 byte) word. It is supported in 90 level and up collations.

Collation sets

SQL Server supports three different collation sets. Windows collations, Binary collations, and SQL Server collations. I honestly found the differences rather confusing so I’ll let you read up on them yourself.

Code pages

In addition to the sort/comparison options, for non-Unicode data types (char/varchar/text) collation also determines the code page. Because non-Unicode types only have one byte per character they are more restricted in the number of characters available. The code page determines what character set is available. So for example Hebrew collations will use an entirely different set of characters than English ones. Unicode datatypes (nchar/nvarchar/ntext) don’t have the same problem. Because they have the additional byte per character they can hold a large enough number of possible characters that a code page isn’t really necessary.


Over the next few weeks I’m going to do a number of different posts around the surprisingly deep subject of collation.

As a start listen to Rob Farley’s hilarious talk on collation
Collation: Location
Collation: Expression
Collation: Correction -> Expression, Column, Database
Collation: Correction -> Instance
Collation: Temporary objects & variables: Demonstration
Collation: Confusion


Filed under: Collation, Microsoft SQL Server, SQLServerPedia Syndication, T-SQL Tagged: code language, collation, language sql, microsoft sql server, T-SQL

Comments

Leave a comment on the original post [sqlstudies.com, opens in a new window]

Loading comments...