06 April 2017

156664 views

75 0

Questions About SQL Server Collations You Were Too Shy to Ask

Of course we all like our colleagues to think that we know everything there is to know about SQL Server Collations. However, the truth is that it is a rather complicated topic to fully understand and the cost of getting collation wrong can be great. If only one could ask certain questions on forums or at conferences without blushing. Help is at hand, because Robert Sheldon once again makes the complicated seem simple by answering those questions that you were too shy to ask

What is a SQL Server collation?
How do I find the collations supported on a SQL Server instance?
How do I identify Unicode-only collations?
What are the differences between Windows collations and SQL Server collations?
How do I set the collation at the server level?
How do I configure a database’s default collation?
How do I assign a collation to a column?
How do I specify a collation when querying data?
How do I query columns with different collations?
How do I create a foreign key on columns with different collations?
How do collations affect temporary tables?
How do I know what collations to use?

1. What is a SQL Server collation?

A collation is a configuration setting that determines how the database engine should treat character data at the server, database, or column level. SQL Server includes a large set of collations for handling the language and regional differences that come with supporting users and applications in different parts of the world.

Each collation serves two purposes. The first is to provide a character set that defines the bit pattern associated with each character.

SQL Server stores character data using either one byte or two bytes per character, depending on the column’s data type and assigned collation. For example, European languages require only a single-byte character set, which supports up to 256 bit patterns. On the other hand, many Asian languages include thousands of characters and require a double-byte character set, which supports up to 65,536 bit patterns.

Collations are intricately tied to the data types assigned to character columns. In SQL Server, you can configure a character column with a Unicode data type (nchar, nvarchar, or ntext) or non-Unicode data type (char, varchar, or text). For Unicode types, the character bit patterns conform to international standards that define a double-byte encoding scheme for mapping most of the world’s written languages, ensuring that the same bit pattern is always associated with the same character, regardless of the underlying environment.

Most collations support both Unicode and non-Unicode character data, although some support only Unicode data. Each collation that supports non-Unicode data is associated with a code page that defines the bit patterns for the non-Unicode characters. The collation must point to a code page supported by the underlying operating system; however, multiple collations can point to the same code page if they share the same character set. Unicode-only collations do not require code pages because they conform to the universal encoding model.

To view the code page associated with a collation, you can use the COLLATIONPROPERTY system function, as shown in the following example:

1	SELECT COLLATIONPROPERTY('Modern_Spanish_100_CS_AS', 'CodePage');

The function takes two arguments. The first is the name of the collation, in this case, Modern_Spanish_100_CS_AS, and the second is the CodePage property. The SELECT statement returns the value 1252, which is the code page identifier. On my system, 894 collations share this code page. (I’m running SQL Server 2016 Developer Edition on Windows 10).

The second purpose of the SQL Server collation is to specify the rules that govern how character data is sorted and compared, taking into account not only language and regional conventions, but also the options specific to a collation, such as case-sensitivity and accent-sensitivity.

Each collation enforces its own sorting and comparison rules, whether or not it points to the same code page as other collations. For this reason, query results can differ when accessing identical data in two columns configured with different collations, even if those collations are based on the same language.

In SQL Server, you can specify a collation at the server level, database level, or column level, and you can specify collations within your queries. Keep in mind, however, that you can run into query and performance issues when working with multiple collations at the same time.

2. How do I find the collations supported on a SQL Server instance?

To view a list of the available collations, you can use the sys.fn_helpcollations system table function to retrieve the collation names and their descriptions, as shown in the following example:

1 2	SELECT name, description FROM sys.fn_helpcollations();

On my system, the statement returns 3,887 collations that support a wide range of languages, with numerous collations for each language family. For example, the following table shows a number of the collations available to the Tibetan family (but not all):

Name	description
Tibetan_100_CI_AI_SC	Tibetan-100, case-insensitive, accent-insensitive, kanatype-insensitive, width-insensitive, supplementary characters
Tibetan_100_CI_AI_WS_SC	Tibetan-100, case-insensitive, accent-insensitive, kanatype-insensitive, width-sensitive, supplementary characters
Tibetan_100_CI_AI_KS_SC	Tibetan-100, case-insensitive, accent-insensitive, kanatype-sensitive, width-insensitive, supplementary characters
Tibetan_100_CI_AI_KS_WS_SC	Tibetan-100, case-insensitive, accent-insensitive, kanatype-sensitive, width-sensitive, supplementary characters
Tibetan_100_CI_AS_SC	Tibetan-100, case-insensitive, accent-sensitive, kanatype-insensitive, width-insensitive, supplementary characters
Tibetan_100_CI_AS_WS_SC	Tibetan-100, case-insensitive, accent-sensitive, kanatype-insensitive, width-sensitive, supplementary characters
Tibetan_100_CI_AS_KS_SC	Tibetan-100, case-insensitive, accent-sensitive, kanatype-sensitive, width-insensitive, supplementary characters
Tibetan_100_CI_AS_KS_WS_SC	Tibetan-100, case-insensitive, accent-sensitive, kanatype-sensitive, width-sensitive, supplementary characters
Tibetan_100_CS_AI_SC	Tibetan-100, case-sensitive, accent-insensitive, kanatype-insensitive, width-insensitive, supplementary characters
Tibetan_100_CS_AI_WS_SC	Tibetan-100, case-sensitive, accent-insensitive, kanatype-insensitive, width-sensitive, supplementary characters

Most language families include numerous collations, each configured with specific options, such as width-sensitivity and accent-sensitivity. When specifying a collation in your T-SQL script, you must use the name exactly as it is returned by the sys.fn_helpcollations function. The collation name includes the supported language and, in some cases, the collation version. For example, the collations in the table above are all specific to the Tibetan language and are all version 100.

The remaining codes in each collation name (such as _CS_AI_WS_SC) indicate the options preconfigured on the collation. For example, CI means case-insensitive and AS means accent-sensitive. The description column shows what the option codes mean. For more detailed information, see the Microsoft article “Collation and Unicode Support.”

3. How do I identify Unicode-only collations?

Some collations support only Unicode characters and consequently can be used only with the nchar, nvarchar, or ntext data types, either within column definitions or when querying data from those columns. Unfortunately, you cannot tell from the collation name whether or not it is Unicode-only, but you can use a combination of the sys.fn_helpcollations and COLLATIONPROPERTY system functions to retrieve a list of Unicode-only collations:

SELECT name, description

FROM sys.fn_helpcollations()

WHERE COLLATIONPROPERTY(name, 'CodePage') = 0;

On my system, the SELECT statement returns 510 rows. You can, of course, narrow down the search to specific languages, but you get the idea. You use the COLLATIONPROPERTY system function to find the collations with a CodePage value of 0. Because a Unicode-only collation does not point to a code page, the database engine simply returns a 0, rather than a NULL value or some obscure number, or worse still, an error.

4. What are the differences between Windows collations and SQL Server collations?

SQL Server supports two collation types: Windows collations and SQL Server collations. You can tell them apart by their names. All SQL Server collation names begin with SQL_.

Each collation type supports both character-based collations and binary-based. Once again, you can distinguish them by their names. Binary-based collations end with either _BIN or _BIN2.

Character-based Windows collations represent the bulk of the collations available to SQL Server. The collations are based on the Windows locales available to the host operating system where SQL Server is installed. A locale is a collection of information specific to a location or culture. Windows uses locales to customize the interface for regional groups of users.

Windows collations in SQL Server use the same code pages as the Windows locales, as well as the same sorting and comparison rules. However, SQL Server provides collations for only a subset of Windows locales, and each collation supports only a subset of the attributes available to the Windows locale. The Windows collations in SQL Server are concerned with locales only in terms of data management.

Windows collations use the same algorithm for sorting and comparing both Unicode and non-Unicode data, making it easier to work with character data stored with different data types. You can retrieve a list of the character-based Windows collations available to your system by running the following SELECT statement:

SELECT name, description

FROM sys.fn_helpcollations()

WHERE name NOT LIKE 'sql_%' AND name NOT LIKE '%_bin%';

All we’re doing here is taking advantage of how collations are named by eliminating any names that start with sql_ and end with _bin or _bin2. On my system, I came up with 3,552 character-based Windows collations.

Binary-based collations represent the other type of Windows collations. They are similar to the character-based collations except in how the data is sorted. For non-Unicode data, sorting is based on the coded numerical values (code points) defined by the Windows locale. For Unicode data, sorting is based on the code points defined by the Unicode standard, so locale is not a factor. In either case, the sort order is simpler to enforce, helping to improve application performance.

SQL Server supports the two types of binary-based collations:

BIN: The database engine compares the first character of a data value by code point and the remaining characters by their bit patterns. This type represents the original approach to binary-based collations in SQL Server.
BIN2: The database engine compares all characters based on code points. This type represents the newer generation of binary-based collations.

To view a list of binary-based Windows collations, we can use the following SELECT statement, which returns 258 collations on my system:

SELECT name, description

FROM sys.fn_helpcollations()

WHERE name NOT LIKE 'sql_%' AND name LIKE '%_bin%';

In case you’re wondering how many of the binary-based Windows collations are Unicode only, we can run the following SELECT statement, which gives me 30 collations:

SELECT name, description

FROM sys.fn_helpcollations()

WHERE COLLATIONPROPERTY(name, 'CodePage') = 0

AND name NOT LIKE 'sql_%' AND name LIKE '%_bin%';

Now we move into the SQL Server collations, which were created before SQL Server supported windows collations. Microsoft keeps these collations on-hand for backward compatibility with older versions of SQL Server and legacy applications. You should use not use SQL Server collations for new development work.

That said, let’s take a closer look at SQL Server collations, should you need to use them. Like Windows collations, SQL Server collations use a code page for non-Unicode characters and follow the Unicode standard for the rest.

But here’s where things get tricky with SQL Server collations. The database engine applies non-Unicode sorting rules to the non-Unicode data, and Unicode sorting rules to the Unicode data, which can give you inconsistent results when working with character data stored in columns with different data types. Again, you should be using SQL Server collations only to support those legacy databases and applications.

To learn what character-based SQL Server collations an instance supports, we can run the following query, which gives me 73 collations:

SELECT name, description

FROM sys.fn_helpcollations()

WHERE name LIKE 'sql_%' AND name NOT LIKE '%_bin%';

Our final collation type is the binary-based SQL Server collation, of which there are few. These collations work much like the binary-based Windows collations, except with the limitations that come with the character-based SQL Server collations. To view a list of the binary-based SQL Server collations available to an environment, we can run the following SELECT statement, which returns only four rows on my system:

SELECT name, description

FROM sys.fn_helpcollations()

WHERE name LIKE 'sql_%_bin%';

It should also be noted that SQL Server does not support Unicode-only SQL Server collations, either character-based or binary-based. For example, if we were to run the following SELECT statement:

SELECT name, description

FROM sys.fn_helpcollations()

WHERE COLLATIONPROPERTY(name, 'CodePage') = 0

AND name LIKE 'sql_%';

We would end up with an empty result set.

5. How do I set the collation at the server level?

You set the collation at the server level when you initially install SQL Server. The collation serves as the default collation for all system databases and user databases, unless a different collation is specified at the database or column level. You can assign any collation at the server instance, except Unicode-only collations.

If you’re uncertain what collation has been assigned to a SQL Server instance, you can use the SERVERPROPERTY system function to find out:

1	SELECT SERVERPROPERTY('collation');

On my system, the SELECT statement returns the collation SQL_Latin1_General_CP1_CI_AS, which was the default when I installed SQL Server. The SQL Server installation wizard used this collation because I live in the US and the underlying Windows operating system uses the English (United States) locale. Microsoft selects this collation as the default to maintain compatibility with earlier versions of SQL Server and with applications developed against those versions.

If you want to use a collation other than the default when installing SQL Server, be sure to change the collation on the Collation tab of the wizard’s Server Configuration screen.

You should try your best to get the server collation right when you install SQL Server because changing it after the fact is no small feat. You must take such steps as backing up the data, rebuilding the master database, recreating the user databases and all the objects in them, and importing the data into the newly created tables.

Fortunately, instead of changing the server collation, you can assign a different default collation to your user databases, and you can assign a specific collation to a character column.

6. How do I configure a database’s default collation?

When you define a user database, you can set the default collation as part of the database definition. If you don’t specify the collation, SQL Server uses the collation assigned to the server instance.

You cannot assign a collation to a system database, nor can you directly modify its collation. The only way you can change the collation on a system database is to change the server-level collation.

To specify a collation when creating a user database, you add a COLLATE clause to the database definition. The clause must point to one of the collations supported by your SQL Server instance. As with server-level collations, you cannot specify a Unicode-only collation. To test this out, let’s try to create a database using the Indic_General_100_CS_AS collation:

USE master;

DROP DATABASE IF EXISTS TestCollations;

CREATE DATABASE TestCollations

COLLATE Indic_General_100_CS_AS;

Because Indic_General_100_CS_AS is a Unicode-only collation, the database engine returns the following error message:

1	Collation 'Indic_General_100_CS_AS' is supported on Unicode data types only and cannot be set at the database or server level.

The issue is easily remedied by specifying a collation that is not Unicode-only:

USE master;

DROP DATABASE IF EXISTS TestCollations;

CREATE DATABASE TestCollations

COLLATE Latin1_General_100_CS_AI;

This time, the CREATE DATABASE statement will run and the specified collation will be assigned to the database. SQL Server will apply the collation to all metadata within that database. The collation also serves as the default for character columns, variable names, and other character data within the database.

We can confirm that the database was created using the specified collation by running the following query:

SELECT collation_name

FROM sys.databases

WHERE NAME = 'TestCollations';

As expected, the statement returns Latin1_General_100_CS_AS, which means that the database engine will enforce both case-sensitivity and accent-sensitivity when sorting and comparing character data.

Another way we can verify the collation is to use the DATABASEPROPERTYEX system function:

1	SELECT DATABASEPROPERTYEX('TestCollations', 'Collation');

The SELECT statement again returns the returns the Latin1_General_100_CS_AS collation.

We can also use the COLLATE clause in an ALTER DATABASE statement if we want to change the default collation:

ALTER DATABASE TestCollations

COLLATE Latin1_General_100_CI_AS;

When we change the database collation, the database engine updates the collation for all the character columns in the system tables, as well as existing character parameters and scalar values returned by stored procedures and user-defined functions. The database engine also applies the collation to new columns configured with character data types. However, SQL Server does not change the collations of existing character columns in user-defined tables.

7. How do I assign a collation to a column?

As with database definitions, you can add the COLLATE clause when defining a character column. In this way, you can apply a specific collation to the column’s data, without impacting the rest of the database. You can use the COLLATE clause in either a CREATE TABLE statement or ALTER TABLE statement. When you specify a collation on a column, you override the database’s collation, which is applied by default to all character columns.

For columns configured with the nchar, nvarchar, or ntext data type, you can assign any type of collation, including Unicode-only collations. For char, varchar, and text columns, you must use a collation that supports non-Unicode characters, which rules out Unicode-only collations.

To assign a collation to a column, you include the COLLATE clause within the column’s definition, as shown in the following example:

USE TestCollations;

DROP TABLE IF EXISTS LastNames;

CREATE TABLE LastNames(

NameID INT IDENTITY PRIMARY KEY,

LastName VARCHAR(50) COLLATE Traditional_Spanish_CI_AS NOT NULL);

INSERT INTO LastNames(LastName)

VALUES('Carrión'), ('Carrion'), ('carrión'), ('carrion'),

('Chaves'), ('Chavira'), ('Cúneo'), ('Cuneo'), ('cúneo');

In this case, we’re configuring the LastName column with the Traditional_Spanish_CI_AS collation, which is case-insensitive, but accent-sensitive. Now let’s query the character data:

1 2	SELECT * FROM LastNames ORDER BY LastName;

The SELECT statement returns the following results:

NameID	LastName
2	Carrion
4	carrion
3	carrión
1	Carrión
8	Cuneo
9	cúneo
7	Cúneo
5	Chaves
6	Chavira

Because of the assigned collation, the database engine treats uppercase and lowercase letters as the same, and treats accented and unaccented letters as different. But something else is going on here. The database engine also treats “c” and “ch” as separate letters, which is why Cúneo is listed before Chaves.

You can change a column’s collation by updating the table definition, unless the column is being referenced by an index, computed column, check constraint, foreign key constraint, or distribution statistics. With that in mind, let’s change the collation of the LastName column to Modern_Spanish_CI_AI, which is case-insensitive and accent-insensitive:

ALTER TABLE LastNames

ALTER COLUMN LastName VARCHAR(50)

COLLATE Modern_Spanish_CI_AI NOT NULL;

If we rerun the preceding SELECT statement, Chaves will now come before Cúneo, as shown in the following results:

NameID	LastName
1	Carrión
2	Carrion
3	carrión
4	carrion
5	Chaves
6	Chavira
7	Cúneo
8	Cuneo
9	cúneo

Notice that, even though the collation is case-insensitive and accent-insensitive, the database engine still lists the names with an initial capital letter first for each sorted group. You’ll find that the case-insensitive collations are not consistent in how they treat the first letter. Some will prioritize values with an initial capital letter, some will prioritize values with an initial lowercase letter, and some will not prioritize either.

At some point, you might want to change a column’s collation back to the database’s default collation. To do so, you can use the database_default keyword instead of the actual collation name (which is currently Latin1_General_100_CI_AS):

ALTER TABLE LastNames

ALTER COLUMN LastName VARCHAR(50)

COLLATE database_default NOT NULL;

The database_default keyword saves you the trouble of having to look up the database’s collation, should you not know it off the top of your head. And even if you do, the keyword is a lot easier to type. If we rerun our SELECT statement, the database engine now returns the following results:

NameID	LastName
4	carrion
2	Carrion
3	carrión
1	Carrión
5	Chaves
6	Chavira
8	Cuneo
9	cúneo
7	Cúneo

For the Latin1_General_100_CI_AS collation, the database engine lists the name that starts with a lowercase letter first in each sorted group (e.g., carrion then Carrion), which is the opposite of the Modern_Spanish_CI_AI collation. The Traditional_Spanish_CI_AS collation is inconsistent in this regard (e.g., Carrion then carrion vs. cúneo then Cúneo).

Despite the fact that the collations Latin1_General_100_CI_AS, Traditional_Spanish_CI_AS, and Modern_Spanish_CI_AI sort data differently, they are all based on code page 1252, which means that no matter which collation is specified, the characters all map to the same set of bit patterns.

If you want to view a column’s collation, you can query the sys.columns catalog view:

SELECT collation_name

FROM sys.columns

WHERE OBJECT_NAME(object_id) = 'LastNames'

AND name = 'LastName';

Because we had configured the column with the database’s default collation, the statement returns the Latin1_General_100_CI_AS collation.

8. How do I specify a collation when querying data?

You can also use the COLLATE clause directly in your queries. For example, suppose we’ve assigned the Latin1_General_100_CI_AS collation to our database and its columns, but we want to return the data sorted according to the rules of the Traditional_Spanish_CI_AS collation. We need only add the COLLATE clause to the SELECT statement’s ORDER BY clause:

1 2	SELECT * FROM LastNames ORDER BY LastName COLLATE Traditional_Spanish_CI_AS;

As the following results show, the names are now sorted according to the Traditional_Spanish_CI_AS collation, which is case-insensitive and accent-sensitive, with the “c” and “ch” treated as separate letters:

NameID	LastName
2	Carrion
4	carrion
3	carrión
1	Carrión
8	Cuneo
9	cúneo
7	Cúneo
5	Chaves
6	Chavira

Now suppose we want to use the following SELECT statement to retrieve a specific name from the table:

1 2	SELECT * FROM LastNames WHERE LastName = 'carrion';

Because the collation is case-insensitive, but accent-sensitive, the results include only the following two names:

NameID	LastName
2	Carrion
4	carrion

This time, let’s specify the Traditional_Spanish_CS_AI collation when we query the table:

1 2	SELECT * FROM LastNames WHERE LastName = 'carrion' COLLATE Traditional_Spanish_CS_AI;

Because the new collation is case-sensitive, but accent-insensitive, we now get these results:

NameID	LastName
3	carrión
4	carrion

By specifying the collation, we can better control how the data is compared and the results sorted. But be aware that using collations in your queries can impact performance, especially when they involve large data sets. For example, a query might require additional table scans, depending on the nature of the data.

9. How do I query columns with different collations?

You might run into situations in which you need to query data from multiple columns that use different collations. The columns might be in the same table, different tables in the same database or different databases. In such cases, you can use the COLLATE clause in your queries to resolve possible incompatibilities.

Let’s look at a few examples to demonstrate how this works. We’ll start by creating two tables in the TestCollations database (currently configured with the Latin1_General_100_CI_AS collation):

DROP TABLE IF EXISTS Emp1;

CREATE TABLE Emp1(

EmpID INT IDENTITY PRIMARY KEY,

AltID CHAR(5) NOT NULL,

JobTitle VARCHAR(25) NOT NULL);

INSERT INTO Emp1(AltID, JobTitle)

VALUES('ab101', 'sales rep'), ('cd102', 'CEO'),

('ef103', 'test engineer'), ('gh104', 'HR specialist');

DROP TABLE IF EXISTS Emp2;

CREATE TABLE Emp2(

EmpID INT IDENTITY PRIMARY KEY,

AltID CHAR(5) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,

LoginID VARCHAR(25) NOT NULL);

INSERT INTO Emp2(AltID, LoginID)

VALUES('ab101', 'jchaves'), ('cd102', 'jdoe'),

('ef103', 'rschmidt'), ('gh104', 'llee');

Notice that we’ve assigned the SQL_Latin1_General_CP1_CI_AS collation to the AltID column in the Emp2 table, but use the default database collation for everything else. Now let’s try to join the tables based on the two AltID columns:

SELECT e1.EmpID, e1.AltID, e1.JobTitle, e2.LoginID

FROM Emp1 e1 JOIN Emp2 e2

ON e1.AltID = e2.AltID

ORDER BY e1.EmpID;

Because the AltID columns are configured with different collations, the database engine returns the following error message:

1	Cannot resolve the collation conflict between "SQL_Latin1_General_CP1_CI_AS" and "Latin1_General_100_CI_AS" in the equal to operation.

To get around this, we can add the COLLATE clause to the join condition when we reference the AltID column in the Emp2 table:

SELECT e1.EmpID, e1.AltID, e1.JobTitle, e2.LoginID

FROM Emp1 e1 JOIN Emp2 e2 ON

e1.AltID = e2.AltID COLLATE Latin1_General_100_CI_AS

ORDER BY e1.EmpID;

Now the SELECT statement returns these results:

EmpID	AltID	JobTitle	LoginID
1	ab101	sales rep	jchaves
2	cd102	CEO	jdoe
3	ef103	test engineer	rschmidt
4	gh104	HR specialist	llee

Because the AltID column in the Emp1 table uses the database’s default collation, we can instead specify the database_default keyword in our COLLATE clause, rather than the name of the collation:

SELECT e1.EmpID, e1.AltID, e1.JobTitle, e2.LoginID

FROM Emp1 e1 JOIN Emp2 e2 ON

e1.AltID = e2.AltID COLLATE database_default

ORDER BY e1.EmpID;

The SELECT statement returns the same results as the preceding statement, as does the next SELECT statement:

SELECT e1.EmpID, e1.AltID, e1.JobTitle, e2.LoginID

FROM Emp1 e1 JOIN Emp2 e2 ON

e1.AltID COLLATE SQL_Latin1_General_CP1_CI_AS = e2.AltID

ORDER BY e1.EmpID;

This time, we’ve added the COLLATE clause to the first AltID reference, which achieves the same results, that is, to join the tables based on the AltID columns.

Now let’s look at what happens when we try to concatenate the LoginID and AltID columns in the Emp2 table:

1 2	SELECT EmpID, (LoginID + AltID) NewAltID FROM Emp2 ORDER BY EmpID;

As with our join, we’re working with two character columns configured with different collations. Not surprisingly, the database engine returns the following error message:

1	Implicit conversion of varchar value to varchar cannot be performed because the collation of the value is unresolved due to a collation conflict between "SQL_Latin1_General_CP1_CI_AS" and "Latin1_General_100_CI_AS" in add operator.

Again, the COLLATE clause comes to the rescue:

1 2	SELECT EmpID, (LoginID + AltID COLLATE database_default) NewAltID FROM Emp2 ORDER BY EmpID;

The SELECT statement now returns the results we’re looking for:

EmpID	NewEmpID
1	jchavesab101
2	jdoecd102
3	rschmidtef103
4	lleegh104

Again, being able to use the COLLATE clause in your queries is a handy tool, but keep in mind that the database engine has to work harder to make this happen.

10. How do I create a foreign key on columns with different collations?

Just like you cannot change the collation on a column that is being referenced by a foreign key, you cannot create a foreign key between two columns with different collations. To test this out, let’s update the Emp1 table we created above by adding a unique constraint to the AltID column so it can be referenced by a foreign key:

1 2	ALTER TABLE Emp1 ADD CONSTRAINT uqAltID UNIQUE(AltID);

Now let’s try to add a foreign key constraint to the AltID column in the Emp2 table, referencing the AltID column in Emp1 table:

ALTER TABLE Emp2

WITH CHECK ADD CONSTRAINT fkAltID FOREIGN KEY(AltID)

REFERENCES Emp1(AltID);

Although both AltID columns are configured with the char(5) data type, when we try to run the ALTER TABLE statement, the database engine returns the following error message:

1	Column 'Emp1.AltID' is not of same collation as referencing column 'Emp2.AltID' in foreign key 'fkAltID'.

One way to get around this is to alter the AltID column in Emp2 table by changing the default collation to the same as the AltID column in Emp1 table:

1 2	ALTER TABLE Emp2 ALTER COLUMN AltID CHAR(5) COLLATE database_default NOT NULL;

In some cases, however, you will not be able to change the referencing column’s collation. Another approach you might consider is to add a persisted computed column to the referenced table. For example, the following ALTER TABLE statement adds the AltID2 computed column to the Emp1 table:

1 2	ALTER TABLE Emp1 ADD AltID2 AS AltID COLLATE SQL_Latin1_General_CP1_CI_AS PERSISTED;

The computed column contains the same data as AltID but specifies the SQL_Latin1_General_CP1_CI_AS collation. Next, we define a unique constraint on the AltID2 column so it can be referenced by a foreign key:

1 2	ALTER TABLE Emp1 ADD CONSTRAINT uqAltID UNIQUE(AltID2);

Finally, we add a foreign key constraint to the AltID column in the Emp2 table, pointing to the AltID2 column in the Emp1 table:

ALTER TABLE Emp2

WITH CHECK ADD CONSTRAINT fkAltID FOREIGN KEY(AltID)

REFERENCES Emp1(AltID2);

Although this will do the trick, adding a persisted computed column with a unique index adds overhead, so be sure this alternative won’t have too serious an impact on your transactional workloads.

11. How do collations affect temporary tables?

If your SQL Server environment uses the same collation for the server, databases, and columns, then temporary tables present no problem in terms of collations. All the character columns throughout the database adhere to the same rules when performing sorts, comparisons, joins, concatenations, or other operations.

However, if you use different collations for your databases and columns, you might run into unexpected results. For example, suppose we create and populate the following temporary table within a database configured with the Latin1_General_100_CI_AS collation:

CREATE TABLE #Emp3(

AltID CHAR(5) NOT NULL,

LoginID VARCHAR(25) NOT NULL);

INSERT INTO #Emp3

VALUES('ab101', 'jchaves'), ('cd102', 'jdoe'),

('ef103', 'rschmidt'), ('gh104', 'llee');

We might expect that the AltID column will use the same collation as the database, given that no collation has been specified, but that’s not the case. Because temporary tables are stored in the tempdb database, they are assigned the same collation as the server, no matter what the database collation might be.

We can test this scenario by running the following query:

SELECT e1.EmpID, e1.AltID, e1.JobTitle, e3.LoginID

FROM Emp1 e1 JOIN #Emp3 e3

ON e1.AltID = e3.AltID

ORDER BY e1.EmpID;

In this case, we’re joining the Emp1 and #Emp3 tables based on their AltID columns. When we run this statement, the database engine returns the following error message:

1	Cannot resolve the collation conflict between "SQL_Latin1_General_CP1_CI_AS" and "Latin1_General_100_CI_AS" in the equal to operation.

Fortunately, we can get around this issue by including a COLLATE clause in our query, as we saw in earlier examples:

SELECT e1.EmpID, e1.AltID, e1.JobTitle, e3.LoginID

FROM Emp1 e1 JOIN #Emp3 e3

ON e1.AltID = e3.AltID COLLATE database_default

ORDER BY e1.EmpID;

Now the SELECT statement runs with no problem and returns the results shown in the following table:

EmpID	AltID	JobTitle	LoginID
1	ab101	sales rep	jchaves
2	cd102	CEO	jdoe
3	ef103	test engineer	rschmidt
4	gh104	HR specialist	llee

Another approach we can take is to create our temporary table using the same collation as that of the database:

CREATE TABLE #Emp3(

AltID CHAR(5) COLLATE database_default NOT NULL,

LoginID VARCHAR(25) NOT NULL);

INSERT INTO #Emp3

VALUES('ab101', 'jchaves'), ('cd102', 'jdoe'),

('ef103', 'rschmidt'), ('gh104', 'llee');

As with other table definitions, we simply include the COLLATE clause in the column definition, allowing us to create a join without having to deal with collation differences:

SELECT e1.EmpID, e1.AltID, e1.JobTitle, e3.LoginID

FROM Emp1 e1 JOIN #Emp3 e3

ON e1.AltID = e3.AltID

ORDER BY e1.EmpID;

When working with different collations, any objects stored in the tempdb database can cause issues. Not only temporary tables, but also variables, temporary stored procedures, and GOTO labels.

12. How do I know what collations to use?

The ideal solution is to choose a collation when setting up SQL Server that can be used for all your user databases and character columns. Using one collation removes any issues you might encounter when querying the data in different ways. It can also be the best approach in terms of performance if multiple collations impact your queries. However, this approach works only if the same language and collation settings are appropriate for all your users and applications—or at least a good majority of them.

If you support multi-cultural environments, you’ll need to take into account a number of considerations. To begin with, you should pick collations that support the most users, and you should use Unicode data types where possible because they can help avoid code page conversion issues. Just keep in mind the storage requirements that come with Unicode’s two bytes per character.

When you implement Unicode data types, you have the option of using Unicode-only collations for your columns and queries. This type of collation can be useful when you’re supporting client applications installed on older operating systems or when the applications or client drivers don’t recognize the new Windows collations. Because Unicode-only collations don’t use code pages, the applications won’t try to map to incorrect bit patterns or adhere to conflicting rules. Instead, both the client and server treat all character data according to the Unicode standards.

However, using Unicode data types with Unicode-only collations is not always an option, in which case, you might need to select your collations based on your client applications or their environments. For example, you might choose SQL Server collations because they can support the legacy systems. Depending on the circumstances, a good strategy might be to assign a SQL Server collation to the server or at least to the user databases, as applicable.

Of course, you should upgrade the client operating system, application, or provider, when possible, but that might not always be a practical solution.

If you’re developing an entirely new data-driven application, you should stick with Windows collations, implementing Unicode data types and Unicode-only collations if and when they make sense. But be sure to keep performance and maintenance in mind, especially when you’re mixing Unicode and non-Unicode data (or single-byte and double-byte data) or when your queries result in code page conversions between the client and server.

The bottom line is that you must keep your client applications and users at the forefront of your thinking when choosing collations. You should also strive to keep things as simple as possible. The more collations you assign to a database, the harder the database engine might have to work. For large data sets, you should test various scenarios to determine their impact on performance, maintenance, and storage, and be sure to look at where code page conversions might be occurring and where conflicts could arise.

Questions About SQL Server Collations You Were Too Shy to Ask

1. What is a SQL Server collation?

2. How do I find the collations supported on a SQL Server instance?

3. How do I identify Unicode-only collations?

4. What are the differences between Windows collations and SQL Server collations?

5. How do I set the collation at the server level?

6. How do I configure a database’s default collation?

7. How do I assign a collation to a column?

8. How do I specify a collation when querying data?

9. How do I query columns with different collations?

10. How do I create a foreign key on columns with different collations?

11. How do collations affect temporary tables?

12. How do I know what collations to use?

Subscribe for more articles

Rate this article

Robert Sheldon

Related articles

Questions About HIPAA That You Were Too Shy to Ask

Snake draft sorting in SQL Server, part 2

Snake draft sorting in SQL Server, part 1

Tags