Beware of Mixing Collations - Part 2
Converting Collations
By Gregory A. Larsen
With SQL
Server 2000 you are able to create databases or columns with a different
collating setting than the server (discussed in Part 1: http://www.sqlservercentral.com/columnists/glarsen/collate_part1.asp).
This is especially useful when you are trying to restore from a SQL Server 7.0
database backup, where you are not sure of the collating sequence for the SQL
Server 7.0 machine. If after your
restore you desire to change the collation sequence, then SQL Server 2000 will
allow this. Depending on which
collation settings you want to change will determine the method, and how much
effort it will take to pull off the collation change. This article will explain
a couple of methods I have used to successfully change collation settings.
There are
three distinct collation settings. The
first one is the server collation setting that is initially set when you
install SQL Server. There is the
database collation setting, which defaults to the server collation setting, or
can be specifically defined to be different than the server collation setting
when creating a database. Lastly there
are collation settings for columns, that by default take the database collation
setting, or if you wish you can specify a collation setting when creating a
column.
Changing
the collation setting for a server is quite complex, so I will save that for
the end. Although saying that doesn’t
mean that changing the collation setting for a database and all the character
columns in the database is a piece of cake either. Lets first look at how you might change the collation setting for
a given column.
Changing a Columns Collation Setting
To alter the collation setting for a given column you use
the ALTER TABLE statement. Columns that
are referenced in a foreign key, check constraint, index, and/or have
distributed statistic cannot be altered without first dropping the column
constraints.
Let’s go through a couple of examples. The first example will change the collation
of a varchar column from Danish_Norwegian_CS_AS to Latin1_General_CI_AS. This example uses the “ALTER COLUMN” clause
of the “ALTER TABLE” statement to make the column collation conversion.
|
CREATE TABLE Test_tbl
(Record_id int PRIMARY KEY,
Name varchar(50) COLLATE
Danish_Norwegian_CS_AS NOT NULL
)
GO
select * from information_schema.columns
go
--
-- Here is where the collation setting is changed
--
ALTER TABLE Test_tbl ALTER COLUMN Name
varchar(50)
COLLATE Latin1_General_CI_AS NOT NULL
GO
select * from information_schema.columns
|
If
you run the above code then review the output displayed from the
information_schema.columns view, you can see that the collation setting for the
column NAME was changed from Danish_Norwegian_CS_AS to Latin1_General_CI_AS.
Now
lets modify the above example slightly, by making the NAME column in table
Test_tbl the primary key. By doing this
you now will not be able to alter the NAME columns collation setting without
first dropping the primary key. As you
can see in the example below, I altered the table to drop the primary key
constraint, then altered the column collation, and finally re-created the
dropped primary key constraint.
|
CREATE TABLE Test_tbl
(Record_id int ,
Name varchar(50) COLLATE
Danish_Norwegian_CS_AS NOT NULL
constraint pk_name primary key
)
GO
select * from information_schema.columns where table_name
= 'Test_tbl'
go
--
-- Drop the primary key constraint
--
alter table Test_tbl Drop constraint pk_name
--
-- Here is where the collation setting is changed
--
ALTER TABLE test_tbl ALTER COLUMN Name
varchar(50)COLLATE Latin1_General_CI_AS NOT NULL
--
-- Re-create the primary key constraint
--
alter table Test_tbl add constraint pk_name primary key
(name)
GO
select * from information_schema.columns where table_name
= 'Test_tbl'
drop table test_TBL
|
Now
imagine a database that contained a large number of tables and a fair number of
constraints (foreign key, check, and index).
The process of identify all the constraints, then building the script to
drop the constraints, altering each character column collation and the
re-creating the constraints sounds like a lot of work, right? Well it is!
It can be done, but it will be a time consuming, labor-intensive
process. When using this method you
might be in for a few small bumps in the road, as you try to ALTER COLUMNS that
are still referenced by constraints.
I’ve found
this process to be too frustrating for most cases, except when I only have a
few columns that need their collation settings changed. If you have a large number of character
fields then you might want to read the section below, on migrating a database
and columns to a new database with a new collation settings.
Changing
the Database Collation Setting
Each
database has a default collation setting.
For each new character field created, that does not have a COLLATE
clause, this default setting is used.
If you desire to change the default collation setting for a database,
you will need to use the ALTER DATABASE command. Here is an example that sets the default collations of the
SAMPLE_DB database to French_CI_AS collation:
|
ALTER DATABASE SAMPLE_DB COLLATE French_CI_AS
|
Be aware that changing the default collation of a database does not change the collation settings of any columns in existing tables. The ALTER COLUMN clause of the alter table statement will need to be used to alter the collation settings for any existing columns in user-defined tables, as described above.
Changing
the Server Collation Setting
Changing
the collation settings isn’t an easy task.
There is no ALTER SERVER command to help change the collation setting
for the server. In order to change the
server collation you will need to rebuild the server.
This means
you will need to export all the data in the existing databases. Gather up and/or generate all the scripts to
recreate the user databases and objects in those user databases. Drop all the user databases and rebuild the
MASTER database. When you rebuild the
MASTER database you will specify your new collation setting.
Note that
if you are generating table create scripts from a SQL Server 2000 database then
they might contain COLLATE clauses.
Prior to executing generated scripts scan your CREATE TABLE and DATABASE
scripts to verify that there are no COLLATE clauses. If you don’t do this then potentially your newly migrated columns
might end up with their original collation settings.
Once you
have rebuilt the MASTER database then you can re-create all the user
databases. Each object in each user
database will need to be re-created with the collation setting of the
database. If your create table
statements do not have any COLLATE clauses then all columns will be given the
default collation setting for the database.
Once all your tables are created you will need to import the data using
the files exported from the database that contained the wrong collation
settings.
Once again
not a simple task, but it can be done.
This is why choosing your collation setting up front is very important.
Alternative
Method for Migrating Database and All Columns Within to a New Collation
Now
if you need to change all the collation settings in a database, then you might
consider migrating your data to a new database that has the desired collation
setting. This typically is how I migrate
from one collating sequence to another.
I
accomplish this by performing the following multi-step process. This process can be done in a number of
different ways, I will walk you though the steps I normally use.
Step 1: Create
a new database with the desired default collation setting.
Step 2:
Generate all the CREATE TABLE statements from the database that has the wrong
collation settings. Only generate the
create table statements. Don’t generate
any constraints. If your scripts where
generated from a SQL Server 2000 database remember to review your scripts for
COLLATE clauses (see note above).
Step
3: DTS the data from the database that
has the wrong collation settings, to your newly created database.
Step
4: Generate all the constraints from
the database with the wrong collation settings. Apply the generated constraints to your newly created and
populated database. If you don’t have
clean data you might have to apply some of the constraints with the NOCHECK
option.
Step 5:
Generate the scripts to create all views, stored procedures, roles, user and/or
any other objects you might need from the database with the wrong collation
settings and execute the generated script on the new database.
Step 6:
Backup your newly created database and restore it over the top of the database
with the wrong collation. Of course
only do this if you want to replace your original database with your new
database, which has a new collation setting.
This
method is not simple, and is still error prone. You need to take care when generating scripts that nothing is
missed and the scripts generated build things in the proper order.
I’ve found
that SQL Server tends to generate things in alphabetical order, which is not
always an appropriate order. I’ve
resorted to using a third party tool “SQL COMPARE” from Red Gate Software for
generating my scripts. I also used this
tool to compare both old and new databases to make sure no objects were missed
when migrating objects from the database with the wrong collation settings to
the new database.
Even
though this method is time consuming I’ve found that it is a reliable method to
convert from one collation to another.
I’m sure there might be other methods out there for converting collation
settings.
Conclusion
Converting
from one collation setting to another is not easy. There is no simple tool to perform this transformation. In order to minimize collation issues it is
best to require all server databases to have the same collation settings, even
if it requires you to migrate your data from one collation setting to another.