Changing the Collation of the Instance, the Databases, and All Columns in All User Databases: What Could Possibly Go Wrong?

Demystifying What “sqlservr -q” Actually Does

(last updated: 2019-01-07 @ 02:25 EST / 2019-01-07 @ 07:45 UTC )

For various reasons, people sometimes find themselves in the unfortunate (and unenviable) situation of having an Instance of SQL Server configured with the wrong Collation. This can often lead to unexpected errors and/or sorting and comparison behavior.

People sometimes try to “fix” this problem by adding COLLATE DATABASE_DEFAULT to string columns of temporary tables, and/or COLLATE {collation_name} to WHERE / JOIN predicates. While this might work in some situations, it is not a true fix since it does not address areas controlled by the Instance-level Collation: resolving names of variables, cursors, and GOTO labels. It also does not affect the behavior of any Microsoft-provided feature that is contained within msdb and might use the system Collation (possibly including SQL Server Agent, Database Mail, Central Management Server (CMS), Maintenance Plans, Policy Management, etc.).
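As a minimal sketch of the two workarounds just described (the table dbo.Customers, its [Name] column, and the Collation named in the second query are hypothetical and used only for illustration):

```sql
-- Workaround 1: make the temporary table's string column use the current
-- Database's default Collation instead of inheriting the one from tempdb.
CREATE TABLE #CustomerNames
(
    [Name] NVARCHAR(100) COLLATE DATABASE_DEFAULT NOT NULL
);

-- dbo.Customers is a hypothetical user table.
SELECT  cust.[Name]
FROM    dbo.Customers cust
INNER JOIN #CustomerNames tmp
        ON tmp.[Name] = cust.[Name];

-- Workaround 2: force a specific Collation for a single predicate.
SELECT  cust.[Name]
FROM    dbo.Customers cust
WHERE   cust.[Name] = N'smith' COLLATE Latin1_General_100_CI_AS;

DROP TABLE #CustomerNames;
```

Neither workaround changes how names of variables, cursors, or GOTO labels are resolved, since those follow the Instance-level Collation.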

For situations where adding the COLLATE {collation_name} keyword is inadequate or undesirable, there is the option of changing the Instance-level Collation. There are two ways to accomplish this: one “official” and the other “unofficial”. But first, before seeing how to change an Instance’s Collation, it is important to understand exactly what the total impact of that change could be.

General Concepts

The following should be a mostly-complete list of affected areas. It assumes that the Collation will be changing across all levels (Instance, Database, and column) since one of the two methods for changing the Instance-level Collation will change all levels.

  1. Simplistically, changing any Collation, even to the same Collation of a newer version (e.g. Latin1_General_CI_AS to Latin1_General_100_CI_AS) can easily change sort orders as well as what equates to what:
    1. Since the Database default Collations are changing, IF / WHILE / etc. conditions for variables and input parameters can change behavior. For example, consider the following code:

      IF (@Variable = N'some string')

      Passing in N'SOME STRING' for @Variable would evaluate to True in a case-insensitive Database. But if the Database’s Collation changes to be binary or case-sensitive, then that same value would evaluate to False.

    2. Since the Instance default Collation is changing, this impacts resolution of names of variables / parameters, cursors, and GOTO labels. For example, consider the following code:

      DECLARE @Var INT;

      SET @var = 1;

      This will work just fine on an Instance having a case-insensitive Collation. But, if that Instance changes to either a binary or case-sensitive Collation, then that code will break due to the difference between @Var and @var.

    3. JOINs and WHERE predicates can change behavior based on columns having a new Collation:
      • More rows might match when moving from a binary or case-sensitive Collation to a case-insensitive Collation. This could result in a Cartesian product.
      • Conversely, fewer rows than before might match when moving from a case-insensitive Collation to either a binary or case-sensitive Collation. This could result in missing rows.
    4. GROUP BY and DISTINCT behavior could change in the same manner.
    5. FOREIGN KEYS: Rows that equated to the Primary Key reference due to using a Collation with one or more insensitivities (case, accent, etc.) might no longer equate to the PK when moving to a binary or case-sensitive Collation.
    6. ORDER BY behavior could change.
    7. CHECK CONSTRAINT behavior could change.
  2. Data-loss potential for non-Unicode string columns: these rely upon the Code Page used by the Collation of the column to determine which characters can be stored in the column. Data loss can occur only if all 3 of the following conditions are true:
    1. Data is stored in a column using a non-Unicode type: CHAR / VARCHAR / TEXT (FYI: the TEXT type has been deprecated starting with the release of SQL Server 2005, so don’t use it, but it might exist in some places)
    2. Characters with underlying values of 128 (0x80) – 255 (0xFF) are being used.
    3. The old and new Collations use different Code Pages. It does not matter if the old and new Collations use different LCIDs (i.e. Locales); only a change in the column’s Code Page matters. For example, the following cultures (and several others) all use Code Page 1252, and hence switching between them will not be a Code Page conversion: French, Finnish_Swedish, German, Latin1_General, Spanish, Norwegian, etc.

    Even if all three conditions are met, that does not guarantee that there will be data loss. Many Code Pages have many of the same characters in the 0x80 – 0xFF range. Whether or not there will be data loss depends on the specific characters being used, what the new Code Page supports, and which method of updating the Collations is being used:

    • The documented method does an actual conversion: if switching to a Collation that uses a different Code Page than the current one, it is possible that some characters might not be available in the new Code Page and would be converted to either a “best fit” mapping, if one can be found, or else to a “?”. But if the characters are available, then the underlying byte value will be changed if necessary.
    • The undocumented method simply changes the Collation: the bytes of the source data will remain the same, but what characters they represent might change if the new Code Page has a different character with that same underlying byte value.
  3. Similar to #2, if a Database’s new Collation uses a different Code Page than the previous Collation, then non-Unicode string literals in that Database (i.e. those not prefixed with an upper-case “N”) can have one or more characters changed to either a best-fit mapping or into a “?” if not available in the Code Page of the new Database default Collation. This can affect PRINT statements, RAISERROR messages, and literals used in INSERT or UPDATE statements. PLEASE NOTE: this conversion will take place during parsing and hence will not be visible to you. Please see the following two-part series for a full explanation: “Which Collation is Used to Convert NVARCHAR to VARCHAR in a WHERE Condition?” (Part A of 2: “Duck”) and (Part B of 2: “Rabbit”).
  4. All Indexes containing string-type columns in key fields (INCLUDE columns shouldn’t matter unless one or more of them are non-Unicode and the old and new Collations use different Code Pages) need to be fully rebuilt (or more likely dropped and recreated):
    1. Their ordering might have changed
    2. If using a filter expression (and this applies equally to filtered statistics) that filters on a string column, then the rows that have been included / excluded from the index (or statistic) might change!
    3. For non-Unicode string columns, if the old and new Collations use different Code Pages, then it is possible that characters might change (especially if their underlying value is between 0x80 (128) and 0xFF (255)). See item #2 above regarding data-loss.
  5. Potential breaking of code: similar to #1, but here the code will actually break (not just silently behave differently). This happens when a column in a User Table (i.e. one whose Collation will be changing) is used in combination with a column from a System Table that is NOT based on the Instance default Collation or the Database default Collation (some Collations are hard-coded and are the same between all systems), such that the Collations of the two columns are no longer the same. The affected situations are:
    • JOIN or WHERE predicate
    • String concatenation (e.g. column1 + column2 )
    • UNION / UNION ALL
    • COALESCE (but ISNULL is fine)
    • CASE expressions returning the columns in question
    • CONCAT function
    • possibly some other situations
  6. Some columns might not be desirable to change. Some (perhaps many) applications use the same Collation for all string columns. But, for any columns that are set to a different Collation for a specific reason, it might not be desirable to change that to the same Collation as all other columns. On the other hand, it might be less work to change a few columns back to non-standard Collations than it would be to change most of them manually. At the very least you need to do an audit to make sure that you know where all of your “special” columns with differing Collations are, and what Collation they are using so that they can be set back to that after the mass-update.
  7. System objects might be in conflict and/or code might break if object names can no longer be resolved, or resolve to duplicates. If using inconsistent casing between object definition and object reference in code (e.g. Table name = “Customers”, Table reference in stored procedures / functions / views = “customers”) and moving to a case-sensitive or binary Collation, that code will fail. If currently using a binary or case-sensitive Collation and having objects named both “Customers” and “customers”, moving to a case-insensitive Collation will fail on a unique constraint violation for what is the internal table holding what we see in sys.objects.
  8. What about Full Text Search? Not sure if that is impacted or not, and if so, how…
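The audit suggested in item 6, together with the Code Page comparison from item 2, can be sketched as a single query run in each User Database. This is only a starting point: the target Collation in @NewCollation is a hypothetical placeholder, and the query looks at User Tables only.

```sql
-- Hypothetical target Collation; replace with the one you plan to use.
DECLARE @NewCollation sysname = N'Latin1_General_100_CI_AS_SC';

SELECT  SCHEMA_NAME(t.[schema_id]) AS [SchemaName],
        t.[name] AS [TableName],
        c.[name] AS [ColumnName],
        TYPE_NAME(c.[system_type_id]) AS [DataType],
        c.[collation_name] AS [CurrentCollation],
        -- Columns that differ from the Database default are the "special" ones.
        CASE
            WHEN c.[collation_name] <> CONVERT(sysname, DATABASEPROPERTYEX(DB_NAME(), 'Collation'))
                THEN 1
            ELSE 0
        END AS [DiffersFromDatabaseDefault],
        -- A Code Page difference only matters for CHAR / VARCHAR / TEXT columns.
        COLLATIONPROPERTY(c.[collation_name], 'CodePage') AS [CurrentCodePage],
        COLLATIONPROPERTY(@NewCollation, 'CodePage') AS [NewCodePage]
FROM    sys.tables t
INNER JOIN sys.columns c
        ON c.[object_id] = t.[object_id]
WHERE   t.[is_ms_shipped] = 0
AND     c.[collation_name] IS NOT NULL -- string columns only
ORDER BY [SchemaName], [TableName], c.[column_id];
```

Saving this output before the mass-update gives you the list of columns to set back afterwards, and highlights the non-Unicode columns for which the Code Page would change.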

Methods

IMPORTANT: Before making any changes to your system, be sure to have a complete (and working) set of backups, just in case something goes wrong. “Working” here means that you have tested them by restoring them (somewhere).

Documented Approach

Below is a general overview of this approach that only mentions the parts that actually change the Collation of some part of the system. This is not a full, step-by-step guide. Please see the “Set or Change the Server Collation” link below for additional details.

Rebuild System Databases

Do this to change the Instance-level default Collation, as well as the Collation of the system Databases: master, model, msdb, and tempdb (which is just a copy of model).

Note: the ^ character is for line-continuation. Otherwise, hitting Enter executes the command.

SETUP.EXE /QUIET /ACTION=REBUILDDATABASE ^
/INSTANCENAME=InstanceName ^
/SQLCOLLATION=CollationName ^
/SQLSYSADMINACCOUNTS=accounts  [ ^
/OptionalSwitches ]
 

PLEASE BE AWARE of the following warning in the documentation:

RebuildDatabase scenario deletes system databases and installs them again in clean state. Because the setting of tempdb file count does not persist, the value of number of tempdb files is not known during setup.

This step does not do anything more than change the Collation of the four system Databases, Instance-level meta-data, and the Instance itself. User Databases, as well as the string columns of the User Tables within them, are ignored.
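To confirm what this step did and did not change, the Instance-level and Database-level Collations can be compared before and after the rebuild. A minimal check, using only built-in metadata:

```sql
-- Instance-level default Collation.
SELECT SERVERPROPERTY('Collation') AS [InstanceCollation];

-- Default Collation of each Database; the four system Databases should
-- show the new Collation, while User Databases keep their previous one.
SELECT  d.[database_id],
        d.[name],
        d.[collation_name]
FROM    sys.databases d
ORDER BY d.[database_id];
```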

You can find SETUP.EXE in the C:\Program Files\Microsoft SQL Server\{INT_version_number}\Setup Bootstrap\{product_common_name}\ folder. Some examples are:

  • C:\Program Files\Microsoft SQL Server\110\Setup Bootstrap\SQLServer2012
  • C:\Program Files\Microsoft SQL Server\120\Setup Bootstrap\SQLServer2014
  • C:\Program Files\Microsoft SQL Server\130\Setup Bootstrap\SQLServer2016
  • C:\Program Files\Microsoft SQL Server\140\Setup Bootstrap\SQL2017
  • C:\Program Files\Microsoft SQL Server\150\Setup Bootstrap\SQL2019CTP2.1

Change Database-level Collation of User Databases

Do this to change the Database’s default Collation, as well as the Collation of Database-level meta-data:

ALTER DATABASE { database_name | CURRENT } COLLATE {new_collation_name} ;

For example:

ALTER DATABASE [TestDB] COLLATE Latin1_General_100_CI_AS_SC;

or:

ALTER DATABASE CURRENT COLLATE Latin1_General_100_CI_AS_SC;

PLEASE NOTE: You cannot change the Collation for any of the system Databases. Attempting to do so will result in the following error:

Msg 3708, Level 16, State 5, Line 271

Cannot alter the database ‘model’ because it is a system database.

Also, there are certain conditions which can prevent this command from completing. For example, the command will fail if there are any objects that were created with the SCHEMABINDING option and that use the Database’s default Collation. For more details, please see the documentation for “ALTER DATABASE: Changing the Database Collation”.

This step does not do anything more than change the Collation of Database-level meta-data, and the Database itself. String columns of the User Tables within the Database are ignored.
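Before running ALTER DATABASE ... COLLATE, it can help to look for objects that depend on the Database’s default Collation. The following is a sketch that relies on the uses_database_collation flag exposed by several catalog views; it is an assumption that this list covers every blocking condition, so treat it as a first pass rather than a guarantee:

```sql
-- Schema-bound modules (views, functions) that depend on the Database default Collation.
SELECT  N'Schema-bound module' AS [ObjectKind],
        OBJECT_SCHEMA_NAME(sm.[object_id]) AS [SchemaName],
        OBJECT_NAME(sm.[object_id]) AS [ObjectName],
        CONVERT(sysname, NULL) AS [ChildName]
FROM    sys.sql_modules sm
WHERE   sm.[is_schema_bound] = 1
AND     sm.[uses_database_collation] = 1

UNION ALL

-- Check Constraints whose definition depends on the Database default Collation.
SELECT  N'Check constraint',
        OBJECT_SCHEMA_NAME(ck.[parent_object_id]),
        OBJECT_NAME(ck.[parent_object_id]),
        ck.[name]
FROM    sys.check_constraints ck
WHERE   ck.[uses_database_collation] = 1

UNION ALL

-- Computed Columns whose definition depends on the Database default Collation.
SELECT  N'Computed column',
        OBJECT_SCHEMA_NAME(cc.[object_id]),
        OBJECT_NAME(cc.[object_id]),
        cc.[name]
FROM    sys.computed_columns cc
WHERE   cc.[uses_database_collation] = 1;
```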

Change Collation of Columns

Do this to change a column’s Collation:

ALTER TABLE { schema_name }.{ table_name }
   ALTER COLUMN { column_name } { datatype }
   COLLATE { new_collation_name }
   { [ NOT ] NULL } ;

If you do not specify the current NULL / NOT NULL setting, it might change! The default is usually NULL.

Certain conditions may prevent you from being able to alter the column. For example, if you have a Check Constraint defined that references that column, you will get the following error:

Msg 5074, Level 16, State 1, Line XXXXX

The object ‘CK_test’ is dependent on column ‘col1’.

Msg 4922, Level 16, State 9, Line XXXXX

ALTER TABLE ALTER COLUMN col1 failed because one or more objects access this column.

You should also get an error if either a Primary Key or Foreign Key exists that references the column. However, having a Default Constraint on the column does not cause an error.
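One way around the dependency errors is to drop the dependent object, alter the column, and then recreate the object. The sketch below reuses the names from the error message above (CK_test and col1); the table name dbo.test, the VARCHAR(50) datatype, the target Collation, and the constraint definition are all hypothetical and must be replaced with your own.

```sql
-- Check the current nullability and Collation so they can be restated explicitly.
SELECT  c.[name], c.[is_nullable], c.[collation_name]
FROM    sys.columns c
WHERE   c.[object_id] = OBJECT_ID(N'dbo.test')
AND     c.[name] = N'col1';

SET XACT_ABORT ON;
BEGIN TRAN;

-- 1. Remove the dependency that blocks ALTER COLUMN.
ALTER TABLE dbo.test DROP CONSTRAINT [CK_test];

-- 2. Change the Collation, restating the datatype and nullability.
ALTER TABLE dbo.test
   ALTER COLUMN [col1] VARCHAR(50)
   COLLATE Latin1_General_100_CI_AS_SC
   NOT NULL;

-- 3. Recreate the constraint (hypothetical definition); WITH CHECK
--    re-validates existing rows under the new Collation.
ALTER TABLE dbo.test WITH CHECK
   ADD CONSTRAINT [CK_test] CHECK ([col1] <> 'x');

COMMIT TRAN;
```

Recreating the constraint WITH CHECK is a deliberate choice here: if existing rows no longer satisfy the constraint under the new Collation, the statement fails and the transaction rolls back instead of leaving an untrusted constraint behind.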

Undocumented Approach

How to Do it

The following approach has some advantages over the documented approach described above, especially being one step instead of three, but it is undocumented and hence unsupported, so if you run into any problems, Microsoft is not likely to help, nor will they fix any buggy or unexpected behavior. The -q switch of SQLSERVR.EXE is not found in the Microsoft documentation, nor is it listed when passing in the -? switch to get the help info. However, it can be used as follows:

sqlservr -c -m -T4022 -T3659 -s"{instance_name}" -q"{new_collation_name}"

For example:

sqlservr -c -m -T4022 -T3659 -s"CHANGECOLLATION" -q"Estonian_100_CS_AS_SC"

You must execute this command in an “Administrator” Command Prompt, not a regular Command Prompt. Attempting to do this in a regular Command Prompt will result in the following error messages:

2018-06-10 16:08:19.06 Server Error: 17058, Severity: 16, State: 1.

2018-06-10 16:08:19.06 Server initerrlog: Could not open error log file ''. Operating system error = 3(The system cannot find the path specified.).

… (same error repeated 9 more times)

2018-06-10 16:08:22.08 Server SQL Server shutdown has been initiated

You will also need to make sure that the Instance is not currently running before executing that command-line. You can run the following in the Command Prompt:

NET STOP MSSQL${InstanceName}

Trace Flag 4022 instructs SQL Server to not execute any Stored Procedures marked as “execute at startup”.

Trace Flag 3659 instructs SQL Server to log all errors to the error log during server startup.

Once it completes, you should see the following in the Command Prompt window:

2018-06-10 16:10:45.97 spid6s   The default collation was successfully
                                changed.
2018-06-10 16:10:46.12 spid6s   Recovery is complete. This is an
                                informational message only. No user action
                                is required.

However, the process is still running and does not self-terminate.

Hit Control-C.

You will then see:

Do you wish to shutdown SQL Server (Y/N)?

Hit y.

You should then see the following:

2018-06-10 16:10:56.13 spid6s   SQL Server shutdown due to Ctrl-C or
                                Ctrl-Break signal. This is an informational
                                message only. No user action is required.
2018-06-10 16:10:56.13 spid6s   SQL Server shutdown has been initiated
2018-06-10 16:10:56.13 spid6s   SQL Trace was stopped due to server
                                shutdown. Trace ID = '1'. This is an
                                informational message only; no user action
                                is required.

Now you can start the Instance again. You can run the following in the Command Prompt:

NET START MSSQL${InstanceName}

You can find sqlservr.exe in the Binn folder within the main directory for the Instance (not the SQL Server main directory for that version of SQL Server). If you need help finding it, just run the following from a Command Prompt (does not need to be an Administrator Command Prompt):

REG QUERY "HKLM\SOFTWARE\Microsoft\Microsoft SQL Server" ^
 /f "Setup" /s /v "SQLBinRoot" /k /e | FINDSTR SQLBinRoot
 

You will get back entries such as:

... C:\Program Files\Microsoft SQL Server\110\LocalDB\Binn\

... C:\Program Files\Microsoft SQL Server\120\LocalDB\Binn\

... C:\Program Files\Microsoft SQL Server\130\LocalDB\Binn\

... c:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\Binn

... c:\Program Files\Microsoft SQL Server\MSSQL14.SQL2017EXPRESS\MSSQL\Binn

... C:\Program Files\Microsoft SQL Server\140\LocalDB\Binn\

... C:\Program Files\Microsoft SQL Server\MSSQL15.SQL2019\MSSQL\Binn

What it Does

  1. Converts one Database at a time (appears to be in database_id order):
    1. master
    2. tempdb
    3. model
    4. msdb
    5. User DBs

       

  2. Will roll back a DB if it does not complete fully, but DBs that have completed will remain converted if an error occurs. This behavior holds true for the system Databases also. This could lead to the process leaving the Instance with inconsistent Collations for the system DBs if they do not all complete successfully. Meaning, if fewer than all four system DBs complete successfully, there will be a mismatch between the system DBs. And if they do complete successfully but a User DB fails, then the system DBs won’t match the Instance-level Collation. This is not terribly bad, though, since you can fix the problem and restart the operation.

     

  3. Conversion bypasses restrictions imposed by the documented method (ALTER DATABASE, and ALTER TABLE…ALTER COLUMN). The following do not cause an error with this method:
    1. Schema-bound objects
    2. Check constraints that use the Database’s Collation
    3. Computed columns that use the Database’s Collation
    4. Table-Valued Functions (TVFs) that pass back string columns that did not specify the COLLATE keyword.
    5. Indexes on string columns

       

  4. Indexes containing string columns are dropped and recreated. If the Clustered Index contains at least one string column, then all Indexes on the Table are dropped and recreated (even if they do not contain any string columns).

     

  5. Conversion bypasses checks meant to prevent leaving the data in an invalid state:
    1. NTEXT columns do not error when being set to a Collation that supports Supplementary Characters. This leaves the column as effectively read-only: you can select from it, but attempting to modify it will get the following error:

      Msg 4189, Level 16, State 0, Line XXXXX

      Cannot convert to text/ntext or collate to ‘{collation_name}’ because these legacy LOB types do not support the Unicode supplementary characters whose codepoints are U+10000 or greater. Use types varchar(max), nvarchar(max), or a collation which does not have the _SC flag.

      However, it is easy enough to manually change the Collation of these columns to one that is not Supplementary Character Aware.

    2. Foreign Key rows that referenced a PK value due to relying upon an insensitivity (e.g. “Y = y” due to case-insensitivity) might no longer reference any PK value. There is no constraint verification, so the FK will be left as “enabled” and “trusted” (assuming it was both “enabled” and “trusted” prior to the operation), yet these same values can no longer be added due to now getting the expected FK violation error.

       

  6. Data loss can occur in VARCHAR / CHAR / TEXT columns if the new Collation uses a different Code Page than was being used previously, and the new Code Page does not have the same character having the same numeric value. This is a different type of data loss than what happens with the other method. The documented method does an actual Code Page conversion, which will both (a) adjust the byte value to a different one for characters that exist in both Code Pages but with different underlying values, and (b) attempt to find a similar looking character, known as a “best fit” match, if a “best fit” mapping exists. This undocumented method does neither of those character conversions; no attempt is made to maintain consistency of the character / glyph itself. The underlying byte values remain the same, but the character that they map to might change between Code Pages. For example, in Code Page 1252 (Latin1_General), byte value 0xC6 equates to “Æ” (Latin Capital Letter AE), but using this method to change to Code Page 1257 (Baltic Rim), the data would show “Ę” (Latin Capital Letter E with Ogonek) because that is what 0xC6 maps to in that Code Page, even though Code Page 1257 contains “Æ”, but having an underlying value of 0xAF instead of 0xC6. A true Code Page conversion would have changed the underlying byte value from 0xC6 to 0xAF so that the data would have shown the same character, “Æ”, after the operation.

    This alone indicates that the operation is not doing any actual string conversions, but instead is merely updating the meta-data for string columns directly to the system catalog tables.

    (please also see related item #15 below regarding In-Memory OLTP)

     

  7. User-Defined Table Types (UDTTs) are completely ignored! Collations for string columns of UDTTs, whether explicitly set using the COLLATE keyword in the CREATE TYPE statement or not (hence using the Database’s default Collation) are not changed by this process. It is unclear whether this behavior is intentional (i.e. by design) or an oversight (i.e. nobody has looked at this code since it was first introduced, prior to the existence of UDTTs), but at the very least it is inconsistent with the “change-everything” approach of this process. If you need the Collations of the string columns in UDTTs to be the same as the Collation you are updating everything else to, then you will need to manually drop and recreate the UDTTs (there is no ALTER TYPE).

     

  8. The result set meta-data for User-Defined Table-Valued Functions (TVFs) is updated to the new Collation, even if the column’s Collation is being explicitly set with the COLLATE keyword. This happens for both Multi-Statement TVFs and Inline TVFs, though the effect of this change is different between them:
    • Multi-Statement TVFs use the Collation as recorded in the meta-data. Even if a column is using the COLLATE keyword in the RETURNS @{table_name} TABLE () definition in the CREATE FUNCTION statement to explicitly set the Collation, the Collation shown in sys.columns is what will be used. In this case, the behavior of the function will be inconsistent with its definition, and the output might be incorrect (especially for CHAR / VARCHAR fields).
    • Inline TVFs use the Collation as recorded in the definition of the function or source table. Regardless of the Collation shown in sys.columns, a Collation that is either explicitly set using the COLLATE keyword in the SELECT statement in the function, or specified for a column when selecting from a table, is what will be used. In this case, the behavior of the function will be inconsistent with its meta-data, but the output should be correct.

    In both cases, executing sys.sp_refreshsqlmodule for the TVF will correct the meta-data shown in sys.columns. There will be no behavioral change for Inline TVFs. But, for Multi-Statement TVFs, because the output was based on the meta-data, the behavior will change to be consistent with the function definition.

     

  9. Computed Columns that set the output Collation by using the COLLATE keyword, whether PERSISTED or not, will have their meta-data updated in sys.columns, but the actual Collation used will be whatever was specified in the definition. This is very similar to how Inline TVFs behave, except:
    • Selecting from the Table will display the following warning in the “Messages” tab:

      Metadata stored on disk for computed column '{column_name}' in table '{schema_name}.{table_name}' did not match the column definition. In order to avoid possible index corruption, please drop and recreate this computed column.

      This warning will appear when the query referencing the table is not in the plan cache. Once the query is cached, the warning will not be displayed again, for that query, until the plan is evicted from the cache, or DBCC FREEPROCCACHE is executed, etc.

    • There is no “refresh” system stored procedure that will fix the meta-data. Executing sys.sp_recompile on the table does not help. Correcting the meta-data requires dropping and recreating the Computed Column.

       

  10. Contained Databases (i.e. “Partially Contained Database”, CONTAINMENT = PARTIAL) are handled correctly:
    • Instance-level meta-data is changed as expected
    • Database-level meta-data (name column in system catalog views) is still Latin1_General_100_CI_AS_KS_WS_SC (i.e. CATALOG_DEFAULT)
    • System lookup values (_desc columns in system catalog views) are still Latin1_General_CI_AS_KS_WS (coming from the hidden mssqlsystemresource DB)
    • User data is changed (this includes: filter_definition in sys.indexes ; collation_name in sys.columns ; collation_name and definition in sys.computed_columns ; definition in sys.sql_modules ; clr_name in sys.assemblies ; etc)

       

  11. Instance-level default Collation will not change until all DBs have successfully converted.

     

  12. This operation will make no changes at all if the current Instance-level Collation is the same as the new Collation being requested by this operation. This means that if the process converts some Databases but then fails while converting another Database (hence not completing and not changing the Instance-level Collation), then you cannot “revert” the operation by going back to the original Collation. If the Collation you request using the -q switch is the same as the Instance-level Collation, then running this command-line will simply start the Instance in Single-User mode instead of making any Collation changes. However, you will still be able to make changes to the Database that the error occurred in and repeat the operation for the new Collation.

     

  13. While both documented and undocumented methods use the Transaction Log for the dropping and recreating of Indexes, this method sometimes uses less Tran Log space due to not doing Code Page conversions of VARCHAR / CHAR / TEXT columns. Of course, if the old and new Collations use the same Code Page, then the documented method won’t be doing Code Page conversions anyway.

     

  14. Unlike the documented “SETUP.EXE /ACTION=REBUILDDATABASE” method, this method does not drop and recreate the system Databases (requiring a bit of extra work to get the instance back to its original state minus the change in Collation). This could save a good bit of work in terms of restoring Databases, recreating Instance-level objects, SQL Server Agent configuration and jobs, re-applying patches, etc.

     

  15. For memory-optimized tables containing string columns, there are a few issues that might come up:

    1. The existence of any auto-created statistics (i.e. names starting with _WA_Sys_*) on string columns will most likely cause an error that will stop the Collation update process. Please see next section on “Errors you Might Encounter” (below) for details.
    2. For any characters with values of 128 or above, it is quite likely that those same bytes will refer to different, or possibly even invalid, characters. This is pretty much the same data loss issue as noted above in item #6. For example, if the original Collation is Korean_100_CI_AS, then the byte sequence of “0xB15A” will produce “켣”. If you then use this method to switch to any “_UTF8” Collation, you will end up seeing “�Z” because “0xB1” is an invalid byte sequence in UTF-8, and “0x5A” is valid and is the “Z” character.
    3. What is different with memory-optimized tables (regarding data loss) is that even when there is no data loss, indexes on string columns can easily produce non-obvious, “delayed-reaction” errors (meaning: they do not cause the Collation-update process to fail, and you might not notice them immediately). For memory-optimized tables that persist their data and have data to persist, the Collation update process does not drop and recreate their indexes, which leads to two issues:
      1. The indexes could, quite likely, be out of order. For example (using a VARCHAR column):
        Column Collation | ORDER BY | Data | Notes
        SQL_EBCDIC297_CP1_CS_AS | column | ab ➞ aA ➞ ba ➞ Aa ➞ Ab | Initial order
        French_CI_AS | column | ab ➞ aA ➞ ba ➞ Aa ➞ Ab | Same order
        French_CI_AS | column COLLATE French_CI_AS | ab ➞ aA ➞ ba ➞ Aa ➞ Ab | Same order
        French_CI_AS | column COLLATE French_100_CI_AS | aA ➞ Aa ➞ ab ➞ Ab ➞ ba | Correct order

        As you can see, after the Collation update, ordering by the column (no COLLATE clause) uses the original sort order because the index hasn’t been rebuilt for the new Collation. And even if you specify the COLLATE clause with the new Collation in an attempt to force the intended new sort order, it will still return in the original order because specifying the Collation that is defined for the column has no effect at all. But, specifying a Collation that is neither the original Collation nor the new Collation, will return in the order for that Collation because SQL Server knows that it doesn’t have the rows sorted by the new ordering rules, so it applies the specified Collation.

        Also, you aren’t allowed to manually REBUILD or REORGANIZE these either.

        ALTER INDEX [IX_ShoppingCart_Name2i] ON dbo.ShoppingCart REBUILD; -- REORGANIZE

        /*

        Msg 10794, Level 16, State 12, Line XXXXX

        The operation 'ALTER INDEX' is not supported with memory optimized tables.

        */

         

        ALTER TABLE dbo.ShoppingCart REBUILD;

        /*

        Msg 10794, Level 16, State 134, Line XXXXX

        The operation 'ALTER TABLE REBUILD' is not supported with memory optimized tables.

        */

         

        DROP INDEX [IX_ShoppingCart_Name2i] ON dbo.ShoppingCart;

        /*

        Msg 10794, Level 16, State 114, Line XXXXX

        The operation 'DROP INDEX' is not supported with memory optimized tables.

        */

        In these cases you will need to, at the very least, delete and re-insert the data, or drop and recreate the table(s) and then re-insert the data.

      2. Queries appear to execute correctly, until you add more data. Then, when you use WHERE or ORDER BY, you will likely get one of the following errors:

        Msg 9100, Level 21, State 2, Line XXXX

        Possible index corruption detected. Run DBCC CHECKDB.

        (disconnected!)

        Msg 701, Level 17, State 153, Line XXXXX

        There is insufficient system memory in resource pool 'default' to run this query.

        These errors will disappear once the Instance is restarted (there might be a more elegant way of getting the system to rebuild memory-optimized indexes, but I am not aware of any at the moment). The errors might come back if more rows are added to these tables.
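Several of the items above (5, 7, and 8) leave things that need to be checked or fixed manually after the operation completes. The following is a sketch of follow-up queries to run in each User Database; it is not exhaustive, and the generated statements should be reviewed before being executed.

```sql
-- Item 8: generate sp_refreshsqlmodule calls for all user TVFs
-- ('IF' = Inline TVF, 'TF' = Multi-Statement TVF) to correct their meta-data.
SELECT  N'EXEC sys.sp_refreshsqlmodule @name = N'''
        + REPLACE(QUOTENAME(OBJECT_SCHEMA_NAME(o.[object_id]))
                  + N'.' + QUOTENAME(o.[name]), N'''', N'''''')
        + N''';' AS [RefreshStatement]
FROM    sys.objects o
WHERE   o.[type] IN ('IF', 'TF')
AND     o.[is_ms_shipped] = 0;

-- Item 7: list string columns of User-Defined Table Types, which keep
-- their old Collation and must be dropped and recreated manually.
SELECT  SCHEMA_NAME(tt.[schema_id]) AS [SchemaName],
        tt.[name] AS [TableTypeName],
        c.[name] AS [ColumnName],
        c.[collation_name] AS [CollationName]
FROM    sys.table_types tt
INNER JOIN sys.columns c
        ON c.[object_id] = tt.[type_table_object_id]
WHERE   c.[collation_name] IS NOT NULL;

-- Item 5: report rows that violate Foreign Key or Check Constraints,
-- since the conversion does not re-verify them.
DBCC CHECKCONSTRAINTS WITH ALL_CONSTRAINTS;
```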

PLEASE NOTE

To avoid unnecessary time, disk I/O, tran log space, etc spent (i.e. wasted) on DBs that are already set to the desired Collation (in terms of both Database-level and all columns that are desired to be using the new Collation), or to skip updating one or more Databases for any reason: detach the Database(s) before shutting the Instance down. Then, after the update, re-attach the Database(s). Unfortunately, you cannot skip a Database by setting it to OFFLINE since it will be treated as being “read-only”, which causes an error before any Database is converted (though the error message never says which Database(s) are causing the problem).
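A minimal sketch of the detach / re-attach approach described above. The Database name [AlreadyConverted] and the file paths are hypothetical; use the first query to capture the real paths before detaching.

```sql
USE [master];

-- Record the file locations first; they are needed to re-attach later.
SELECT  mf.[name], mf.[type_desc], mf.[physical_name]
FROM    sys.master_files mf
WHERE   mf.[database_id] = DB_ID(N'AlreadyConverted');

-- Before shutting the Instance down: detach the Database to be skipped.
EXEC sys.sp_detach_db @dbname = N'AlreadyConverted';

-- After the Collation update has completed and the Instance has been
-- started normally: re-attach using the paths recorded above.
CREATE DATABASE [AlreadyConverted]
ON  (FILENAME = N'D:\Data\AlreadyConverted.mdf'),
    (FILENAME = N'D:\Logs\AlreadyConverted_log.ldf')
FOR ATTACH;
```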

Errors You Might Encounter

  1. Due to read-only DB (documented in link #4 at the bottom, and can be caused by a Database being OFFLINE):

    Error: 5804, Severity: 16, State: 1

    Character set, sort order, or collation cannot be changed at the server level because at least one database is not writable.

    Make the database writable, and retry the operation.

     

  2. Due to files being read-only (documented in link #4 at the bottom, and can be caused by a Database being OFFLINE):

    Error: 3416, Severity: 20, State: 1

    The server contains read-only files that must be made writable before the server can be recollated.

     

  3. Due to In-Memory DB (documented in link #4 at the bottom):

    Error: 41317, Severity: 16, State: 4

    A user transaction that accesses memory optimized tables or natively compiled procedures cannot access more than one user database or databases model and msdb, and it cannot write to master.

    (please see “Fix…” immediately below this list)

     

  4. Due to In-Memory DB (documented in link #4 at the bottom) and Unique Constraint Violation:

    Error: 3434, Severity: 20, State: 1

    Cannot change sort order or locale. An unexpected failure occurred while trying to reindex the server to a new collation. SQL Server is shutting down. Restart SQL Server to continue with the sort order unchanged. Diagnose and correct previous errors and then retry the operation.

    (please see “Fix…” immediately below this list)

     

  5. Due to Unique Constraint Violation:

    Error: 1505, Severity: 16, State: 1.

    The CREATE UNIQUE INDEX statement terminated because a duplicate key was found for the object name ‘dbo.UniqueIndexViolation’ and the index name ‘CUIX_UniqueIndexViolation’. The duplicate key value is (a ).

     

  6. Due to specifying an invalid Collation for the -q option (including Collations not available in the version of SQL Server being updated; e.g. trying to use a version “100” Collation with SQL Server 2005):
    • SQL Server 2005, 2008, and 2008 R2 error (specifying -q"Latin1_General_200_CI_AS"):

      2018-08-02 14:24:00.78 Server Error: 17112, Severity: 16, State: 1.

      2018-08-02 14:24:00.78 Server An invalid startup option q was supplied, either from the registry or the command prompt. Correct or remove the option.

    • SQL Server 2017 error (specifying -q"Latin1_General_200_CI_AS"):

      An invalid startup option ‘qLatin1_General_200_CI_AS’ was supplied, either from the registry or the command prompt. Correct or remove the option.

      2018-08-02 14:07:30.11 Server SQL Server shutdown has been initiated
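The unique constraint violation (error 1505 above) can, in principle, be detected ahead of time by grouping the key column under the target Collation and looking for values that collapse together. The sketch below reuses the table name from the error message; the column name [Col1] and the target Collation are assumptions made purely for illustration.

```sql
-- Values that are distinct under the current Collation but would be
-- duplicates under the target Collation.
-- [Col1] and Latin1_General_100_CI_AS are hypothetical stand-ins.
SELECT   tbl.[Col1] COLLATE Latin1_General_100_CI_AS AS [KeyUnderNewCollation],
         COUNT(*) AS [Occurrences]
FROM     dbo.UniqueIndexViolation tbl
GROUP BY tbl.[Col1] COLLATE Latin1_General_100_CI_AS
HAVING   COUNT(*) > 1;
```

Any rows returned would need to be resolved (or the unique index / constraint dropped) before attempting the update. For a multi-column key, each string column in the key would need the same COLLATE clause.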

Fix for In-Memory OLTP Errors

Errors related to memory-optimized tables are due to non-indexed columns (specifically, columns that are not the leading column in any index) that have auto-created statistics on them. All you need to do is remove these auto-created statistics by using the following query:

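-- Run in the context of each Database that contains memory-optimized tables.
-- Builds one DROP STATISTICS statement per auto-created statistic on user objects.
DECLARE @DropStats NVARCHAR(MAX) = N'';
SELECT 
       @DropStats = @DropStats +
       N'DROP STATISTICS ' + QUOTENAME(OBJECT_SCHEMA_NAME(st.[object_id]))
       + N'.' + QUOTENAME(OBJECT_NAME(st.[object_id]))
       + N'.' + QUOTENAME(st.[name]) + N';' + NCHAR(0x000D) + NCHAR(0x000A)
FROM   sys.stats st
INNER JOIN sys.objects so
        ON so.[object_id] = st.[object_id]
WHERE  st.[auto_created] = 1
AND    so.[is_ms_shipped] = 0; -- skip system objects
PRINT @DropStats; -- DEBUG (PRINT displays at most 4000 characters of an NVARCHAR value)
EXEC (@DropStats);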

NOTE: This query only fixes the problem that prevents the Collation update process from completing. It does not fix any instances of data loss or potentially corrupted indexes. To fix those problems you still might need to drop/recreate any affected tables or maybe just the data (in any affected tables).

Posts Dealing With the Undocumented sqlservr.exe -q Option

  1. Changing Server Collation ( 2011-05-26 )
  2. Changing SQL Server Collation After Installation ( 2015-02-19 )
  3. SQL Server – Changing Sql Instance Collation – via sqlservr/-q ( 2016-11-04 ; based on post linked directly above)
  4. SQL Server – Changing Sql Instance Collation – via sqlservr/-q – Little Traps ( 2017-01-08 ; follow-up to post linked directly above)

Misc.

SQL Server Express LocalDB

Changing the Instance-level Collation for LocalDB is not possible. It will always be SQL_Latin1_General_CP1_CI_AS (quite unfortunately!):

  1. SETUP.EXE for LocalDB will not be able to find the LocalDB instance. Using something like (LocalDB)\v12.0 for the Instance name will result in the following error:

    Instance Name (LOCALDB)\V12.0 for product Microsoft SQL Server doesn’t exist. Specify a valid instance name.

     

    Error result: -2068578302

    Result facility code: 1204

    Result error code: 2

    Nor does using “LocalDB” or “v12.0” by themselves work. And yes, I do have a v12.0 Instance of LocalDB.

     

  2. Attempting to use the sqlservr.exe -q option results in one of the following errors, depending on how you specify the Instance name:
    • Using just the instance name:

      Your SQL Server installation is either corrupt or has been tampered with (Error getting instance ID from name.). Please uninstall then re-run setup to correct this problem

    • Using (LocalDB)\InstanceName:

      Your SQL Server installation is either corrupt or has been tampered with (Error: Instance name exceeds maximum length.). Please uninstall then re-run setup to correct this problem

There is a UserVoice suggestion, however, to allow it to be user-settable: Allow collation to be set for LocalDB at Instance level when creating an instance.
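To confirm what a given Instance is actually using, a quick check such as the following can be run while connected to it. Both property names are standard SERVERPROPERTY arguments; the column aliases are arbitrary.

```sql
-- Shows the Instance-level Collation and whether the Instance is LocalDB
-- (IsLocalDB returns 1 for LocalDB).
SELECT SERVERPROPERTY(N'Collation') AS [InstanceCollation],
       SERVERPROPERTY(N'IsLocalDB') AS [IsLocalDB];
```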

Compatibility Collations

You probably have never heard of “compatibility” Collations, and that is fine (full post on them coming in early 2019). They only show up when migrating from pre-SQL Server 2000. Most likely nobody other than me would ever try this, and I’m only doing it for research / documentation. The “sqlservr -q” method allows you to use a compatibility Collation at the Instance-level. Attempting to do this using the documented method results in the following error:

The collation Compatibility_52_409_30002 was not found.

 

Error result: -2061893630

Result facility code: 1306

Result error code: 2

In-Memory OLTP and UTF-8 Collations (new in SQL Server 2019)

Even though In-Memory OLTP was mentioned in item # 15 in the “Undocumented Approach” section above, this is a special case.

The new UTF-8 Collations, introduced in SQL Server 2019, are not supported for use in memory-optimized tables. Attempting to CREATE or ALTER tables to use UTF-8 Collations will result in one of the following errors:

Msg 12356, Level 16, State 157, Line XXXXX

Comparison, sorting, and manipulation of character strings that use a UTF8 collation is not supported with memory optimized tables.

or:

Msg 12357, Level 16, State 158, Line XXXXX

Indexes on character columns that use a UTF8 collation are not supported with indexes on memory optimized tables.

Using the “sqlservr -q” approach will bypass those restrictions and update memory-optimized tables to use UTF-8 Collations (assuming you specified a UTF-8 Collation). For memory-optimized tables that either have no data or are only persisting the schema (i.e. DURABILITY = SCHEMA_ONLY), this might work. I have not done extensive testing. But, even if testing does seem to indicate that it works as expected, this is still an unsupported configuration (so don’t expect any help from Microsoft if there is odd / incorrect behavior with a UTF-8 Collation in a memory-optimized table).
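To find out whether any memory-optimized tables ended up in this unsupported state, a query such as the one below could be run in each affected Database after the update. It is a sketch that assumes UTF-8 Collation names end in "_UTF8", which matches the naming of the UTF-8 Collations shown in this post.

```sql
-- Columns in memory-optimized tables that now use a UTF-8 Collation,
-- along with each table's durability setting.
-- Assumption: UTF-8 Collation names end in "_UTF8".
SELECT   OBJECT_SCHEMA_NAME(tbl.[object_id]) AS [SchemaName],
         tbl.[name] AS [TableName],
         tbl.[durability_desc] AS [Durability],
         col.[name] AS [ColumnName],
         col.[collation_name] AS [ColumnCollation]
FROM     sys.tables tbl
INNER JOIN sys.columns col
        ON col.[object_id] = tbl.[object_id]
WHERE    tbl.[is_memory_optimized] = 1
AND      col.[collation_name] LIKE N'%[_]UTF8' -- [_] matches a literal underscore
ORDER BY [SchemaName], [TableName], col.[column_id];
```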

Testing

I tested on SQL Server 2017 CU6 and SQL Server 2019 CTP 2.1, but the behavior should be consistent across versions.

General

{ When I have time I will post the test cases }

LocalDB

SETUP

sqllocaldb c TestChange
CD C:\Program Files\Microsoft SQL Server\140\LocalDB\Binn

TEST 1

.\sqlservr -c -m -T4022 -T3659 -s"TestChange" -q"Hebrew_100_CI_AS"
REM Your SQL Server installation is either corrupt or has been tampered
REM with (Error getting instance ID from name.).  Please uninstall then
REM re-run setup to correct this problem

TEST 2

.\sqlservr -c -m -T4022 -T3659 -s"(LocalDB)\TestChange" -q"Hebrew_100_CI_AS"
REM Your SQL Server installation is either corrupt or has been tampered
REM with (Error: Instance name exceeds maximum length.).  Please
REM uninstall then re-run setup to correct this problem

TEST 3

CD C:\Program Files\Microsoft SQL Server\120\Setup Bootstrap\SQLServer2014
.\SETUP.EXE /ACTION=REBUILDDATABASE /INSTANCENAME="(LocalDB)\v12.0" ^
/QUIET /SQLCOLLATION=Latin1_General_100_CI_AS
REM Instance Name (LOCALDB)\V12.0 for product Microsoft SQL Server
REM doesn't exist. Specify a valid instance name.
REM
REM Error result: -2068578302
REM Result facility code: 1204
REM Result error code: 2

Compatibility Collations

TEST 1

SETUP.EXE /QUIET /ACTION=REBUILDDATABASE /INSTANCENAME=SQL2019 ^
/SQLCOLLATION=Compatibility_52_409_30002
REM The collation Compatibility_52_409_30002 was not found.
REM
REM Error result: -2061893630
REM Result facility code: 1306
REM Result error code: 2

TEST 2

.\sqlservr -c -m -T4022 -T3659 -s"{InstanceName}" ^
-q"Compatibility_55_409_30002"
REM Success!!

In-Memory OLTP and UTF-8 Collations

SETUP

CREATE DATABASE [Hekaton] COLLATE Japanese_XJIS_140_CI_AS_VSS;
ALTER DATABASE [Hekaton] SET RECOVERY SIMPLE;
ALTER DATABASE [Hekaton]
   ADD FILEGROUP [InMemoryStuff]
   CONTAINS MEMORY_OPTIMIZED_DATA;
ALTER DATABASE [Hekaton]
   ADD FILE (name = N'InMemoryStuff', filename =
   'C:\Program Files\Microsoft SQL Server\MSSQL15.SQL2019\MSSQL\DATA\InMemoryStuff.idf')
   TO FILEGROUP [InMemoryStuff];
USE [Hekaton];

TEST 1

Specify a UTF-8 Collation when creating the table (no index specified for the column attempting the UTF-8 Collation):

CREATE TABLE dbo.UTF8
(
    [ID] INT IDENTITY(1, 1)
        NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 400),
    [Name] VARCHAR(50) COLLATE Latin1_General_100_CI_AS_SC_UTF8
        NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON);
/*
Msg 12356, Level 16, State 157, Line XXXXX
Comparison, sorting, and manipulation of character strings that use
   a UTF8 collation is not supported with memory optimized tables.
*/

TEST 2

Create the table with a non-UTF-8 Collation, then attempt to change the Collation via ALTER TABLE:

CREATE TABLE dbo.UTF8
(
    [ID] INT IDENTITY(1, 1)
        NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 400),
    [Name] VARCHAR(50) COLLATE Latin1_General_100_CI_AS_SC -- NOT UTF8
        NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON);
-- Success
ALTER TABLE dbo.[UTF8]
    ALTER COLUMN [Name]
    VARCHAR(50) COLLATE Latin1_General_100_CI_AS_SC_UTF8 NOT NULL;
/*
Msg 12356, Level 16, State 157, Line XXXXX
Comparison, sorting, and manipulation of character strings that use
   a UTF8 collation is not supported with memory optimized tables.
*/

However, changing the Collation to a non-UTF-8 Collation does work.
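For instance, a statement like the following, which only switches the column from case-insensitive to case-sensitive, would be expected to succeed against the table created in TEST 2. The specific Collation is an illustrative choice and was not part of the original tests.

```sql
-- Same table as TEST 2; the target Collation is non-UTF-8.
-- Latin1_General_100_CS_AS_SC is an illustrative choice, not from the original tests.
ALTER TABLE dbo.[UTF8]
    ALTER COLUMN [Name]
    VARCHAR(50) COLLATE Latin1_General_100_CS_AS_SC NOT NULL;
```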

TEST 3

Specify a UTF-8 Collation when creating the table (include an index for the column attempting the UTF-8 Collation):

CREATE TABLE dbo.UTF8i
(
    [ID] INT IDENTITY(1, 1)
        NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 400),
    [Name] VARCHAR(50) COLLATE Latin1_General_100_CI_AS_SC_UTF8
        NOT NULL INDEX [IX_UTF8i_Name] NONCLUSTERED
)
WITH (MEMORY_OPTIMIZED = ON);
/*
Msg 12357, Level 16, State 158, Line XXXXX
Indexes on character columns that use a UTF8 collation are not
   supported with indexes on memory optimized tables.
*/

More…

For more info on Collations / encodings / Unicode / Extended ASCII (especially as they relate to Microsoft SQL Server), please visit:

Collations.Info
