I thought my database was collation agnostic

  • Comments posted to this topic are about the item I thought my database was collation agnostic

    Tom Romeo
    Practical Compliance Solutions Corp.
    One Orient Way - Suite F156.
    Rutherford, NJ 07070-2524
    --------------------------------
    www.complianceabc.com
    email: tromeo@complianceabc.com
    Phone: 201-728-8809

  • I doubt, that your real solution (specifying an explicit collation string at each char field) is really a real solution, because it is very error prone - you have to find EVERY code line with #temp tables (+ maybe @table-vars??? + every cross db access + access to some sys table/view joins as to sys.server_principals) and change it. Futuremore you have to rembember this every time you (or your new developer) writes a new procedure.

    Personally I would have used my script to create the drop / create statements for the FKs, dropped them, changed collation and recreated the FKs. Should be much lesser work and is a one time action that solves the problem for the future.

  • samot-dwarf (1/14/2016)


    I doubt, that your real solution (specifying an explicit collation string at each char field) is really a real solution, because it is very error prone - you have to find EVERY code line with #temp tables

    It's what we do, and very rarely have problems. It's just one more thing we need to remember when coding, and if we forget it's almost always picked up (by code reviews or QA) before it hits production.

  • Hi, tromeo,

    that was an interesting article for me: Up to now I did not know anything about the abbreviations "CI" and "AS". Of course I did not have too many problems with collations up to now, but now I am a bit more aware of possible complications.

    Thank you!

    JP

    ________________________________________________________
    If you set out to do something, something else must be done first.

  • OK, I will now show the level of my ignorance. (someone has to do it.)

    My first thought was "Ah, the characters 'CP1' exist in one instance and not in the other."

    So, I looked carefully for some explanation of what 'CP1' stood for. If it is in the article I missed it despite re-reading several times.

    Apparently completely unimportant as far as the article is concerned, but just what does the 'CP1' stand for?

  • I didn't supply an explicit collation string, the string"COLLATE DATABASE_DEFAULT", tells the temp table to use the default collation of the database that was restored no matter what the collation of the SQL server.

    If the stored procedure had been written that way, I would never need to run your script at all. It would run naturally on any database server that is configured with any collation.

    The actual application is sold world wide so I can receive a backup from any where in the world any collation. I recently received a database from South America. Keeping their database in their collation was important when trouble shooting.

    I would rather diagnose the problem that the customer is experiencing as close to their native implementation as possible.

    Thanks for your post. I'm glad people actually read it.

    Actually a very recent post about scalar functions and their performance became the topic of a recent meeting here.

    All good stuff

    Tom Romeo
    Practical Compliance Solutions Corp.
    One Orient Way - Suite F156.
    Rutherford, NJ 07070-2524
    --------------------------------
    www.complianceabc.com
    email: tromeo@complianceabc.com
    Phone: 201-728-8809

  • Sorry - it actually means "code page 1"

    SQL_Latin1_General_CP1_CI_AS:- Latin1-General, case-insensitive,

    accent-sensitive, kanatype-insensitive, width-insensitive for Unicode

    Data, SQL Server Sort Order 52 on Code Page 1252 for non-Unicode Data

    I never dreamed it could get this complicated.

    Tom Romeo
    Practical Compliance Solutions Corp.
    One Orient Way - Suite F156.
    Rutherford, NJ 07070-2524
    --------------------------------
    www.complianceabc.com
    email: tromeo@complianceabc.com
    Phone: 201-728-8809

  • sorry, double post.

  • samot-dwarf (1/14/2016)


    I doubt, that your real solution (specifying an explicit collation string at each char field) is really a real solution, because it is very error prone - you have to find EVERY code line with #temp tables (+ maybe @table-vars??? + every cross db access + access to some sys table/view joins as to sys.server_principals) and change it. Futuremore you have to rembember this every time you (or your new developer) writes a new procedure.

    Personally I would have used my script to create the drop / create statements for the FKs, dropped them, changed collation and recreated the FKs. Should be much lesser work and is a one time action that solves the problem for the future.

    The solution is very real and works just fine. Been there, done that. 1000+stored procedures and functions across 3 active version branches. Adding the COLLATION DATABASE_DEFAULT to temp tables and table vars was easy, and maintaining/remembering it is a piece of cake. New developers seem to pick it up quickly, also

    As for your solution, just dropping the FKs, changing the DB collation and recreating the FKs won't solve the problem.

    The collation is stored with each and every string-based column (varchar, nvarchar, etc.) in every table. In order to truly change the collation of the database, you must also change the collation on each string column. In addition to dropping FKs, you must also drop any indexes, statistics, computed columns and check constraints that refer to the column in question. And, since you can only alter one column at a time, it would probably be easier to rename the table, create a new one, copy the data, and drop the old one.

  • yes you are correct

    I actually devised a script to change the collation of every object, but then realized that the other constraints were blocking me.

    It would have been a huge task

    here are some of the SQL statements I attempted - Oh and I also failed with respect to nvarchar(max) - but that is a different story

    Select convert(sysname, serverproperty(N'collation'))

    select database_id, name from sys.databases

    SET NOCOUNT ON

    /*Find database default*/

    DECLARE @name sysname,

    @Collate VARCHAR(50)

    SET @name =DB_NAME(7) -- The DataBase Number that you intend to use.

    print @name

    --SET @Collate =(SELECT CONVERT (sysname, DATABASEPROPERTYEX (@name,'Collation')))

    SET @Collate =(Select convert(sysname, serverproperty(N'collation')))

    print @Collate

    SELECT 'ALTER TABLE ' + TABLE_NAME + ' ALTER COLUMN ' + COLUMN_NAME + ' ' + DATA_TYPE + '(' + CAST(CHARACTER_MAXIMUM_LENGTH AS VARCHAR(10))+ ') ' + 'COLLATE ' + @Collate +

    CASE IS_NULLABLE

    WHEN 'NO' THEN' NOT NULL'

    WHEN 'YES' THEN' NULL'

    END

    FROM INFORMATION_SCHEMA.columns

    WHERE TABLE_NAME <>'dtproperties'

    AND COLLATION_NAME NOT LIKE @Collate

    ORDER BY COLUMN_NAME

    Tom Romeo
    Practical Compliance Solutions Corp.
    One Orient Way - Suite F156.
    Rutherford, NJ 07070-2524
    --------------------------------
    www.complianceabc.com
    email: tromeo@complianceabc.com
    Phone: 201-728-8809

  • samot-dwarf (1/14/2016)


    Personally I would have used my script to create the drop / create statements for the FKs, dropped them, changed collation and recreated the FKs. Should be much lesser work and is a one time action that solves the problem for the future.

    Hi there. Changing the database's Collation is actually a bad idea in this case. If customers are sending in the databases for support, then changing that Collation would mean that whoever is looking into the issue isn't seeing exactly what the customer is seeing and might not be able to debug the problem. The author's approach of using DATABASE_DEFAULT for the Collation of string fields on temporary tables is the appropriate solution for this setup.

    Take care,

    Solomon...

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • Hi Thomas. Good job raising the visibility of the DATABASE_DEFAULT option for the COLLATE clause.

    Just a few notes about things mentioned in the article:

    1) "CP1" is just a nickname / short-hand for "Code Page 1252".

    2) "Latin1 makes the server treat strings using charset Latin 1, basically ASCII": the "Latin1" specifies the Language / Locale used for NVARCHAR/NCHAR (i.e. Unicode) operations and implies the Code Page for the non-SQL_ Collations for VARCHAR/CHAR operations. The term "ASCII" technically refers to just the first 128 characters (decimal values 0 - 127). In a more general sense that term is often used to mean non-Unicode and cover all 256 characters (values 0 - 255). In this later case, the 2nd set of 128 characters (values 128 - 255) are determined by the Code Page (in this case:1252, which is Latin1 characters) and is technically referred to as "Extended ASCII".

    3) "One would think that 'SQL_Latin1_General_CP1_CI_AS' and 'Latin1_General_CP1_CI_AS', should be identical with the exception that one is a SQL collation and the other is a Windows collation.": For NVARCHAR/NCHAR data they are mostly identical, but for VARCHAR/CHAR data there are some slight differences. For example, only with VARCHAR data and SQL_Latin1_General_CP1* collation does CHAR(0) <> ' '. Also, the non-SQL_ Latin1 collations will equate 'ß' with 'ss' as well as other combinations of cross-language equivalences, but the SQL_Latin1 collations will not. And, when sorting VARCHAR data with a SQL_ collation, apostrophes and dashes will sort before any letters just like other punctuation. But with the non-SQL_ Latin1 collations, dashes and apostrophes in the middle of words will be ignored. Depending on what your application is doing, these differences could be meaningful.

    4) "Notice that we are still collation indifferent.": mostly, but the server's default collation might still affect your code since it controls how variable names (including CURSOR variables) and GOTO labels are handled. If you create a new instance with a collation of Latin1_General_100_BIN2 and run through all of your tests and everything still works, then you are closer to being "collation indifferent". But if you have any places where you declare a variable as @bob-2 but then reference it as @bob-2 or @bob-2, it will break on an instance using a binary collation.

    Take care,

    Solomon..

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • I ran into this problem with temp tables and I took a slightly different approach choosing to SELECT INTO the temp table. Not suggesting better. Just different.

    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[#PriceRulesWork]') AND type in (N'U'))

    drop table #PriceRulesWork

    SELECT cast(null as char(30)) as StockCode, CAST(null as decimal(12,5)) as UnitCost, CAST(null as CHAR(1)) as [Manual],

    CAST(null as CHAR(1)) as ExpiredCost, CAST(null as CHAR(50)) as Reason,CAST(null as CHAR(6)) as Operator,

    CAST(null as CHAR(1)) as Catalog into #PriceRulesWork

    delete from #PriceRulesWork

  • There used to be a bug in the cluster install of SQL2008R2 that set the collation for the server incorrectly no matter what you specified. I'm not sure if this comes into affect.

    Something else that can crop up is that difference in collation can cause performance problems as something that you expect to SEEK does a CONVERT_IMPLICIT[/url] instead.

  • Thanks for this article. Good stuff! I like your solution.

Viewing 15 posts - 1 through 15 (of 27 total)

You must be logged in to reply to this topic. Login to reply