I am having a difficulty with SQL Server collations (2008 R2) that is driving me up the wall. I've learned everything I can about MSSQL and Windows collations and character sets (lots of crazy details!), but I still can't tell what's going wrong. Hoping somebody here can save me...
My goal is simple:
-- I have a program that takes text files (all in Windows Code Page 1252) and bulk inserts them into database tables.
-- Every CHAR and VARCHAR field in every table has collation SQL_Latin1_General_CP1_CI_AS.
-- The format files mark these fields as SQLCHAR, likewise with collation SQL_Latin1_General_CP1_CI_AS.
-- The database and table default collations are Latin1_General_CI_AS.
-- The server default collation is Japanese_CI_AS
-- The result of sp_helpsort is Japanese, case-insensitive, accent-sensitive, kanatype-insensitive, width-insensitive
I'm using MERGE commands, with the file as a source table. The relevant part:
SELECT mycol1, mycol2 FROM OPENROWSET
(BULK 'C:\my\file.txt', FORMATFILE='C:\my\format.fmt', FIRSTROW = 2)
I receive the error:
Msg 7339, Level 16, State 1, Line 2
OLE DB provider 'BULK' for linked server '(null)' returned invalid data for column '[BULK].title'.
I've narrowed the problem down to accented "é" characters. When I remove those from the file, it loads properly. Since the file itself, the format file, and the table all understand the single-byte é, I don't see why it should cause problems.
The only idea I have -- and I hope this isn't right -- is if MSSQL is trying to convert the data to Unicode even though it's coming from and going to Windows-1252, due to the Server default of Japanese_CI_AS. Unfortunately, this page suggests that might be the case:http://msdn.microsoft.com/en-us/library/ms191145(v=sql.90).aspx
"The use remote collation option specifies whether the collation of a remote column or of a local server will be used... When false, distributed queries always use the default collation of the local server instance, and collation name and the collation of remote columns are ignored.
"Using the linked server options is the only way to enable using remote collations. Therefore, queries that are constructed that use ad hoc names provided by OPENROWSET and OPENDATASOURCE cannot use the collation information of remote character data. Additionally, all linked servers in SQL Server version 7.0 that are upgraded to SQL Server 2000 or later are set to use remote collation = false."
I interpret this as meaning that it will read my Windows-1252 file as Unicode instead (Japanese_CI_AS). That would explain the error, since "é" is a two-byte character in Unicode and the one-byte equivalent wouldn't be understood. But why are my settings in the format file and table structure being ignored? More importantly, is there a way around this? This is an automated process and never processes the same file twice, so setting up the every new file as a "remote database" isn't an option unless that can also be automated.
Is there anything I can change to fix this? I can change the format file, the table settings, or the database settings. I can change the query, but only somewhat since I have to use MERGE as these are UPSERT operations. I can't change the server settings or the data files themselves.