collation difficulties when BULK inserting CP1 data in Japanese server

  • I'm experiencing confusing behavior when bulk inserting data in SQL_Latin1_General_CP1_CI_AS, on a Japanese server. Extended characters like é are being converted to question marks when selected. This makes me think it's being converted to Unicode somewhere, but I'm not really sure where to look or where the conversion could occur... as part of the BULK load? During the insert? Maybe during the select?

    Here are the different components:

    1. Unzip text file and load it with:

    MERGE INTO

    AS [alias] USING (SELECT [columns] FROM OPENROWSET (BULK [datafile], FORMATFILE=[formatfile] ...

    2. The format file specifies the columns like:

    2SQLCHAR00"\"|"1MYCOL1SQL_Latin1_General_CP1_CI_AS

    3. The table columns are VARCHAR and have SQL_Latin1_General_CP1_CI_AS collation. However, the database itself could be something else (? - it's not my database so I don't have all the details yet). It might be Japanese or Unicode or Latin1_General_CI_AS.

    4. When I select in SQL Server Management Studio:

    SELECT mycol FROM mytable WHERE active='yes';

    I get cells like:

    Associ?

    Unfortunately, I can't reproduce this on my own laptop/server which is a US configuration. I've fiddled around to change settings to Japanese but I always get the correct answer:

    Associée

    Right now I'm not even sure where to look. Maybe the issue is with the command, maybe with the table, maybe even with how I'm using SQL Server Management Studio? Can anybody provide a suggestion for narrowing down the problem further?

Viewing 0 posts

You must be logged in to reply to this topic. Login to reply