bcp unicode

  • Hello, I have a great mystery about bcp command.I have this table:

    CREATE TABLE myTable

    (

    PersonID smallint,

    FirstName nvarchar(50) ,

    LastName nvarchar(50)

    );

    I create a format file with bcp:

    exec xp_cmdshell 'bcp AdventureWorks2012.dbo.myTable format nul -f D:\Test\MyTableFormatFile.fmt -S instance\SERVER -w -t, -T'

    The file with data is:

    1,skip,firstname1,lastname1

    1,skip,firstname2,lastname2

    I've changed the format file to include the skiped column(the data file has one extra column and I want to skip this column on import, the data file is Unicode):

    9.0

    4

    1 SQLNCHAR 0 14 ",\0" 1 PersonID ""

    2 SQLNCHAR 0 1000 ",\0" 0 FirstName SQL_Latin1_General_CP1_CI_AS

    3 SQLNCHAR 0 1000 ",\0" 2 FirstName SQL_Latin1_General_CP1_CI_AS

    4 SQLNCHAR 0 1000 "\r\0\0" 3 LastName SQL_Latin1_General_CP1_CI_AS

    When I execute:

    exec xp_cmdshell 'bcp AdventureWorks2012.dbo.myTable in D:\Test\MyTableData.dat -S instance\SERVER -T -e D:\Test\error_bcp.txt -f D:\Test\MyTableFormatFile.fmt'

    I get the error:

    NULL

    Starting copy...

    SQLState = 22018, NativeError = 0

    Error = [Microsoft][SQL Native Client]Invalid character value for cast specification

    NULL

    0 rows copied.

    Network packet size (bytes): 4096

    Clock Time (ms.) Total : 1

    NULL

    instead this query works perfectly:

    select * from openrowset(bulk 'D:\Test\MyTableData.dat', FormatFile = 'D:\Test\MyTableFormatFile.fmt') t1

    Which might be the problem? If I generate the format file with "-c" instead of "-w" and the file from which I load the data in Saved As "ANSI" the bcp works fine.

    The test was made in SQL 2012 Evaluation, i used bcp version 100(sql2008R2) but if I switch to "C:\Program Files\Microsoft SQL Server\110\Tools\Binn\bcp.exe" the result is the same.

  • Actually if data file is unicode and I use -F option let say -F2 everything is ok, no error:

    exec xp_cmdshell 'bcp AdventureWorks2012.dbo.myTable in D:\Test\MyTableData.dat -S instance\SERVER -T -F2 -f D:\Test\MyTableFormatFile.fmt'

    ?!?

  • This line:

    4 SQLNCHAR 0 1000 "\r\0\0" 3 LastName SQL_Latin1_General_CP1_CI_AS

    is missing a backslash-n, but that seems to be a trick from SQL Server Central. I need to dig further.

    [font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]

  • Sorry for the nonsense post, and thanks a lot for the question. It came very timely with a text I'm writing, and made me aware of problem with BCP that I had missed.

    The problem is the BOM, byte-order-mark. A unicode file typically start with a BOM, so that an application can see if the encoding is UTF-16LE (the normal on Windows), UTF16BE or UTF-8. Without it, the program needs to try heuristics.

    BULK INSERT and OPENROWSET understand the BOM, and can read both UTF-16LE and UTF-16BE. They also understand to skip the BOM and not import it.

    BCP on the other hand, does not understand the BOM, but thinks that those two bytes are just data. If the first field is a character field, the BOM will be imported as part of the data. Here, where the first field is a number, everything goes bad. BCP can only handle UTF-16LE files without BOM.

    [font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]

  • I did some further testing, and I found that when you use the -w option and do not use a format file, BCP skips the BOM. It still does not really understand it, because I am not able to load a big-endian file.

    [font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]

  • Thank you for you answer, i will try to understand the BOM stuff. I've tested also the import without format file and I saw that it works.

    Best regards

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply