Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

bcp unicode Expand / Collapse
Author
Message
Posted Saturday, September 14, 2013 11:38 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Wednesday, August 13, 2014 2:47 AM
Points: 13, Visits: 322
Hello, I have a great mystery about bcp command.I have this table:
CREATE TABLE myTable
(
PersonID smallint,
FirstName nvarchar(50) ,
LastName nvarchar(50)
);
I create a format file with bcp:
exec xp_cmdshell 'bcp AdventureWorks2012.dbo.myTable format nul -f D:\Test\MyTableFormatFile.fmt -S instance\SERVER -w -t, -T'

The file with data is:
1,skip,firstname1,lastname1
1,skip,firstname2,lastname2

I've changed the format file to include the skiped column(the data file has one extra column and I want to skip this column on import, the data file is Unicode):

9.0
4
1 SQLNCHAR 0 14 ",\0" 1 PersonID ""
2 SQLNCHAR 0 1000 ",\0" 0 FirstName SQL_Latin1_General_CP1_CI_AS
3 SQLNCHAR 0 1000 ",\0" 2 FirstName SQL_Latin1_General_CP1_CI_AS
4 SQLNCHAR 0 1000 "\r\0\0" 3 LastName SQL_Latin1_General_CP1_CI_AS

When I execute:
exec xp_cmdshell 'bcp AdventureWorks2012.dbo.myTable in D:\Test\MyTableData.dat -S instance\SERVER -T -e D:\Test\error_bcp.txt -f D:\Test\MyTableFormatFile.fmt'

I get the error:

NULL
Starting copy...
SQLState = 22018, NativeError = 0
Error = [Microsoft][SQL Native Client]Invalid character value for cast specification
NULL
0 rows copied.
Network packet size (bytes): 4096
Clock Time (ms.) Total : 1
NULL

instead this query works perfectly:
select * from openrowset(bulk 'D:\Test\MyTableData.dat', FormatFile = 'D:\Test\MyTableFormatFile.fmt') t1

Which might be the problem? If I generate the format file with "-c" instead of "-w" and the file from which I load the data in Saved As "ANSI" the bcp works fine.
The test was made in SQL 2012 Evaluation, i used bcp version 100(sql2008R2) but if I switch to "C:\Program Files\Microsoft SQL Server\110\Tools\Binn\bcp.exe" the result is the same.






Post #1494830
Posted Saturday, September 14, 2013 2:00 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Wednesday, August 13, 2014 2:47 AM
Points: 13, Visits: 322
Actually if data file is unicode and I use -F option let say -F2 everything is ok, no error:
exec xp_cmdshell 'bcp AdventureWorks2012.dbo.myTable in D:\Test\MyTableData.dat -S instance\SERVER -T -F2 -f D:\Test\MyTableFormatFile.fmt'
?!?
Post #1494839
Posted Saturday, September 14, 2013 3:46 PM


SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Thursday, August 21, 2014 3:04 PM
Points: 801, Visits: 715
This line:

4 SQLNCHAR 0 1000 "\r\0\0" 3 LastName SQL_Latin1_General_CP1_CI_AS


is missing a backslash-n, but that seems to be a trick from SQL Server Central. I need to dig further.


Erland Sommarskog, SQL Server MVP, www.sommarskog.se
Post #1494849
Posted Saturday, September 14, 2013 4:10 PM


SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Thursday, August 21, 2014 3:04 PM
Points: 801, Visits: 715
Sorry for the nonsense post, and thanks a lot for the question. It came very timely with a text I'm writing, and made me aware of problem with BCP that I had missed.

The problem is the BOM, byte-order-mark. A unicode file typically start with a BOM, so that an application can see if the encoding is UTF-16LE (the normal on Windows), UTF16BE or UTF-8. Without it, the program needs to try heuristics.

BULK INSERT and OPENROWSET understand the BOM, and can read both UTF-16LE and UTF-16BE. They also understand to skip the BOM and not import it.

BCP on the other hand, does not understand the BOM, but thinks that those two bytes are just data. If the first field is a character field, the BOM will be imported as part of the data. Here, where the first field is a number, everything goes bad. BCP can only handle UTF-16LE files without BOM.


Erland Sommarskog, SQL Server MVP, www.sommarskog.se
Post #1494851
Posted Sunday, September 15, 2013 5:21 AM


SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Thursday, August 21, 2014 3:04 PM
Points: 801, Visits: 715
I did some further testing, and I found that when you use the -w option and do not use a format file, BCP skips the BOM. It still does not really understand it, because I am not able to load a big-endian file.

Erland Sommarskog, SQL Server MVP, www.sommarskog.se
Post #1494870
Posted Sunday, September 15, 2013 5:28 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Wednesday, August 13, 2014 2:47 AM
Points: 13, Visits: 322
Thank you for you answer, i will try to understand the BOM stuff. I've tested also the import without format file and I saw that it works.

Best regards
Post #1494872
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse