Unicode Bulk insert Problems

  • Hello everyone,

    I try to import Unicode (UTF-8) data from *.csv file into SQL table. I use XML format file for bulk import. Everything import but UTF-8 symbol not show correctly.

    Maybe you know where the problem can be.

    I try changes all setting like DATAFILETYPE, ROWTERMINATOR, xsi:type, TERMINATOR in xml file and ect. , but can't find solution. I Attach all files, like i said with Billing_Data_NoUnicode.xml it works but no Unicode letter.

    CREATE TABLE Billing.dbo.Billing_Data (ID [int] identity (1,1) NOT NULL PRIMARY KEY,

    "OS" varchar(100) NULL,

    "Type" nvarchar(255) NULL,

    "Client" nvarchar(255) NULL,

    "SupportType" varchar(255) NULL,

    "Site" nvarchar(255) NULL,

    "Device" nvarchar(255) NULL,

    "Description" nvarchar(255) NULL,

    "Service" nvarchar(255) NULL,

    AddTime [datetime] NOT NULL CONSTRAINT Addtime_Bill DEFAULT (GETDATE()),

    CSV_File varchar (255) NULL)

    CREATE VIEW dbo.Billing_View AS SELECT "OS", "Type", "Client", "Site", "Device", "Description", "Service" FROM Billing.dbo.Billing_Data

    BULK INSERT Billing.dbo.Billing_View FROM 'C:\csv\Billing\Done\CNV_2016_05_26.csv'

    WITH ( DATAFILETYPE = 'widechar',

    FIELDTERMINATOR = '|',

    ROWTERMINATOR = '\r', (not show n)

    FORMATFILE = 'C:\csv\Billing_Data.xml',

    FIRSTROW = 2 )

  • SQL Server does not understand UTF-8.

    Convert the file to UCS-2 and then BULK INSERT will load it correcty.

    _____________
    Code for TallyGenerator

  • It's stupid. Even NOTEPAD supports UTF-8 but SQL Server does not until 2016.

    https://msdn.microsoft.com/en-us/library/ms188365(v=sql.105).aspx

    You can, however, pre-convert the file to UTF-16 to import. Supposedly, the code at http://www.gnu.org/software/libiconv/#downloading works. I've not tried it though. Normally when I get something in UTF-8, I send it back to them in Pig Latin. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (5/30/2016)


    It's stupid. Even NOTEPAD supports UTF-8 but SQL Server does not until 2016.

    https://msdn.microsoft.com/en-us/library/ms188365(v=sql.105).aspx

    You can, however, pre-convert the file to UTF-16 to import. Supposedly, the code at http://www.gnu.org/software/libiconv/#downloading works. I've not tried it though. Normally when I get something in UTF-8, I send it back to them in Pig Latin. 😉

    Careful here.

    The meaning of "UTF-16" nowadays is not that fixed length unicode encoding as it was some time ago.

    The file converter might be creating a file in that "new" UTF-16, not the one SQL Server can understand.

    Therefore I used the name "UCS-2" - that would be certainly interpreted by SQL Server correctly.

    _____________
    Code for TallyGenerator

  • Thank you for everything I will try 🙂

    I found other solution which working for me.

    First need convert csv file with encoding in PowerShell:

    get-childitem ($directory + '\' + '*.csv') | ForEach-Object {import-csv $_.FullName -Delimiter ';' -Encoding Default | Export-CSV -path ($ExportPath+"CNV"+"_"+$_.Name) -Delimiter '|' -Encoding Default}

    Default csv file has ANSI Encode so i convert file to the same encode file just changes Delimiter(FIELDTERMINATOR = ''|'') from ; to |. And then create new xml file with different COLLATION and its work.

    <?xml version="1.0"?>

    <BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    <RECORD>

    <FIELD ID="1" xsi:type="CharTerm" TERMINATOR='"' MAX_LENGTH="200" COLLATION="Lithuanian_CI_AS"/>

    <FIELD ID="2" xsi:type="CharTerm" TERMINATOR='"|"' MAX_LENGTH="510" COLLATION="Lithuanian_CI_AS"/>

    <FIELD ID="3" xsi:type="CharTerm" TERMINATOR='"|"' MAX_LENGTH="510" COLLATION="Lithuanian_CI_AS"/>

    <FIELD ID="4" xsi:type="CharTerm" TERMINATOR='"|"' MAX_LENGTH="510" COLLATION="Lithuanian_CI_AS"/>

    <FIELD ID="5" xsi:type="CharTerm" TERMINATOR='"|"' MAX_LENGTH="510" COLLATION="Lithuanian_CI_AS"/>

    <FIELD ID="6" xsi:type="CharTerm" TERMINATOR='"|"' MAX_LENGTH="510" COLLATION="Lithuanian_CI_AS"/>

    <FIELD ID="7" xsi:type="CharTerm" TERMINATOR='"|"' MAX_LENGTH="510" COLLATION="Lithuanian_CI_AS"/>

    <FIELD ID="8" xsi:type="CharTerm" TERMINATOR='"\r' MAX_LENGTH="510" COLLATION="Lithuanian_CI_AS"/>

    </RECORD>

    <ROW>

    <COLUMN SOURCE="2" NAME="OS" xsi:type="SQLNVARCHAR"/>

    <COLUMN SOURCE="3" NAME="Type" xsi:type="SQLNVARCHAR"/>

    <COLUMN SOURCE="4" NAME="Client" xsi:type="SQLNVARCHAR"/>

    <COLUMN SOURCE="5" NAME="Site" xsi:type="SQLNVARCHAR"/>

    <COLUMN SOURCE="6" NAME="Device" xsi:type="SQLNVARCHAR"/>

    <COLUMN SOURCE="7" NAME="Description" xsi:type="SQLNVARCHAR"/>

    <COLUMN SOURCE="8" NAME="Service" xsi:type="SQLNVARCHAR"/>

    </ROW>

    </BCPFORMAT>

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply