May 30, 2016 at 5:56 am
Hello everyone,
I try to import Unicode (UTF-8) data from *.csv file into SQL table. I use XML format file for bulk import. Everything import but UTF-8 symbol not show correctly.
Maybe you know where the problem can be.
I try changes all setting like DATAFILETYPE, ROWTERMINATOR, xsi:type, TERMINATOR in xml file and ect. , but can't find solution. I Attach all files, like i said with Billing_Data_NoUnicode.xml it works but no Unicode letter.
CREATE TABLE Billing.dbo.Billing_Data (ID [int] identity (1,1) NOT NULL PRIMARY KEY,
"OS" varchar(100) NULL,
"Type" nvarchar(255) NULL,
"Client" nvarchar(255) NULL,
"SupportType" varchar(255) NULL,
"Site" nvarchar(255) NULL,
"Device" nvarchar(255) NULL,
"Description" nvarchar(255) NULL,
"Service" nvarchar(255) NULL,
AddTime [datetime] NOT NULL CONSTRAINT Addtime_Bill DEFAULT (GETDATE()),
CSV_File varchar (255) NULL)
CREATE VIEW dbo.Billing_View AS SELECT "OS", "Type", "Client", "Site", "Device", "Description", "Service" FROM Billing.dbo.Billing_Data
BULK INSERT Billing.dbo.Billing_View FROM 'C:\csv\Billing\Done\CNV_2016_05_26.csv'
WITH ( DATAFILETYPE = 'widechar',
FIELDTERMINATOR = '|',
ROWTERMINATOR = '\r', (not show n)
FORMATFILE = 'C:\csv\Billing_Data.xml',
FIRSTROW = 2 )
May 30, 2016 at 9:05 pm
SQL Server does not understand UTF-8.
Convert the file to UCS-2 and then BULK INSERT will load it correcty.
_____________
Code for TallyGenerator
May 30, 2016 at 9:24 pm
It's stupid. Even NOTEPAD supports UTF-8 but SQL Server does not until 2016.
https://msdn.microsoft.com/en-us/library/ms188365(v=sql.105).aspx
You can, however, pre-convert the file to UTF-16 to import. Supposedly, the code at http://www.gnu.org/software/libiconv/#downloading works. I've not tried it though. Normally when I get something in UTF-8, I send it back to them in Pig Latin. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
May 30, 2016 at 10:16 pm
Jeff Moden (5/30/2016)
It's stupid. Even NOTEPAD supports UTF-8 but SQL Server does not until 2016.https://msdn.microsoft.com/en-us/library/ms188365(v=sql.105).aspx
You can, however, pre-convert the file to UTF-16 to import. Supposedly, the code at http://www.gnu.org/software/libiconv/#downloading works. I've not tried it though. Normally when I get something in UTF-8, I send it back to them in Pig Latin. 😉
Careful here.
The meaning of "UTF-16" nowadays is not that fixed length unicode encoding as it was some time ago.
The file converter might be creating a file in that "new" UTF-16, not the one SQL Server can understand.
Therefore I used the name "UCS-2" - that would be certainly interpreted by SQL Server correctly.
_____________
Code for TallyGenerator
May 30, 2016 at 11:59 pm
Thank you for everything I will try 🙂
I found other solution which working for me.
First need convert csv file with encoding in PowerShell:
get-childitem ($directory + '\' + '*.csv') | ForEach-Object {import-csv $_.FullName -Delimiter ';' -Encoding Default | Export-CSV -path ($ExportPath+"CNV"+"_"+$_.Name) -Delimiter '|' -Encoding Default}
Default csv file has ANSI Encode so i convert file to the same encode file just changes Delimiter(FIELDTERMINATOR = ''|'') from ; to |. And then create new xml file with different COLLATION and its work.
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR='"' MAX_LENGTH="200" COLLATION="Lithuanian_CI_AS"/>
<FIELD ID="2" xsi:type="CharTerm" TERMINATOR='"|"' MAX_LENGTH="510" COLLATION="Lithuanian_CI_AS"/>
<FIELD ID="3" xsi:type="CharTerm" TERMINATOR='"|"' MAX_LENGTH="510" COLLATION="Lithuanian_CI_AS"/>
<FIELD ID="4" xsi:type="CharTerm" TERMINATOR='"|"' MAX_LENGTH="510" COLLATION="Lithuanian_CI_AS"/>
<FIELD ID="5" xsi:type="CharTerm" TERMINATOR='"|"' MAX_LENGTH="510" COLLATION="Lithuanian_CI_AS"/>
<FIELD ID="6" xsi:type="CharTerm" TERMINATOR='"|"' MAX_LENGTH="510" COLLATION="Lithuanian_CI_AS"/>
<FIELD ID="7" xsi:type="CharTerm" TERMINATOR='"|"' MAX_LENGTH="510" COLLATION="Lithuanian_CI_AS"/>
<FIELD ID="8" xsi:type="CharTerm" TERMINATOR='"\r' MAX_LENGTH="510" COLLATION="Lithuanian_CI_AS"/>
</RECORD>
<ROW>
<COLUMN SOURCE="2" NAME="OS" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="3" NAME="Type" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="4" NAME="Client" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="5" NAME="Site" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="6" NAME="Device" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="7" NAME="Description" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="8" NAME="Service" xsi:type="SQLNVARCHAR"/>
</ROW>
</BCPFORMAT>
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply