• Solomon Rutzky

    SSCoach

    Points: 15887

    JohnG69 - Friday, September 25, 2015 9:47 AM

    As you all know the bulk insert / bcp commands are made to work with UTF-16le files.

    We receive multiple CSV files which are in UTF-8 coding.
    I found a great way to convert the file using powershell ... The problem is that the customer that we are dealing with doesn't want to enable xp_cmdshell. I tried the Bulk insert on an UTF-8 encoded file and it seems to work but I see extra characters in the first row, which puts in doubt the validity of loading a UTF8 file with bulk insert.

    bulk insert #tmp
       from 'C:\Users\John_Doe\The_UTF8_File.csv'
       with (DATAFILETYPE = 'widechar',Firstrow = 1, ROWTERMINATOR = '')

    Without having to install or change anything on the client's server, how can I automate this conversion process, so that the bulk insert works properly?

    Hi John. First off, please see my reply to your related question from another topic here, which covers the basics of UTF-8 / UTF-16LE, etc:

    BCP - unicode

    Secondly, using a DataFileType of "widechar" is incorrect for UTF-8. "widechar" indicates UTF-16 LE, the 16-bit encoding used by NVARCHAR / NCHAR / NTEXT / XML. You need to use "char" as that is for 8-bit encodings, which is what UTF-8 is.

    Thirdly, you don't mention either here or in the related question which version of SQL Server you are using. But you posted both towards the end of 2015, so I should mention that starting with SQL Server 2014 SP2, code page 65001 (which is UTF-8) is supported in bcp and BULK INSERT.

    Fourthly, for anyone using SQL Server 2014 SP1 or older, the incoming data does need to be converted at some point. PowerShell is  good option, and I don't believe that you need to use xp_cmdshell in order to do this. You should be able to use SQL Server Agent by creating either a "PowerShell" step, or an "Operating System (CmdExec)" step in which you call PowerShell. If neither of those options work, it might still be possible to import the data as-is, into a VARCHAR column, and use a function to convert the bytes from UTF-8 into UTF-16 LE (i.e. NVARCHAR). This can be done easily with a SQLCLR function, or I believe I saw a T-SQL version of such a function somewhere, but don't remember off-hand. I can update this later if I find it and it actually works.

    I hope that helps. Take care, Solomon...

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sites — Collations     •     Module Signing     •     SQLCLR