JohnG69 - Friday, September 25, 2015 9:47 AM
Hi John. First off, please see my reply to your related question from another topic here, which covers the basics of UTF-8 / UTF-16LE, etc:
Secondly, using a DataFileType of "widechar" is incorrect for UTF-8. "widechar" indicates UTF-16 LE, the 16-bit encoding used by NVARCHAR / NCHAR / NTEXT / XML. You need to use "char" as that is for 8-bit encodings, which is what UTF-8 is.
Thirdly, you don't mention either here or in the related question which version of SQL Server you are using. But you posted both towards the end of 2015, so I should mention that starting with SQL Server 2014 SP2, code page 65001 (which is UTF-8) is supported in bcp and BULK INSERT.
Fourthly, for anyone using SQL Server 2014 SP1 or older, the incoming data does need to be converted at some point. PowerShell is good option, and I don't believe that you need to use xp_cmdshell in order to do this. You should be able to use SQL Server Agent by creating either a "PowerShell" step, or an "Operating System (CmdExec)" step in which you call PowerShell. If neither of those options work, it might still be possible to import the data as-is, into a VARCHAR column, and use a function to convert the bytes from UTF-8 into UTF-16 LE (i.e. NVARCHAR). This can be done easily with a SQLCLR function, or I believe I saw a T-SQL version of such a function somewhere, but don't remember off-hand. I can update this later if I find it and it actually works.
I hope that helps. Take care, Solomon...
SQL# — https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR