March 15, 2013 at 12:56 pm
I am running a .bat file that runs a SQL script that extracts data from a SQL Server database and writes the data to a flat file. The encoding of the file is ANSI. What code can I add to the .bat file to take this ANSI file and convert it to UTF-8?
March 15, 2013 at 1:35 pm
What tool are you using to extract the data?
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
March 18, 2013 at 11:22 am
The .bat file is running a SQL query from the command line, using the following format:
sqlcmd -S <SQL Server instance> -i <SQL query> -o <output file> -d <SQL Server database name> -h -1 -W -r 0
March 18, 2013 at 2:41 pm
Adding the -u switch will give you a Windows Unicode (i.e. UCS-2) file. To get to UTF-8 you'll have to enlist the help of some other tool. Why do you need UTF-8? It's not native for Windows, UCS-2 (subset of UTF-16LE) is native.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
March 19, 2013 at 1:23 pm
The system that the file is being imported into requires UTF-8, so there's no flexibility there.
March 19, 2013 at 3:38 pm
You could try using SSIS to do the conversion for you on the fly. Use an OLE DB Source in a Data Flow to get the data from the database and send it to a Flat File Destination configured to write a file in UTF-8 format.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
April 11, 2013 at 3:17 am
trigonom (3/19/2013)
The system that the file is being imported into requires UTF-8, so there's no flexibility there.
Hi, did you find a solution?
I have one using scripting to convert the file through ADODB.Stream objects (SSIS sucks at producing UTF-8) but it is not performing well, so am looking for another..What did you end up using?
Here is the script I found on t'internet in case you want it:
function ansiToUtf8(fileName) {
var indir = "ansiin";
var outdir = "utf8out";
var ansi = WScript.CreateObject("ADODB.Stream");
ansi.Open();
ansi.Charset = "x-ansi";
ansi.LoadFromFile(indir+"\\"+fileName);
var utf8 = WScript.CreateObject("ADODB.Stream");
utf8.Open();
utf8.Charset = "UTF-8";
utf8.WriteText(ansi.ReadText());
utf8.SaveToFile(outdir+"\\"+fileName, 2 /*adSaveCreateOverWrite*/);
ansi.Close();
utf8.Close();
}
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
April 11, 2013 at 3:46 am
Ok, ignore that...I found my answer.
Apparently ADODB.Stream for reading the file is terrible in this code.
This worked much much much better:
function ansiToUtf8(fileName) {
var indir = "ansiin";
var outdir = "utf8out";
var ansi = WScript.CreateObject("Scripting.FileSystemObject");
var utf8 = WScript.CreateObject("ADODB.Stream");
utf8.Open();
utf8.Charset = "UTF-8";
utf8.WriteText(ansi.OpentextFile(indir+"\\"+fileName,1,false,false).ReadAll());
utf8.SaveToFile(outdir+"\\"+fileName, 2 /*adSaveCreateOverWrite*/);
utf8.Close();
}
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
April 11, 2013 at 6:22 am
SSIS sucks at producing UTF-8
Why?
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
April 11, 2013 at 6:44 am
opc.three (4/11/2013)
SSIS sucks at producing UTF-8
Why?
Because I couldn't get it to do it.
I tried.
Lots.
If you know how, and have actually seen it work, I would seriously love to know 😛
I know you can set the Flat file connection manager to UTF-8 and you can try and force your data into the right code page for it (65001 i think without checking), but I just couldn't get it to actually produce UTF-8 and the same was true for all the people that popped up on Google with the same problem.
I haven't tried in SQL2012 yet though.
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
April 11, 2013 at 6:50 am
I am pretty sure I tried it before posting but that code has long since evaporated. I'll try it again though. Chances are I was using 2012 too so I'll fire that up and try 2005 too. Things may have gotten better.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
April 11, 2013 at 7:33 am
...on hold...
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
April 11, 2013 at 7:36 am
Thanks for that - will take a look.
Much appreciated.
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
April 11, 2013 at 7:43 am
😛 I just realized something...converting a file from ANSI to UTF-8 is pretty much non-sensical unless you truly need the BOM. UTF-8 covers all 255 ANSI characters with the same exact code points. The SSIS flat file team could have been nice enough to put in the BOM, but they didn't, but basically no conversion should be necessary as most UTF-8 parsers say the BOM Is optional.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
April 11, 2013 at 7:46 am
mister.magoo (4/11/2013)
Thanks for that - will take a look.Much appreciated.
If you did happen to grab the download before I took it down, don't bother looking. I was misreading the BOM in my output file. The code I posted outputs UTF-16LE, not UTF-8.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply