ANSI to UTF-8 conversion

  • I am running a .bat file that runs a SQL script that extracts data from a SQL Server database and writes the data to a flat file. The encoding of the file is ANSI. What code can I add to the .bat file to take this ANSI file and convert it to UTF-8?

  • What tool are you using to extract the data?

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • The .bat file is running a SQL query from the command line, using the following format:

    sqlcmd -S <SQL Server instance> -i <SQL query> -o <output file> -d <SQL Server database name> -h -1 -W -r 0

  • Adding the -u switch will give you a Windows Unicode (i.e. UCS-2) file. To get to UTF-8 you'll have to enlist the help of some other tool. Why do you need UTF-8? It's not native for Windows, UCS-2 (subset of UTF-16LE) is native.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • The system that the file is being imported into requires UTF-8, so there's no flexibility there.

  • You could try using SSIS to do the conversion for you on the fly. Use an OLE DB Source in a Data Flow to get the data from the database and send it to a Flat File Destination configured to write a file in UTF-8 format.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • trigonom (3/19/2013)


    The system that the file is being imported into requires UTF-8, so there's no flexibility there.

    Hi, did you find a solution?

    I have one using scripting to convert the file through ADODB.Stream objects (SSIS sucks at producing UTF-8) but it is not performing well, so am looking for another..What did you end up using?

    Here is the script I found on t'internet in case you want it:

    function ansiToUtf8(fileName) {

    var indir = "ansiin";

    var outdir = "utf8out";

    var ansi = WScript.CreateObject("ADODB.Stream");

    ansi.Open();

    ansi.Charset = "x-ansi";

    ansi.LoadFromFile(indir+"\\"+fileName);

    var utf8 = WScript.CreateObject("ADODB.Stream");

    utf8.Open();

    utf8.Charset = "UTF-8";

    utf8.WriteText(ansi.ReadText());

    utf8.SaveToFile(outdir+"\\"+fileName, 2 /*adSaveCreateOverWrite*/);

    ansi.Close();

    utf8.Close();

    }

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • Ok, ignore that...I found my answer.

    Apparently ADODB.Stream for reading the file is terrible in this code.

    This worked much much much better:

    function ansiToUtf8(fileName) {

    var indir = "ansiin";

    var outdir = "utf8out";

    var ansi = WScript.CreateObject("Scripting.FileSystemObject");

    var utf8 = WScript.CreateObject("ADODB.Stream");

    utf8.Open();

    utf8.Charset = "UTF-8";

    utf8.WriteText(ansi.OpentextFile(indir+"\\"+fileName,1,false,false).ReadAll());

    utf8.SaveToFile(outdir+"\\"+fileName, 2 /*adSaveCreateOverWrite*/);

    utf8.Close();

    }

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • SSIS sucks at producing UTF-8

    Why?

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three (4/11/2013)


    SSIS sucks at producing UTF-8

    Why?

    Because I couldn't get it to do it.

    I tried.

    Lots.

    If you know how, and have actually seen it work, I would seriously love to know 😛

    I know you can set the Flat file connection manager to UTF-8 and you can try and force your data into the right code page for it (65001 i think without checking), but I just couldn't get it to actually produce UTF-8 and the same was true for all the people that popped up on Google with the same problem.

    I haven't tried in SQL2012 yet though.

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • I am pretty sure I tried it before posting but that code has long since evaporated. I'll try it again though. Chances are I was using 2012 too so I'll fire that up and try 2005 too. Things may have gotten better.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • ...on hold...

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Thanks for that - will take a look.

    Much appreciated.

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • 😛 I just realized something...converting a file from ANSI to UTF-8 is pretty much non-sensical unless you truly need the BOM. UTF-8 covers all 255 ANSI characters with the same exact code points. The SSIS flat file team could have been nice enough to put in the BOM, but they didn't, but basically no conversion should be necessary as most UTF-8 parsers say the BOM Is optional.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • mister.magoo (4/11/2013)


    Thanks for that - will take a look.

    Much appreciated.

    If you did happen to grab the download before I took it down, don't bother looking. I was misreading the BOM in my output file. The code I posted outputs UTF-16LE, not UTF-8.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Viewing 15 posts - 1 through 15 (of 15 total)

    You must be logged in to reply to this topic. Login to reply