How to use SSIS to output to UTF-8 flat file

  • I have a table (dbo.BatchOutput) with a single column of data (dat). It is NVARCHAR(MAX). I need to create a flat file with a specified name (ProductType_YYYYMMDD.dat). The file must be in UTF-8 and is being sent to London for import. This entire process must be automated within an SSIS package.

    I am using Visual Studio 2005 and have created a SSIS package.

    Basically here are my steps:

    1) Create the file name in a Script Task

    The following is in a Data Flow Task.

    2) Read the data in an OLE DB Source from the table using SELECT Dat FROM dbo.BatchOutput.

    In the Properties of this OLE DB Source the DefaultCodePage is 65001 and Always UseDefaultCodePage is True.

    In the OLE DB Source Editor in Columns the External Column (Dat) is DT_NTEXT (I can't seem to make it anything else).

    Dat maps to DatOutput.

    The Output Column is DatOutput which I made DT_WSTR with a length of 2000.

    3) Next step is a Flat File Destination.

    Its Destination Connection Flat File Code page is 65001 (UTF-8), Unicode box is not checked, format is Delimited, there is no Text qualifier, Header row delimiter is {CR}{LF}, no rows are skipped. Its DataType (Expr1) is DT_WSTR with a length of 2000.

    DatOutput maps to Expr1.

    For all of the variables (Dat, DatOutput, and Expr1) I cannot change the CodePage associated with that variable. It remains as 0. I assume that the Code Page of the Destination Connection Flat File will be use.

    I run the package and it creates the file with the correct filename and data but it is always ANSI when I open the file in NotePad and look in Save As.

    I have read all of the forums here and across the web but can never get this to work. I have struggled for days on this. Help would be greatly appreciated.

    Thank you.

  • You can save your .sql file as UTF8 in SMS so you can call T-SQL with UTF8 encoding so your notepad will get data that is already encoded. The other option is to use the free VS2005/8 Express VB version to use .NET System.Text.Encoding to create UTF8 encoding.

    Kind regards,
    Gift Peddie

  • I'm sorry, but I don't understand the solution. What is SMS? How does this differ from what I am already trying? Perhaps some details would be helpful.

    Thank you.

  • SMS(SQL Server Management Studio) this is different because your .sql file is saved as UTF8 when you get to the save as box there is an arrow which says save with encoding which takes you to a menu of most known encoding look for UT8 and save it so your text is encoded before you transfer it.

    The other option is to use the free VS2005 Express write custom dll in VB and consume it, if I find existing code I will post the link.

    Kind regards,
    Gift Peddie

  • Is it possible to do this is SSIS? I've written a number of other SSIS packages for this project that work successfully and it would be best to keep all of the pieces similar. Also, this must be completely automated with no manual intervention so no one would be able to choose from a drop-down box.

    Thank you.

  • I understand what you are saying but you are not understanding me I am telling you to move the task related to the UTF8 file to an Execute SQL task go to SMS write your .sql code save it as UTF8 and then call the UTF8 file in your Execute SQL task.

    Can you do this in BIDS no because it does not come with Advanced Save as however Team DB pro and Team Suites comes with Advanced Save as. If you have the former you can save your package as UTF8.

    Here I have found what I told you was correct but there is another option.

    http://blogs.conchango.com/jamiethomson/archive/2005/07/19/SSIS_3A00_-Codepages.aspx

    Kind regards,
    Gift Peddie

  • @blaforge:

    Sorry for the late response - if you still need assistance with this:

    Your issue is a "quirk" in SSIS when dealing with LOB character fields > 4000 chars in length and a UTF-8 destination. The workaround is simple albeit counterintuitive - add a Data Conversion Transformation step between the OLE DB Source and the Flat-File Destination that converts your input "Dat" column from DT_NTEXT to DT_TEXT with a codepage of 65001. Then you feed the newly transformed column directly to the output column in your flat-file dest. The output column in the flat-file connection manager can then be either DT_WSTR with a fixed length or DT_NTEXT, either will work fine and both will produce a UTF-8 encoded text file.

    Regards,

    Jacob

  • Jacob Luebbers (5/31/2009)


    @blaforge:

    Sorry for the late response - if you still need assistance with this:

    Your issue is a "quirk" in SSIS when dealing with LOB character fields > 4000 chars in length and a UTF-8 destination. The workaround is simple albeit counterintuitive - add a Data Conversion Transformation step between the OLE DB Source and the Flat-File Destination that converts your input "Dat" column from DT_NTEXT to DT_TEXT with a codepage of 65001. Then you feed the newly transformed column directly to the output column in your flat-file dest. The output column in the flat-file connection manager can then be either DT_WSTR with a fixed length or DT_NTEXT, either will work fine and both will produce a UTF-8 encoded text file.

    Regards,

    Jacob

    Sorry for replying to and old post. But I would like to know if this particular issue was solved or not. I tried the following.

    1. Created a OLEDB connection

    2. Prepared the source query.

    3. Used a converter and converted to 65001 (for UTF-8)

    4. Added a flat file destination.

    5. Unchecked "unicode" check box and selected 6005-UTF-8 code page.

    It generates the CSV file.. but only in ANSI format. Not in UTF-8...

    Can anyone pls help..

    Thanks.

  • No responses at all...

    Can anyone help pls...

    Thanks.

  • Joy Smith San (11/11/2011)


    No responses at all...

    Can anyone help pls...

    Thanks.

    I wrote code for my employer that I cannot share in a Script component but I think you can just create a table with only Nvarchar columns and export the table. If you are in SQL Server 2005 make sure your column definition is at least 255 because smaller definition may be passing just bytes which may be the reason you are getting ASCII because SQL Server before 2008 gives you only UCS-2 which is not quite UTF8. You can also save the stored procedure as UTF8 before executing it, that exist in previous version of the product.

    Kind regards,
    Gift Peddie

  • Thanks for the response. Will try that solution as well.

    Anyway, I managed it by keeping a "template" in UTF-8 format and

    copied this template to actual location and wrote the data in that file.

    Dint overwrite, just appended the data. It's working I believe, other people are just verifying it.

    Thanks again.

  • Thanks to all for your inputs.....

    Even though we followed the 65001 in OLEDB data source & Connectin Manager...i came to know some data ( pound symbol & euro symbole ) are not appearing properly....

    Here is solution....

    Used the normal extract of data ( use default in OLEDB & 1252 for Connection Manager)

    Then convert all extracted files using script task with code from following link...

    http://blogs.msdn.com/b/jay_akhawri/archive/2010/06/15/missing-byte-order-mark-in-flat-file-generated-with-flat-file-connection.aspx

    Thanks,

    SP

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply