Export table data to UTF-8 text file problem

  • Running: Microsoft SQL Server 2005 - 9.00.3152.00 Enterprise Edition SP2 Enterprise Edition (Intel IA-64)

    I need to export data from a table to a UTF-8 encoded text file. This is going to be scheduled for every 15 or 30 minutes. My first approach was using BCP and xp_Cmdshell but this server is locked down for security reasons; no go for xp_Cmdshell, and I'd rather not open that up.

    So my revised approach is to call an SSIS task (that exports table data to text) from within a SQL JOB (other steps are required in the job before and after the SSIS is run). I think I will have to do something to be sure the SSIS is done before continuing to the next step (hrmm I think there may be a flag on the SSIS step that controls that).

    The SSIS creates the text outfile which MUST be in quoted and comma separated format as shown here:

    "1949","0","1","Something","Wow","","","",""

    "2107","4","1","Something else","Cool","","","",""

    The SSIS task works correctly-- EXCEPT the txt outfile is NOT in UTF-8.

    To simplify the txt output, my SqlCommand concatenates the columns using quotes and commas, and that "single column" is wrapped in a CAST to NVARCHAR. Like this (shortened):

    declare @dq char(1);

    declare @qcq char(3);

    set @dq='"'; -- a double quote

    set @qcq='","'; -- double quote comma double quote

    SELECT CAST(@dq + Col1 + @qcq + Col2 + @qcq + Col3 + @qcq + ReturnTime + @dq AS NVARCHAR(200)) as MyData

    FROM MyTable

    WHERE IsExported = 0

    Looking at the SSIS, Connection Manager shows me that the "DestinationConnectionFlatFile" has CodePage=65001 which is supposed to set UTF-8... or so I thought.

    Is there anything obvious I'm doing wrong here?

    Has anyone successfully done this-- even if via a different mechanism?


    Cursors are useful if you don't know SQL

  • In SSIS when you build the package you can specify the code page. Open the connection manager for the flat file. You should see a code page choice and be able to select it there. I believe that will store it out then in utf-8

  • Right... I'm already doing that, 3rd line from the end in the OP. Unless that's not the right CP...


    Cursors are useful if you don't know SQL

  • Has anyone out there tried this, either with success or failure?

    Adding this 1/27:

    One more whimper to try to get some additional responses.


    Cursors are useful if you don't know SQL

  • Ah... the edit on the previous message didn't float this thread in the list of topics, so... has anyone else been able to output to UTF-8?


    Cursors are useful if you don't know SQL

  • I've not had a problem outputting a UTF-8 flat file. I just whipped up a quick test table containing 2 columns (int, nvarchar(256)) and populated it with some sample UTF-8 data, and exported to a flat file from an OLEDB source, and the result was in UTF-8 format.

    For reference I'm running "Microsoft SQL Server 2005 - 9.00.3257.00 (X64)".

    Did nothing special to the flat file connection manager - just set the codepage to 65001 and made sure that the Unicode tickbox was not selected.

    Double-check that your column definition (under Advanced in the connection manager) for your NVARCHAR columns is set to "Unicode string [DT_WSTR]" and the OutputColumnWidth is correct.

    I'd also try deleting your existing output file and run the package again to recreate it - I've found that if the file already exists it retains the existing format irrespective of what your connection manager settings say. If you had a previous version of the output file in ASCII format it could cause problems if you try outputting to that same file.

    Regards,

    Jacob

  • I totally agree with Jacob here. I also experience the same. But I have got different scenario.

    Instead of the NVARCHAR column in the database, we have nTEXT. In the File connection, I have changed the Data type to DT_NTEXT for that column. But while running, it gives an error that:

    "Input column is an NTEXT which is not supported with ANSI files."

    I want the file to be in UTF-8 format only. By checking the UniCode Option in the File Connection, it works fine.

    Please suggest.

  • I finally succeeded by writing the output to a temp.txt file and adding the next PowerShell command to convert it to UTF-8:

    -- Change encoding to UTF-8 with PowerShell

    SET @command = 'powershell -Command "Get-Content '+@Path+'\temp.txt -Encoding Unicode | Set-Content -Encoding UTF8 '+@path+'\'+@filename+'"';

    EXEC xp_cmdshell @command;

  • My first approach was using BCP and xp_Cmdshell but this server is locked down for security reasons; no go for xp_Cmdshell, and I'd rather not open that up.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply