Export/Import of Text fields using CSV Files

  • I have been using CSV files for a one-off data conversion - it is the only mechanism I have available. The good thing is that I have control of both the export and the import.

    One of the fields I am extracting has a TEXT datatype and it contains free-text notes. There are all sorts of problematic characters in there (double quotes, single quotes, commas, tabs - you name it). These characters need to be removed or replaced as they are extracted, or they will cause a problem on upload.

    But there does not appear to be any way to run a REPLACE command on a text field - either in T-SQL or in an SSIS Script Component.

    In the end, I CAST the field to Varchar(4000) on extract and was able to use replace on that. Some data was truncated, but it was an acceptably small amount. But I would love to know whether anyone else has dealt with this problem without losing data? I would like to be able to do this in the pipeline somehow and not as a bulk replace on the CSV file once it has been created.

    Cheers for any input

    Phil

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Phil Parkin (5/9/2009)


    I have been using CSV files for a one-off data conversion - it is the only mechanism I have available. The good thing is that I have control of both the export and the import.

    One of the fields I am extracting has a TEXT datatype and it contains free-text notes. There are all sorts of problematic characters in there (double quotes, single quotes, commas, tabs - you name it). These characters need to be removed or replaced as they are extracted, or they will cause a problem on upload.

    But there does not appear to be any way to run a REPLACE command on a text field - either in T-SQL or in an SSIS Script Component.

    In the end, I CAST the field to Varchar(4000) on extract and was able to use replace on that. Some data was truncated, but it was an acceptably small amount. But I would love to know whether anyone else has dealt with this problem without losing data? I would like to be able to do this in the pipeline somehow and not as a bulk replace on the CSV file once it has been created.

    Cheers for any input

    Phil

    Phil,

    What about using derived column with expression using REPLACE function in it?

    ---
    SSIS Tasks Components Scripts Services | http://www.cozyroc.com/

  • Thanks for the idea - tried it and:

    The function REPLACE does not support the data type "DT_TEXT" for parameter number 1. The type of the parameter could not be explicitly cast into a compatible type for the function. To perform this function, the operand needs to be explicitly cast with a cast operator.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Phil Parkin (5/11/2009)


    Thanks for the idea - tried it and:

    The function REPLACE does not support the data type "DT_TEXT" for parameter number 1. The type of the parameter could not be explicitly cast into a compatible type for the function. To perform this function, the operand needs to be explicitly cast with a cast operator.

    What is the issue when you try to use script component?

    ---
    SSIS Tasks Components Scripts Services | http://www.cozyroc.com/

  • I overcome this issue by converting the text column back to string and then doing the replace in SSIS Script component.

    Here is that script:

    Imports System

    Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper

    Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

    Public Class ScriptMain Inherits UserComponent

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

    Dim rowString As String

    rowString = Me.ConvertBlobToString(Row.RawDataLine)

    Row.CleanLine = Replace(rowString, "$", String.Empty)

    End Sub

    Private Function ConvertBlobToString(ByVal blobColumn As Microsoft.SqlServer.Dts.Pipeline.BlobColumn) As String

    If blobColumn Is Nothing Then

    Return (String.Empty)

    Else

    Return (System.Text.Encoding.Default.GetString(blobColumn.GetBlobData(0, Convert.ToInt32(blobColumn.Length))))

    End If

    End Function

    End Class

    --Ramesh


  • CozyRoc (5/11/2009)


    What is the issue when you try to use script component?

    OK, I did this in a test environment to check. I have an OLEDB Source which just does a SELECT TOP 10 LongNotes ... where LongNotes is a text field.

    Add a Script Component and configure the input column, LongNotes, to be ReadWrite.

    Then, in the script, I added the following line:

    Row.longnotes = Replace(Row.longnotes, "a", "b")

    Which does not validate. The error is:

    Value of type Microsoft.SqlServer,Dts.Pipeline.BlobColumn cannot be converted to string.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Ramesh (5/11/2009)


    I overcome this issue by converting the text column back to string and then doing the replace in SSIS Script component.

    Here is that script:

    Imports System

    Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper

    Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

    Public Class ScriptMain Inherits UserComponent

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

    Dim rowString As String

    rowString = Me.ConvertBlobToString(Row.RawDataLine)

    Row.CleanLine = Replace(rowString, "$", String.Empty)

    End Sub

    Private Function ConvertBlobToString(ByVal blobColumn As Microsoft.SqlServer.Dts.Pipeline.BlobColumn) As String

    If blobColumn Is Nothing Then

    Return (String.Empty)

    Else

    Return (System.Text.Encoding.Default.GetString(blobColumn.GetBlobData(0, Convert.ToInt32(blobColumn.Length))))

    End If

    End Function

    End Class

    Wow, some cool stuff there! Is there a limit on the total number of characters that ConvertBlobToString() can process and return?

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Phil Parkin (5/11/2009)


    Wow, some cool stuff there! Is there a limit on the total number of characters that ConvertBlobToString() can process and return?

    I guess the limit of String data type is limited to the available memory or 2GB (i.e. 2^31 Unicode characters).

    See this MSDN link for more info

    http://msdn.microsoft.com/en-us/library/thwcx436(VS.80).aspx

    --Ramesh


  • Phil Parkin (5/11/2009)


    Ramesh (5/11/2009)


    I overcome this issue by converting the text column back to string and then doing the replace in SSIS Script component.

    Here is that script:

    Imports System

    Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper

    Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

    Public Class ScriptMain Inherits UserComponent

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

    Dim rowString As String

    rowString = Me.ConvertBlobToString(Row.RawDataLine)

    Row.CleanLine = Replace(rowString, "$", String.Empty)

    End Sub

    Private Function ConvertBlobToString(ByVal blobColumn As Microsoft.SqlServer.Dts.Pipeline.BlobColumn) As String

    If blobColumn Is Nothing Then

    Return (String.Empty)

    Else

    Return (System.Text.Encoding.Default.GetString(blobColumn.GetBlobData(0, Convert.ToInt32(blobColumn.Length))))

    End If

    End Function

    End Class

    Wow, some cool stuff there! Is there a limit on the total number of characters that ConvertBlobToString() can process and return?

    The solution Ramesh has sent will work fine for up to 2GB of data, based on the fact it accepts as length parameter Int32. You can check here for another TSQL-based solution.

    ---
    SSIS Tasks Components Scripts Services | http://www.cozyroc.com/

  • CozyRoc & Ramesh - great work, thanks for the help.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Hi guys - I bet you thought that this was all over? :hehe:

    I tried out Ramesh's code and have a follow-up question: what is the data type of 'CleanLine'? Looks very much like a DT_WSTR and therefore subject to 4,000 character maximum ...?

    I tried to write an inverse function - ConvertStringToBlob() - but could not work out how to do it.

    Surely I need such a function to avoid the 4,000 char limit - or am I missing something?

    Thanks again.

    Phil

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Phil Parkin (5/11/2009)


    Hi guys - I bet you thought that this was all over? :hehe:

    I tried out Ramesh's code and have a follow-up question: what is the data type of 'CleanLine'? Looks very much like a DT_WSTR and therefore subject to 4,000 character maximum ...?

    I tried to write an inverse function - ConvertStringToBlob() - but could not work out how to do it.

    Surely I need such a function to avoid the 4,000 char limit - or am I missing something?

    Thanks again.

    Phil

    Phil,

    I'm guessing the target column is either DT_TEXT (ANSI Text) or DT_NTEXT (Unicode). Check below for ConvertStringToBlob function:

    Private Sub ConvertStringToBlob(ByVal input As String, ByVal blobColumn As Microsoft.SqlServer.Dts.Pipeline.BlobColumn)

    If input Is Nothing Then

    Call blobColumn.SetNull()

    Else

    Call blobColumn.AddBlobData(System.Text.Encoding.Default.GetBytes(input))

    End If

    End Sub

    ---
    SSIS Tasks Components Scripts Services | http://www.cozyroc.com/

  • CozyRoc (5/11/2009)


    Phil Parkin (5/11/2009)


    Hi guys - I bet you thought that this was all over? :hehe:

    I tried out Ramesh's code and have a follow-up question: what is the data type of 'CleanLine'? Looks very much like a DT_WSTR and therefore subject to 4,000 character maximum ...?

    I tried to write an inverse function - ConvertStringToBlob() - but could not work out how to do it.

    Surely I need such a function to avoid the 4,000 char limit - or am I missing something?

    Thanks again.

    Phil

    Phil,

    I'm guessing the target column is either DT_TEXT (ANSI Text) or DT_NTEXT (Unicode). Check below for ConvertStringToBlob function:

    Private Sub ConvertStringToBlob(ByVal input As String, ByVal blobColumn As Microsoft.SqlServer.Dts.Pipeline.BlobColumn)

    If input Is Nothing Then

    Call blobColumn.SetNull()

    Else

    Call blobColumn.AddBlobData(System.Text.Encoding.Default.GetBytes(input))

    End If

    End Sub

    CozyRoc, your sub nailed it. Ace work 😎

    I also tried adding a new field to the Output Buffer, type DT_TEXT, then the following code:

    Output0Buffer.AddRow()

    Output0Buffer.FixNotes = ConvertBlobToString(Row.longnotes)

    where FixNotes is the name of the field I added to the Output Buffer and LongNotes is the Text field. This does not validate - I get a message about the field FixNotes being read only.

    But the problem is now resolved and, in future, I am sure I will definitely use this.

    Phil

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Phil Parkin (5/11/2009)


    Hi guys - I bet you thought that this was all over? :hehe:

    I tried out Ramesh's code and have a follow-up question: what is the data type of 'CleanLine'? Looks very much like a DT_WSTR and therefore subject to 4,000 character maximum ...?

    I tried to write an inverse function - ConvertStringToBlob() - but could not work out how to do it.

    Surely I need such a function to avoid the 4,000 char limit - or am I missing something?

    Thanks again.

    Phil

    In this case, "CleanLine" is indeed a DT_WSTR with 4000 max. characters, but I assumed:-) that you understood that "CleanLine" needs to be reversed to TEXT data before sending it to output buffer.

    --Ramesh


  • Ramesh (5/12/2009)


    In this case, "CleanLine" is indeed a DT_WSTR with 4000 max. characters, but I assumed:-) that you understood that "CleanLine" needs to be reversed to TEXT data before sending it to output buffer.

    Hey, I'm not as good at this stuff as you think! 😀 It's not as if this 'reversing' is as trivial as you have made it sound either!

    Got there in the end though, thanks to you guys. I have a really neat solution now.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply