Split String in SSIS

  • Hello all,

    I have a table where one column contains comma delimited string

    ID vlaue_string

    1 23:45:355:68

    1 4545:908:24

    2 343:32

    3 812:321:23434:34:45:68

    There is maximum number of tokens in the string - 6

    I need to split each string and put every token in its respective column for further manipulation:

    ID vlaue_string Col1 Col2 Col3 Col4 Col5 Col6

    1 23:45:35:68 23 45 35 68 null null

    1 15:908:24 15 908 24 null null null

    2 343:32 343 32 null null null null

    3 812:321:434:34:45:68 812 321 434 34 45 68

    Could someone please help me to do it?

    Is it possible at all to do it in SSIS, or in T-SQL? Or I have to do all processing in CLR?

    Thanks a lot in advance!

    Pit.

  • It can be done in SSIS, but you have to use a Script Component to do it. Within the Script Component you would use the Split function which creates an array then you can add the columns to the Script Components output by looping through the array.

    You can also do it in T-SQL like this:

    DECLARE @table TABLE (ID INT, value VARCHAR(100))

    INSERT INTO @table (

    ID,

    value

    )

    SELECT

    1, '23:45:355:68'

    Union All

    Select

    1, '4545:908:24'

    Union All

    Select

    2, '343:32'

    Union All

    Select

    3, '812:321:23434:34:45:68'

    /*

    this builds a tally or numbers cte to help do the split

    see this article http://www.sqlservercentral.com/articles/TSQL/62867/

    for an explanation of the tally table and the split function that follows

    */

    ;WITH cteTally AS

    (

    SELECT TOP 100

    ROW_NUMBER() OVER(ORDER BY NAME) as n

    FROM

    sys.all_columns AS AC

    ),

    cteData As

    (

    SELECT

    /*

    I need this to uniquely identitfy each row since your

    test data includes 2 rows with id 1. If the id column

    is unique then you can skip this step and replace value_id

    with id in later steps

    */

    ROW_NUMBER() OVER (ORDER BY T.id) AS value_id,

    T.ID,

    T.value

    FROM

    @table T

    ),

    /*

    this splits the values into multiple rows identifying each row

    for each value_id for use to PIVOT the data

    */

    cteValues AS

    (

    SELECT

    ROW_NUMBER() OVER (PARTITION BY T.value_id ORDER BY T.id) AS row_id,

    T.value_id,

    T.ID,

    SUBSTRING(':'+T.value+':',N+1,CHARINDEX(':',':'+T.value+':',N+1)-N-1) AS split_value,

    T.value

    FROM

    cteData AS T CROSS JOIN

    cteTally AS TL

    WHERE

    N < LEN(':'+T.value+':') AND

    SUBSTRING(':'+T.value+':',N,1) = ':'

    )

    /*

    Here is where we return the final data in the format you want,

    I used the MIN function but you could use MAX as each row will only

    have one value for each column. See this article for details on the

    pivoting http://www.sqlservercentral.com/articles/T-SQL/63681/

    The first example uses the PIVOT function the second (Commented out) does not.

    */

    SELECT

    id,

    value,

    [1] AS Col1,

    [2] AS Col2,

    [3] AS Col3,

    [4] AS Col4,

    [5] AS Col5,

    [6] AS Col6

    FROM

    (SELECT value_id, row_id, id, value, split_value FROM cteValues) AS Src

    PIVOT (MIN(split_value) FOR row_id IN ([1],[2],[3], [4], [5], [6]))AS pvt

    ORDER BY

    id,

    value

    /*

    SELECT

    id,

    value,

    Min(CASE

    WHEN row_id = 1 THEN split_value

    ELSE NULL

    END) AS col1,

    Min(CASE

    WHEN row_id = 2 THEN split_value

    ELSE NULL

    END) AS col2,

    Min(CASE

    WHEN row_id = 3 THEN split_value

    ELSE NULL

    END) AS col3,

    Min(CASE

    WHEN row_id = 4 THEN split_value

    ELSE NULL

    END) AS col4,

    Min(CASE

    WHEN row_id = 5 THEN split_value

    ELSE NULL

    END) AS col5,

    Min(CASE

    WHEN row_id = 6 THEN split_value

    ELSE NULL

    END) AS col6

    FROM

    cteValues

    GROUP BY

    value_id,

    id,

    value

    */

    Which you use (SSIS or T-SQL) depends on what you need to do next.

  • Thanks a lot, Jack!

    It all worked perfect! And performance with Tally table is great!

    Thanks a lot for your help!

    Pit.

  • Glad I could help. A tally table has a lot of uses, almost anytime you think you need a loop see if there is a way to use a tally table.

  • Using XML its easier

    http://www.sqlservercentral.com/articles/XML/61618/[/url]

    regards.

  • oddiejbp (10/28/2009)


    Using XML its easier

    http://www.sqlservercentral.com/articles/XML/61618/[/url]

    regards.

    This approach would worsen the load performance. If you have to load simple CSV file, the much better approach is to use directly the standard bcp facility.

    ---
    SSIS Tasks Components Scripts Services | http://www.cozyroc.com/

  • Jack Corbett (3/19/2009)


    It can be done in SSIS, but you have to use a Script Component to do it. Within the Script Component you would use the Split function which creates an array then you can add the columns to the Script Components output by looping through the array.

    Hi Jack - would please be able to provide an example on how to do this using the Scrip Component?

    I'm trying to accomplish the same thing, but I so far I have never needed to use that Component before.

    Thanks

    /Matthew

  • Matthew,

    Here's an example of what you could do. In the example I don't pivot the data as I just want to split the string and have multiple rows. You should be able to figure out how to Pivot it if that is your need:

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

    Dim RowId As String = Row.RowID

    Dim DelimitedList As String = Row.DelimitedList

    Dim delimiter As String = ","

    If Not String.IsNullOrEmpty(RowId.Trim) Then

    If Not (String.IsNullOrEmpty(DelimitedList)) Then

    Dim DelimitedListArray() As String = DelimitedList.Split(New String() {delimiter}, StringSplitOptions.RemoveEmptyEntries)

    For Each item As String In DelimitedListArray

    With Output0Buffer

    .AddRow()

    .RowId = RowId

    .Item = item.Replace(ControlChars.Cr, "").Replace(ControlChars.Lf, "").Replace(ControlChars.Tab, "").Trim()

    End With

    Next

    End If

    End If

    End Sub

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply