Need Help to split Single String data and insert into many columns.

  • Dear All,

    I am working on a project for loading data from flat file source into a table using SSIS package. This SSIS ETL configuration is successfully done. There are thousands of records loaded very fast into the 'staging tables' of MS SQL Server 2008 database.

    Now my new task is that I have to split some string data from this 'staging table' and insert into many columns along with the other existing columns of this 'staging table' into the "FINAL TABLE'.

    For example, the 'staging table' contains 12 columns. I have to split the data of 'Column2' into two parts, and I have to split the data of the 'Column8' into six parts, and then insert the data into the 'final table' which has 18 columns. The strings of 'Column2' do not have any delimiter, we have to count the characters. But the strings of 'Column8' has two types of delimiter such as * and # respectively.

    The 18 columns of 'FINAL TABLE' are the remaining 10 columns of the 'staging table' plus 8 NEW COLUMNS which are created from splitting the data of 'Column2' and 'Column8' of the 'staging table'.

    Now I am in the middle of no where.

    Please help me.

    Thanks in advance.

    Raymond

  • Please read the first article I reference below in my signature block regarding asking for assistance. To really help you it would help to have the DDL for the tables, some sample data, and expected results. If you follow the instructions in the article you will also get TESTED code in return.

  • Try looking at the left(), mid(), and charindex() functions in books online. You should be able to parse anything you need with those. There might be better ways but those would get you started.

  • From your description, it should be relatively easy to use the Substring function to split out those columns.

    For example, something like this will handle column 2:

    create table #T (

    ID int identity primary key,

    Col1 varchar(50),

    Col2 varchar(50));

    insert into #T (Col1, Col2)

    select '1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ', '1234*567890ABCDEFGH#IJKLMNOPQ#RSTUVWXYZ';

    select substring(Col1, 1, 10), substring(Col1, 11, 10), substring(Col1, 21, 50)

    from #T;

    I need to know if the delimiters in column 8 are in a specific sequence, or if it's variable. That'll determine the complexity of that part. Also, does it always split into a fixed number of columns, or is that variable?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Wouldn't it be easier to load the data back to SSIS pack and process them properly there? Or process the data and load them in their final state to the database.

    Regards

    Piotr

    ...and your only reply is slàinte mhath

  • Many thanks to all.

    Dear GSquared,

    The delimiters in 'Column8' are not in sequence. And it does not return the same number of columns either. It is very complex.

    The minimum string length of 'Column8' is 9 characters, and maximum is 125 characters. For example, let's take a look at the following -

    Data1:

    1AA#ffjddxmv*2CC#jkdfgjkgdkdjdfgdererndndffdndfgndfgd

    Data2:

    3DD#klvdfvncvnthrtowejdfgndfgndfgeiherurhsfjgdfgdhdgdfhgfhdsjgdfgh

    Data3:

    7HH#dnkjgrgdffnfgjknfnnj16456+6*#jfejfgfdjkgdfgjkfngrtyiyurouj908urtyuriojrtgiojrtkhonmghnfghmkfgn

    Now let's take a look at how the output is going to look like (please follow the comma , delimiters) -

    Data1:

    "1","AA","ffjddxmv","2","CC","jkdfgjkgdkdjdfgdererndndffdndfgndfgd"

    Data2:

    "3","DD","klvdfvncvnthrtowejdfgndfgndfgeiherurhsfjgdfgdhdgdfhgfhdsjgdfgh"

    Data3:

    "7","HH","dnkjgrgdffnfgjknfnnj16456+6",",","jfejfgfdjkgdfgjkfngrtyiyurouj908urtyuriojrtgiojrtkhonmghnfghmkfgn"

    Thanks again and Best Regards.

    Raymond

    GSquared (10/7/2009)


    From your description, it should be relatively easy to use the Substring function to split out those columns.

    For example, something like this will handle column 2:

    create table #T (

    ID int identity primary key,

    Col1 varchar(50),

    Col2 varchar(50));

    insert into #T (Col1, Col2)

    select '1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ', '1234*567890ABCDEFGH#IJKLMNOPQ#RSTUVWXYZ';

    select substring(Col1, 1, 10), substring(Col1, 11, 10), substring(Col1, 21, 50)

    from #T;

    I need to know if the delimiters in column 8 are in a specific sequence, or if it's variable. That'll determine the complexity of that part. Also, does it always split into a fixed number of columns, or is that variable?

  • From that, I'm extrapolating that an asterisk in the field indicates that either the next character is a column, or that the next number is a column. Can you clarify the rule on that?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • It appears that this can be acomplshed with the use of two user defined database functions. The data is composed of two parts seperated by a "*". Each of the substrings can have three parts. The first two appear to be fixed lengths of 1 and 2, if they exist. There is a '#' seperator followed by a variable length string. The 125 just represents the max string length you indicated.

    The first function ParseFunction1 would take the entire string @InString and the column number to return (1-6) @ColNum. Find the charindex of the "*" and store in a variable @DelimiterIndex. Then just call function 2 as follows:

    select @DelimiterIndex = charindex(@InString, '*')

    select @ResultString = case

    when @DelimiterIndex < 1 and @ColNum in (1, 2, 3) then dbo.ParseFunction2(@InString, @ColNum) -- only one substring

    when @DelimiterIndex < 1 and @ColNum in (4, 5, 6) then null -- Second substring is all null

    when @ColNum = 1 then dbo.ParseFunction2(substring(@InString, 1, @DelmiterIndex - 1), 1)

    when @ColNum = 2 then dbo.ParseFunction2(substring(@InString, 1, @DelmiterIndex - 1), 2)

    when @ColNum = 3 then dbo.ParseFunction2(substring(@InString, 1, @DelmiterIndex - 1), 3)

    when @ColNum = 4 then dbo.ParseFunction2(substring(@InString, @DelmiterIndex + 1, 125), 1)

    when @ColNum = 5 then dbo.ParseFunction2(substring(@InString, @DelmiterIndex + 1, 125), 2)

    when @ColNum = 6 then dbo.ParseFunction2(substring(@InString, @DelmiterIndex + 1, 125), 3)

    end

    ParseFunction2 would then take its substring and select the right peice in a similar way. I am not sure substring likes 0 lengths so you might need a special case when @DelimiterIndex is 0 or 1. Something like:

    select @DelimiterIndex = charindex(@instring, '#')

    select @ResultString = case

    when @DelimiterIndex < 1 then null -- All three columns are null

    when @ColNum = 1 and @DelimiterIndex = 1 then null

    when @ColNum = 1 then substring(@Instring, 1,1)

    when @ColNum = 2 and @DelimiterIndex < 3 then null

    when @ColNum = 2 then substring(@Instring, 2,2)

    when @ColNum = 3 then substring(@Instring, @DelimiterIndex+1,125)

    end

  • Matt Wilhoite (10/7/2009)


    Try looking at the left(), mid(), and charindex() functions in books online. You should be able to parse anything you need with those. There might be better ways but those would get you started.

    MID() is in VB. Did you mean SUBSTRING()?

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply