SSIS Derived Column question

  • I'm moving data from a text file to a table using SSIS.

    Below is an example of my data and how it needs to be broken down.

    To extract the ID eleminits I'm using the Derived column from data flow transformation.

    ID = 999#2876400*65F00000#CP0939*CP1-100028896-1101N3#15790-1#16#D

    # = delimiter

    company = 999

    Account = 2876400*65f00000

    seq = CP0939*CP1-100028896-1101N3

    Trans = 15790-1

    GLType = 16

    TransType = D

    All data elements are variable length, except for company which is 3 in length.

    DT_STR does now allow a variable for the length field.

    There are 5m+ rows to load.

    Is there another way to load the data using SSIS?

    Below is my code which did not work as I cannot define the DT_STR length.:unsure:

    company = left(ID,3)

    Account = (DT_STR,??,1252)SUBSTRING(ID,FINDSTRING(ID,"#",1)+ 1,FINDSTRING(ID,"#",2)- 1)

    Seq = (DT_STR,??,1252)SUBSTRING(ID,FINDSTRING(ID,"#",2)+ 1,FINDSTRING(ID,"#",3)- 1)

    Trans = (DT_STR,??,1252)SUBSTRING(ID,FINDSTRING(ID,"#",3)+ 1,FINDSTRING(ID,"#",4)- 1)

    Etc..

  • Anyone with a suggestion?

  • debrarowens (12/9/2012)


    I'm moving data from a text file to a table using SSIS.

    Below is an example of my data and how it needs to be broken down.

    To extract the ID eleminits I'm using the Derived column from data flow transformation.

    ID = 999#2876400*65F00000#CP0939*CP1-100028896-1101N3#15790-1#16#D

    # = delimiter

    company = 999

    Account = 2876400*65f00000

    seq = CP0939*CP1-100028896-1101N3

    Trans = 15790-1

    GLType = 16

    TransType = D

    All data elements are variable length, except for company which is 3 in length.

    DT_STR does now allow a variable for the length field.

    There are 5m+ rows to load.

    Is there another way to load the data using SSIS?

    Below is my code which did not work as I cannot define the DT_STR length.:unsure:

    company = left(ID,3)

    Account = (DT_STR,??,1252)SUBSTRING(ID,FINDSTRING(ID,"#",1)+ 1,FINDSTRING(ID,"#",2)- 1)

    Seq = (DT_STR,??,1252)SUBSTRING(ID,FINDSTRING(ID,"#",2)+ 1,FINDSTRING(ID,"#",3)- 1)

    Trans = (DT_STR,??,1252)SUBSTRING(ID,FINDSTRING(ID,"#",3)+ 1,FINDSTRING(ID,"#",4)- 1)

    Etc..

    Rather than reading in one long line and then parsing out the fields yourself, why not change the Flat File connection to make the delimiter be a '#' ? (I think it defaults to comma) Then your fields will come into your data flow already broken out. See the attached screen capture.

    HTH,

    Rob

  • Then only element that has a delimiter of '#' is the ID, the ID literally has the '#'.

    The file is delimited by Tabs and <cr><lf>.

  • debrarowens (12/10/2012)


    All data elements are variable length, except for company which is 3 in length.

    DT_STR does now allow a variable for the length field.

    There are 5m+ rows to load.


    Then only element that has a delimiter of '#' is the ID, the ID literally has the '#'.

    The file is delimited by Tabs and <cr><lf>.

    Sorry, I misunderstood.

    Is the issue that you don't know the exact length of each string field? If so, just pick a value that will be large enough for the longest string for each field. 5 million rows isn't a particularly large number (of course it's all dependent upon your environment); you're not going to see any real performance difference in setting a string column to (DT_STR, 4, 1252) vs. (DT_STR, 8, 1252). Just don't make it too short so that you get truncation.

    HTH,

    Rob

  • I tried but in some cases the string is to long and it pulls the # plus a few digits from the next field. I was under the impression that the DT_STR set the field length regardless of the items pulled from the substring and findstring. Am I wrong?

  • I too, had assumed that you only had 1 delimiter - the #. Then came your response indicating 1 # and a bunch of tabs. And now that I've read the 2nd response you gave, I'm confused all over again, as you indicated that SSIS pulls the # into another field. Now I'm not sure what your delimiters are. Can you post a zip file with a couple sample records?

    If this is a one-time use thing, I'd write a quick VBScript with ADO code to do it, and I'd even consider using that same code within Script Task inside of SSIS for other than one-time use. The reason is that you can use the Split function in VBScript to use any delimiter you need. However, getting SSIS to handle a varying delimiter is probably not going to work very well. One can either go with fixed length fields or use a delimiter, but not both at the same time. The table you insert into is the item that needs the fields to be "long enough".

    Or have I missed something?

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • debrarowens (12/10/2012)


    I tried but in some cases the string is too long and it pulls the # plus a few digits from the next field. I was under the impression that the DT_STR set the field length regardless of the items pulled from the substring and findstring. Am I wrong?

    I think your Mid() calculations in your Derived column transform must be incorrect. What you cast those results to (whether WSTR or DT_STR with different lengths) doesn't change the contents of the flow variable (unless it's too short and you get truncation).

    Add a data viewer after your Derived column transform -- see what the results of your derived columns are and step by step tweak these until you're only parsing out the piece you want.

    HTH,

    Rob

  • Really sorry about the confusion (I know better)!

    Here is a data example. I replaced the tab with a pipe '|'.

    The item in read is the issue. The other files are easly parsed out because of the delimiter and there size does not matter.

    When I pull the third filed 2876400*65F00000 (setting DT_STR to 22 as that is the max lenght) I get 2876400*65F00000#CP215 even though the substring and findstring are pulling the correct lengths.

    TWA#2876400*65F00000#CP2152*CP2-100007392-1101N3#15790-1#16#D|C|CDF 100028896 1101N3|65|FSS0006994|CP0939*CP1|FSSOO3|MO|16|120611|120511|1636154<CR/LF>

  • Take a look at the following VBScript. I took your data (removing the color directive and the <CR/LF> codes), and pasted it into a TEST_DATA.TXT file.

    Dim fso, FH, filename, InLine, I, J

    Set fso = WScript.CreateObject("Scripting.FileSystemObject")

    filename = "C:\Users\c50522\Documents\TEST_DATA.TXT"

    Set FH = fso.OpenTextFile(filename, 1)

    WScript.Echo ""

    Do Until FH.AtEndOfStream

    InLine = FH.Readline

    If InLine <> "" Then

    Arr1 = Split(InLine, "|")

    On Error Resume Next

    Arr2 = Split(Arr1(0), "#")

    On Error GoTo 0

    WScript.Echo "==================================================="

    For I = 0 to UBound(Arr2)

    WScript.Echo CStr(I + 1) & " : " & Arr2(I)

    Next

    WScript.Echo "==================================================="

    For J = 1 to UBound(Arr1)

    WScript.Echo CStr(I + 1) & " : " & Arr1(J)

    I = I + 1

    Next

    WScript.Echo "==================================================="

    End If

    Loop

    FH.Close

    Set FH = Nothing

    Set fso = Nothing

    WScript.Quit

    The output from this is designed to show the result of first using the pipe delimiter to do a split on the entire text line, followed by doing a split on the very first column from the first split, but using the # as the delimiter for that. I output lines of equal signs to show where the change in delimiter occurs. In theory, you could do the same exact thing using Derived Columns. Just do it twice. First, use the pipe to delimit, then use the #.

    ===================================================

    1 : TWA

    2 : 2876400*65F00000

    3 : CP2152*CP2-100007392-1101N3

    4 : 15790-1

    5 : 16

    6 : D

    ===================================================

    7 : C

    8 : CDF 100028896 1101N3

    9 : 65

    10 : FSS0006994

    11 : CP0939*CP1

    12 : FSSOO3

    13 : MO

    14 : 16

    15 : 120611

    16 : 120511

    17 : 1636154

    ===================================================

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Yes, that does look like it worked.:Wow:

    Thanks Steve for taking the time to review and solve!

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply