SSIS Derived Column question

Question

Post reply

SSIS Derived Column question

debrarowens

SSC-Addicted

Points: 443
More actions
December 9, 2012 at 4:12 pm

#271810

I'm moving data from a text file to a table using SSIS.
Below is an example of my data and how it needs to be broken down.
To extract the ID eleminits I'm using the Derived column from data flow transformation.
ID = 999#2876400*65F00000#CP0939*CP1-100028896-1101N3#15790-1#16#D
# = delimiter
company = 999
Account = 2876400*65f00000
seq = CP0939*CP1-100028896-1101N3
Trans = 15790-1
GLType = 16
TransType = D
All data elements are variable length, except for company which is 3 in length.
DT_STR does now allow a variable for the length field.
There are 5m+ rows to load.
Is there another way to load the data using SSIS?
Below is my code which did not work as I cannot define the DT_STR length.:unsure:
company = left(ID,3)
Account = (DT_STR,??,1252)SUBSTRING(ID,FINDSTRING(ID,"#",1)+ 1,FINDSTRING(ID,"#",2)- 1)
Seq = (DT_STR,??,1252)SUBSTRING(ID,FINDSTRING(ID,"#",2)+ 1,FINDSTRING(ID,"#",3)- 1)
Trans = (DT_STR,??,1252)SUBSTRING(ID,FINDSTRING(ID,"#",3)+ 1,FINDSTRING(ID,"#",4)- 1)
Etc..

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

debrarowens SSC-Addicted Points: 443 More actions · Answer 1

debrarowens

SSC-Addicted

Points: 443

December 9, 2012 at 5:09 pm

#1566895

Anyone with a suggestion?

robert.gerald.taylor SSChampion Points: 13133 More actions · Answer 2

debrarowens (12/9/2012)
I'm moving data from a text file to a table using SSIS.
Below is an example of my data and how it needs to be broken down.
To extract the ID eleminits I'm using the Derived column from data flow transformation.
ID = 999#2876400*65F00000#CP0939*CP1-100028896-1101N3#15790-1#16#D
# = delimiter
company = 999
Account = 2876400*65f00000
seq = CP0939*CP1-100028896-1101N3
Trans = 15790-1
GLType = 16
TransType = D
All data elements are variable length, except for company which is 3 in length.
DT_STR does now allow a variable for the length field.
There are 5m+ rows to load.
Is there another way to load the data using SSIS?
Below is my code which did not work as I cannot define the DT_STR length.:unsure:
company = left(ID,3)
Account = (DT_STR,??,1252)SUBSTRING(ID,FINDSTRING(ID,"#",1)+ 1,FINDSTRING(ID,"#",2)- 1)
Seq = (DT_STR,??,1252)SUBSTRING(ID,FINDSTRING(ID,"#",2)+ 1,FINDSTRING(ID,"#",3)- 1)
Trans = (DT_STR,??,1252)SUBSTRING(ID,FINDSTRING(ID,"#",3)+ 1,FINDSTRING(ID,"#",4)- 1)
Etc..

Rather than reading in one long line and then parsing out the fields yourself, why not change the Flat File connection to make the delimiter be a '#' ? (I think it defaults to comma) Then your fields will come into your data flow already broken out. See the attached screen capture.

HTH,

Rob

debrarowens SSC-Addicted Points: 443 More actions · Answer 3

Then only element that has a delimiter of '#' is the ID, the ID literally has the '#'.

The file is delimited by Tabs and <cr><lf>.

robert.gerald.taylor SSChampion Points: 13133 More actions · Answer 4

debrarowens (12/10/2012)
All data elements are variable length, except for company which is 3 in length.
DT_STR does now allow a variable for the length field.
There are 5m+ rows to load.
Then only element that has a delimiter of '#' is the ID, the ID literally has the '#'.
The file is delimited by Tabs and <cr><lf>.

Sorry, I misunderstood.

Is the issue that you don't know the exact length of each string field? If so, just pick a value that will be large enough for the longest string for each field. 5 million rows isn't a particularly large number (of course it's all dependent upon your environment); you're not going to see any real performance difference in setting a string column to (DT_STR, 4, 1252) vs. (DT_STR, 8, 1252). Just don't make it too short so that you get truncation.

HTH,

Rob

debrarowens SSC-Addicted Points: 443 More actions · Answer 5

I tried but in some cases the string is to long and it pulls the # plus a few digits from the next field. I was under the impression that the DT_STR set the field length regardless of the items pulled from the substring and findstring. Am I wrong?

sgmunson SSC Guru Points: 110639 More actions · Answer 6

I too, had assumed that you only had 1 delimiter - the #. Then came your response indicating 1 # and a bunch of tabs. And now that I've read the 2nd response you gave, I'm confused all over again, as you indicated that SSIS pulls the # into another field. Now I'm not sure what your delimiters are. Can you post a zip file with a couple sample records?

If this is a one-time use thing, I'd write a quick VBScript with ADO code to do it, and I'd even consider using that same code within Script Task inside of SSIS for other than one-time use. The reason is that you can use the Split function in VBScript to use any delimiter you need. However, getting SSIS to handle a varying delimiter is probably not going to work very well. One can either go with fixed length fields or use a delimiter, but not both at the same time. The table you insert into is the item that needs the fields to be "long enough".

Or have I missed something?

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

robert.gerald.taylor SSChampion Points: 13133 More actions · Answer 7

debrarowens (12/10/2012)
I tried but in some cases the string is too long and it pulls the # plus a few digits from the next field. I was under the impression that the DT_STR set the field length regardless of the items pulled from the substring and findstring. Am I wrong?

I think your Mid() calculations in your Derived column transform must be incorrect. What you cast those results to (whether WSTR or DT_STR with different lengths) doesn't change the contents of the flow variable (unless it's too short and you get truncation).

Add a data viewer after your Derived column transform -- see what the results of your derived columns are and step by step tweak these until you're only parsing out the piece you want.

HTH,

Rob

debrarowens SSC-Addicted Points: 443 More actions · Answer 8

Really sorry about the confusion (I know better)!

Here is a data example. I replaced the tab with a pipe '|'.

The item in read is the issue. The other files are easly parsed out because of the delimiter and there size does not matter.

When I pull the third filed 2876400*65F00000 (setting DT_STR to 22 as that is the max lenght) I get 2876400*65F00000#CP215 even though the substring and findstring are pulling the correct lengths.

TWA#2876400*65F00000#CP2152*CP2-100007392-1101N3#15790-1#16#D|C|CDF 100028896 1101N3|65|FSS0006994|CP0939*CP1|FSSOO3|MO|16|120611|120511|1636154<CR/LF>

sgmunson SSC Guru Points: 110639 More actions · Answer 9

Take a look at the following VBScript. I took your data (removing the color directive and the <CR/LF> codes), and pasted it into a TEST_DATA.TXT file.

Dim fso, FH, filename, InLine, I, J

Set fso = WScript.CreateObject("Scripting.FileSystemObject")

filename = "C:\Users\c50522\Documents\TEST_DATA.TXT"

Set FH = fso.OpenTextFile(filename, 1)

WScript.Echo ""

Do Until FH.AtEndOfStream

InLine = FH.Readline

If InLine <> "" Then

Arr1 = Split(InLine, "|")

On Error Resume Next

Arr2 = Split(Arr1(0), "#")

On Error GoTo 0

WScript.Echo "==================================================="

For I = 0 to UBound(Arr2)

WScript.Echo CStr(I + 1) & " : " & Arr2(I)

Next

WScript.Echo "==================================================="

For J = 1 to UBound(Arr1)

WScript.Echo CStr(I + 1) & " : " & Arr1(J)

I = I + 1

Next

WScript.Echo "==================================================="

End If

Loop

FH.Close

Set FH = Nothing

Set fso = Nothing

WScript.Quit

The output from this is designed to show the result of first using the pipe delimiter to do a split on the entire text line, followed by doing a split on the very first column from the first split, but using the # as the delimiter for that. I output lines of equal signs to show where the change in delimiter occurs. In theory, you could do the same exact thing using Derived Columns. Just do it twice. First, use the pipe to delimit, then use the #.

===================================================

1 : TWA

2 : 2876400*65F00000

3 : CP2152*CP2-100007392-1101N3

4 : 15790-1

5 : 16

6 : D

===================================================

7 : C

8 : CDF 100028896 1101N3

9 : 65

10 : FSS0006994

11 : CP0939*CP1

12 : FSSOO3

13 : MO

14 : 16

15 : 120611

16 : 120511

17 : 1636154

===================================================

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

debrarowens SSC-Addicted Points: 443 More actions · Answer 10

Yes, that does look like it worked.:Wow:

Thanks Steve for taking the time to review and solve!