December 9, 2012 at 4:12 pm
I'm moving data from a text file to a table using SSIS.
Below is an example of my data and how it needs to be broken down.
To extract the ID eleminits I'm using the Derived column from data flow transformation.
ID = 999#2876400*65F00000#CP0939*CP1-100028896-1101N3#15790-1#16#D
# = delimiter
company = 999
Account = 2876400*65f00000
seq = CP0939*CP1-100028896-1101N3
Trans = 15790-1
GLType = 16
TransType = D
All data elements are variable length, except for company which is 3 in length.
DT_STR does now allow a variable for the length field.
There are 5m+ rows to load.
Is there another way to load the data using SSIS?
Below is my code which did not work as I cannot define the DT_STR length.:unsure:
company = left(ID,3)
Account = (DT_STR,??,1252)SUBSTRING(ID,FINDSTRING(ID,"#",1)+ 1,FINDSTRING(ID,"#",2)- 1)
Seq = (DT_STR,??,1252)SUBSTRING(ID,FINDSTRING(ID,"#",2)+ 1,FINDSTRING(ID,"#",3)- 1)
Trans = (DT_STR,??,1252)SUBSTRING(ID,FINDSTRING(ID,"#",3)+ 1,FINDSTRING(ID,"#",4)- 1)
Etc..
December 9, 2012 at 5:09 pm
Anyone with a suggestion?
December 9, 2012 at 9:07 pm
debrarowens (12/9/2012)
I'm moving data from a text file to a table using SSIS.Below is an example of my data and how it needs to be broken down.
To extract the ID eleminits I'm using the Derived column from data flow transformation.
ID = 999#2876400*65F00000#CP0939*CP1-100028896-1101N3#15790-1#16#D
# = delimiter
company = 999
Account = 2876400*65f00000
seq = CP0939*CP1-100028896-1101N3
Trans = 15790-1
GLType = 16
TransType = D
All data elements are variable length, except for company which is 3 in length.
DT_STR does now allow a variable for the length field.
There are 5m+ rows to load.
Is there another way to load the data using SSIS?
Below is my code which did not work as I cannot define the DT_STR length.:unsure:
company = left(ID,3)
Account = (DT_STR,??,1252)SUBSTRING(ID,FINDSTRING(ID,"#",1)+ 1,FINDSTRING(ID,"#",2)- 1)
Seq = (DT_STR,??,1252)SUBSTRING(ID,FINDSTRING(ID,"#",2)+ 1,FINDSTRING(ID,"#",3)- 1)
Trans = (DT_STR,??,1252)SUBSTRING(ID,FINDSTRING(ID,"#",3)+ 1,FINDSTRING(ID,"#",4)- 1)
Etc..
Rather than reading in one long line and then parsing out the fields yourself, why not change the Flat File connection to make the delimiter be a '#' ? (I think it defaults to comma) Then your fields will come into your data flow already broken out. See the attached screen capture.
HTH,
Rob
December 10, 2012 at 4:52 am
Then only element that has a delimiter of '#' is the ID, the ID literally has the '#'.
The file is delimited by Tabs and <cr><lf>.
December 10, 2012 at 6:14 am
debrarowens (12/10/2012)
All data elements are variable length, except for company which is 3 in length.DT_STR does now allow a variable for the length field.
There are 5m+ rows to load.
Then only element that has a delimiter of '#' is the ID, the ID literally has the '#'.The file is delimited by Tabs and <cr><lf>.
Sorry, I misunderstood.
Is the issue that you don't know the exact length of each string field? If so, just pick a value that will be large enough for the longest string for each field. 5 million rows isn't a particularly large number (of course it's all dependent upon your environment); you're not going to see any real performance difference in setting a string column to (DT_STR, 4, 1252) vs. (DT_STR, 8, 1252). Just don't make it too short so that you get truncation.
HTH,
Rob
December 10, 2012 at 6:49 am
I tried but in some cases the string is to long and it pulls the # plus a few digits from the next field. I was under the impression that the DT_STR set the field length regardless of the items pulled from the substring and findstring. Am I wrong?
December 10, 2012 at 7:44 am
I too, had assumed that you only had 1 delimiter - the #. Then came your response indicating 1 # and a bunch of tabs. And now that I've read the 2nd response you gave, I'm confused all over again, as you indicated that SSIS pulls the # into another field. Now I'm not sure what your delimiters are. Can you post a zip file with a couple sample records?
If this is a one-time use thing, I'd write a quick VBScript with ADO code to do it, and I'd even consider using that same code within Script Task inside of SSIS for other than one-time use. The reason is that you can use the Split function in VBScript to use any delimiter you need. However, getting SSIS to handle a varying delimiter is probably not going to work very well. One can either go with fixed length fields or use a delimiter, but not both at the same time. The table you insert into is the item that needs the fields to be "long enough".
Or have I missed something?
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
December 10, 2012 at 8:10 am
debrarowens (12/10/2012)
I tried but in some cases the string is too long and it pulls the # plus a few digits from the next field. I was under the impression that the DT_STR set the field length regardless of the items pulled from the substring and findstring. Am I wrong?
I think your Mid() calculations in your Derived column transform must be incorrect. What you cast those results to (whether WSTR or DT_STR with different lengths) doesn't change the contents of the flow variable (unless it's too short and you get truncation).
Add a data viewer after your Derived column transform -- see what the results of your derived columns are and step by step tweak these until you're only parsing out the piece you want.
HTH,
Rob
December 10, 2012 at 8:29 am
Really sorry about the confusion (I know better)!
Here is a data example. I replaced the tab with a pipe '|'.
The item in read is the issue. The other files are easly parsed out because of the delimiter and there size does not matter.
When I pull the third filed 2876400*65F00000 (setting DT_STR to 22 as that is the max lenght) I get 2876400*65F00000#CP215 even though the substring and findstring are pulling the correct lengths.
TWA#2876400*65F00000#CP2152*CP2-100007392-1101N3#15790-1#16#D|C|CDF 100028896 1101N3|65|FSS0006994|CP0939*CP1|FSSOO3|MO|16|120611|120511|1636154<CR/LF>
December 10, 2012 at 9:42 am
Take a look at the following VBScript. I took your data (removing the color directive and the <CR/LF> codes), and pasted it into a TEST_DATA.TXT file.
Dim fso, FH, filename, InLine, I, J
Set fso = WScript.CreateObject("Scripting.FileSystemObject")
filename = "C:\Users\c50522\Documents\TEST_DATA.TXT"
Set FH = fso.OpenTextFile(filename, 1)
WScript.Echo ""
Do Until FH.AtEndOfStream
InLine = FH.Readline
If InLine <> "" Then
Arr1 = Split(InLine, "|")
On Error Resume Next
Arr2 = Split(Arr1(0), "#")
On Error GoTo 0
WScript.Echo "==================================================="
For I = 0 to UBound(Arr2)
WScript.Echo CStr(I + 1) & " : " & Arr2(I)
Next
WScript.Echo "==================================================="
For J = 1 to UBound(Arr1)
WScript.Echo CStr(I + 1) & " : " & Arr1(J)
I = I + 1
Next
WScript.Echo "==================================================="
End If
Loop
FH.Close
Set FH = Nothing
Set fso = Nothing
WScript.Quit
The output from this is designed to show the result of first using the pipe delimiter to do a split on the entire text line, followed by doing a split on the very first column from the first split, but using the # as the delimiter for that. I output lines of equal signs to show where the change in delimiter occurs. In theory, you could do the same exact thing using Derived Columns. Just do it twice. First, use the pipe to delimit, then use the #.
===================================================
1 : TWA
2 : 2876400*65F00000
3 : CP2152*CP2-100007392-1101N3
4 : 15790-1
5 : 16
6 : D
===================================================
7 : C
8 : CDF 100028896 1101N3
9 : 65
10 : FSS0006994
11 : CP0939*CP1
12 : FSSOO3
13 : MO
14 : 16
15 : 120611
16 : 120511
17 : 1636154
===================================================
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
December 10, 2012 at 9:47 am
Yes, that does look like it worked.:Wow:
Thanks Steve for taking the time to review and solve!
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply