|
|
|
Forum Newbie
      
Group: General Forum Members
Last Login: Monday, January 28, 2013 2:59 PM
Points: 9,
Visits: 37
|
|
I'm moving data from a text file to a table using SSIS. Below is an example of my data and how it needs to be broken down.
To extract the ID eleminits I'm using the Derived column from data flow transformation. ID = 999#2876400*65F00000#CP0939*CP1-100028896-1101N3#15790-1#16#D
# = delimiter company = 999 Account = 2876400*65f00000 seq = CP0939*CP1-100028896-1101N3 Trans = 15790-1 GLType = 16 TransType = D
All data elements are variable length, except for company which is 3 in length. DT_STR does now allow a variable for the length field. There are 5m+ rows to load.
Is there another way to load the data using SSIS? Below is my code which did not work as I cannot define the DT_STR length.
company = left(ID,3) Account = (DT_STR,??,1252)SUBSTRING(ID,FINDSTRING(ID,"#",1)+ 1,FINDSTRING(ID,"#",2)- 1) Seq = (DT_STR,??,1252)SUBSTRING(ID,FINDSTRING(ID,"#",2)+ 1,FINDSTRING(ID,"#",3)- 1) Trans = (DT_STR,??,1252)SUBSTRING(ID,FINDSTRING(ID,"#",3)+ 1,FINDSTRING(ID,"#",4)- 1) Etc..
|
|
|
|
|
Forum Newbie
      
Group: General Forum Members
Last Login: Monday, January 28, 2013 2:59 PM
Points: 9,
Visits: 37
|
|
| Anyone with a suggestion?
|
|
|
|
|
Right there with Babe
      
Group: General Forum Members
Last Login: Today @ 11:10 AM
Points: 740,
Visits: 785
|
|
debrarowens (12/9/2012)
I'm moving data from a text file to a table using SSIS. Below is an example of my data and how it needs to be broken down. To extract the ID eleminits I'm using the Derived column from data flow transformation. ID = 999#2876400*65F00000#CP0939*CP1-100028896-1101N3#15790-1#16#D # = delimiter company = 999 Account = 2876400*65f00000 seq = CP0939*CP1-100028896-1101N3 Trans = 15790-1 GLType = 16 TransType = D All data elements are variable length, except for company which is 3 in length. DT_STR does now allow a variable for the length field. There are 5m+ rows to load. Is there another way to load the data using SSIS? Below is my code which did not work as I cannot define the DT_STR length.  company = left(ID,3) Account = (DT_STR,??,1252)SUBSTRING(ID,FINDSTRING(ID,"#",1)+ 1,FINDSTRING(ID,"#",2)- 1) Seq = (DT_STR,??,1252)SUBSTRING(ID,FINDSTRING(ID,"#",2)+ 1,FINDSTRING(ID,"#",3)- 1) Trans = (DT_STR,??,1252)SUBSTRING(ID,FINDSTRING(ID,"#",3)+ 1,FINDSTRING(ID,"#",4)- 1) Etc.. Rather than reading in one long line and then parsing out the fields yourself, why not change the Flat File connection to make the delimiter be a '#' ? (I think it defaults to comma) Then your fields will come into your data flow already broken out. See the attached screen capture.
HTH, Rob
|
|
|
|
|
Forum Newbie
      
Group: General Forum Members
Last Login: Monday, January 28, 2013 2:59 PM
Points: 9,
Visits: 37
|
|
Then only element that has a delimiter of '#' is the ID, the ID literally has the '#'. The file is delimited by Tabs and <cr><lf>.
|
|
|
|
|
Right there with Babe
      
Group: General Forum Members
Last Login: Today @ 11:10 AM
Points: 740,
Visits: 785
|
|
debrarowens (12/10/2012) All data elements are variable length, except for company which is 3 in length. DT_STR does now allow a variable for the length field. There are 5m+ rows to load.
Then only element that has a delimiter of '#' is the ID, the ID literally has the '#'. The file is delimited by Tabs and <cr><lf>.
Sorry, I misunderstood.
Is the issue that you don't know the exact length of each string field? If so, just pick a value that will be large enough for the longest string for each field. 5 million rows isn't a particularly large number (of course it's all dependent upon your environment); you're not going to see any real performance difference in setting a string column to (DT_STR, 4, 1252) vs. (DT_STR, 8, 1252). Just don't make it too short so that you get truncation.
HTH, Rob
|
|
|
|
|
Forum Newbie
      
Group: General Forum Members
Last Login: Monday, January 28, 2013 2:59 PM
Points: 9,
Visits: 37
|
|
| I tried but in some cases the string is to long and it pulls the # plus a few digits from the next field. I was under the impression that the DT_STR set the field length regardless of the items pulled from the substring and findstring. Am I wrong?
|
|
|
|
|
UDP Broadcaster
      
Group: General Forum Members
Last Login: Wednesday, May 15, 2013 8:20 AM
Points: 1,446,
Visits: 1,883
|
|
I too, had assumed that you only had 1 delimiter - the #. Then came your response indicating 1 # and a bunch of tabs. And now that I've read the 2nd response you gave, I'm confused all over again, as you indicated that SSIS pulls the # into another field. Now I'm not sure what your delimiters are. Can you post a zip file with a couple sample records?
If this is a one-time use thing, I'd write a quick VBScript with ADO code to do it, and I'd even consider using that same code within Script Task inside of SSIS for other than one-time use. The reason is that you can use the Split function in VBScript to use any delimiter you need. However, getting SSIS to handle a varying delimiter is probably not going to work very well. One can either go with fixed length fields or use a delimiter, but not both at the same time. The table you insert into is the item that needs the fields to be "long enough".
Or have I missed something?
Steve (aka sgmunson)
   Weight Loss Tips
|
|
|
|
|
Right there with Babe
      
Group: General Forum Members
Last Login: Today @ 11:10 AM
Points: 740,
Visits: 785
|
|
debrarowens (12/10/2012) I tried but in some cases the string is too long and it pulls the # plus a few digits from the next field. I was under the impression that the DT_STR set the field length regardless of the items pulled from the substring and findstring. Am I wrong? I think your Mid() calculations in your Derived column transform must be incorrect. What you cast those results to (whether WSTR or DT_STR with different lengths) doesn't change the contents of the flow variable (unless it's too short and you get truncation).
Add a data viewer after your Derived column transform -- see what the results of your derived columns are and step by step tweak these until you're only parsing out the piece you want.
HTH, Rob
|
|
|
|
|
Forum Newbie
      
Group: General Forum Members
Last Login: Monday, January 28, 2013 2:59 PM
Points: 9,
Visits: 37
|
|
Really sorry about the confusion (I know better)!
Here is a data example. I replaced the tab with a pipe '|'. The item in read is the issue. The other files are easly parsed out because of the delimiter and there size does not matter.
When I pull the third filed 2876400*65F00000 (setting DT_STR to 22 as that is the max lenght) I get 2876400*65F00000#CP215 even though the substring and findstring are pulling the correct lengths.
[color=red]TWA#2876400*65F00000#CP2152*CP2-100007392-1101N3#15790-1#16#D[/color]|C|CDF 100028896 1101N3|65|FSS0006994|CP0939*CP1|FSSOO3|MO|16|120611|120511|1636154<CR/LF>
|
|
|
|
|
UDP Broadcaster
      
Group: General Forum Members
Last Login: Wednesday, May 15, 2013 8:20 AM
Points: 1,446,
Visits: 1,883
|
|
Take a look at the following VBScript. I took your data (removing the color directive and the <CR/LF> codes), and pasted it into a TEST_DATA.TXT file.
Dim fso, FH, filename, InLine, I, J Set fso = WScript.CreateObject("Scripting.FileSystemObject") filename = "C:\Users\c50522\Documents\TEST_DATA.TXT" Set FH = fso.OpenTextFile(filename, 1) WScript.Echo "" Do Until FH.AtEndOfStream InLine = FH.Readline If InLine <> "" Then Arr1 = Split(InLine, "|") On Error Resume Next Arr2 = Split(Arr1(0), "#") On Error GoTo 0 WScript.Echo "===================================================" For I = 0 to UBound(Arr2) WScript.Echo CStr(I + 1) & " : " & Arr2(I) Next WScript.Echo "===================================================" For J = 1 to UBound(Arr1) WScript.Echo CStr(I + 1) & " : " & Arr1(J) I = I + 1 Next WScript.Echo "===================================================" End If Loop FH.Close Set FH = Nothing Set fso = Nothing WScript.Quit
The output from this is designed to show the result of first using the pipe delimiter to do a split on the entire text line, followed by doing a split on the very first column from the first split, but using the # as the delimiter for that. I output lines of equal signs to show where the change in delimiter occurs. In theory, you could do the same exact thing using Derived Columns. Just do it twice. First, use the pipe to delimit, then use the #.
=================================================== 1 : TWA 2 : 2876400*65F00000 3 : CP2152*CP2-100007392-1101N3 4 : 15790-1 5 : 16 6 : D =================================================== 7 : C 8 : CDF 100028896 1101N3 9 : 65 10 : FSS0006994 11 : CP0939*CP1 12 : FSSOO3 13 : MO 14 : 16 15 : 120611 16 : 120511 17 : 1636154 ===================================================
Steve (aka sgmunson)
   Weight Loss Tips
|
|
|
|