Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

SSIS Derived Column question Expand / Collapse
Author
Message
Posted Sunday, December 09, 2012 4:12 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, January 28, 2013 2:59 PM
Points: 9, Visits: 37
I'm moving data from a text file to a table using SSIS.
Below is an example of my data and how it needs to be broken down.

To extract the ID eleminits I'm using the Derived column from data flow transformation.
ID = 999#2876400*65F00000#CP0939*CP1-100028896-1101N3#15790-1#16#D

# = delimiter
company = 999
Account = 2876400*65f00000
seq = CP0939*CP1-100028896-1101N3
Trans = 15790-1
GLType = 16
TransType = D

All data elements are variable length, except for company which is 3 in length.
DT_STR does now allow a variable for the length field.
There are 5m+ rows to load.

Is there another way to load the data using SSIS?
Below is my code which did not work as I cannot define the DT_STR length.


company = left(ID,3)
Account = (DT_STR,??,1252)SUBSTRING(ID,FINDSTRING(ID,"#",1)+ 1,FINDSTRING(ID,"#",2)- 1)
Seq = (DT_STR,??,1252)SUBSTRING(ID,FINDSTRING(ID,"#",2)+ 1,FINDSTRING(ID,"#",3)- 1)
Trans = (DT_STR,??,1252)SUBSTRING(ID,FINDSTRING(ID,"#",3)+ 1,FINDSTRING(ID,"#",4)- 1)
Etc..
Post #1394419
Posted Sunday, December 09, 2012 5:09 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, January 28, 2013 2:59 PM
Points: 9, Visits: 37
Anyone with a suggestion?
Post #1394423
Posted Sunday, December 09, 2012 9:07 PM
Right there with Babe

Right there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with Babe

Group: General Forum Members
Last Login: Today @ 11:10 AM
Points: 740, Visits: 785
debrarowens (12/9/2012)
I'm moving data from a text file to a table using SSIS.
Below is an example of my data and how it needs to be broken down.

To extract the ID eleminits I'm using the Derived column from data flow transformation.
ID = 999#2876400*65F00000#CP0939*CP1-100028896-1101N3#15790-1#16#D

# = delimiter
company = 999
Account = 2876400*65f00000
seq = CP0939*CP1-100028896-1101N3
Trans = 15790-1
GLType = 16
TransType = D

All data elements are variable length, except for company which is 3 in length.
DT_STR does now allow a variable for the length field.
There are 5m+ rows to load.

Is there another way to load the data using SSIS?
Below is my code which did not work as I cannot define the DT_STR length.


company = left(ID,3)
Account = (DT_STR,??,1252)SUBSTRING(ID,FINDSTRING(ID,"#",1)+ 1,FINDSTRING(ID,"#",2)- 1)
Seq = (DT_STR,??,1252)SUBSTRING(ID,FINDSTRING(ID,"#",2)+ 1,FINDSTRING(ID,"#",3)- 1)
Trans = (DT_STR,??,1252)SUBSTRING(ID,FINDSTRING(ID,"#",3)+ 1,FINDSTRING(ID,"#",4)- 1)
Etc..

Rather than reading in one long line and then parsing out the fields yourself, why not change the Flat File connection to make the delimiter be a '#' ? (I think it defaults to comma) Then your fields will come into your data flow already broken out. See the attached screen capture.

HTH,
Rob


  Post Attachments 
FlatFileConnection_1.PNG (10 views, 49.07 KB)
Post #1394438
Posted Monday, December 10, 2012 4:52 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, January 28, 2013 2:59 PM
Points: 9, Visits: 37
Then only element that has a delimiter of '#' is the ID, the ID literally has the '#'.
The file is delimited by Tabs and <cr><lf>.

Post #1394548
Posted Monday, December 10, 2012 6:14 AM
Right there with Babe

Right there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with Babe

Group: General Forum Members
Last Login: Today @ 11:10 AM
Points: 740, Visits: 785
debrarowens (12/10/2012)
All data elements are variable length, except for company which is 3 in length.
DT_STR does now allow a variable for the length field.
There are 5m+ rows to load.


Then only element that has a delimiter of '#' is the ID, the ID literally has the '#'.
The file is delimited by Tabs and <cr><lf>.

Sorry, I misunderstood.

Is the issue that you don't know the exact length of each string field? If so, just pick a value that will be large enough for the longest string for each field. 5 million rows isn't a particularly large number (of course it's all dependent upon your environment); you're not going to see any real performance difference in setting a string column to (DT_STR, 4, 1252) vs. (DT_STR, 8, 1252). Just don't make it too short so that you get truncation.

HTH,
Rob
Post #1394569
Posted Monday, December 10, 2012 6:49 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, January 28, 2013 2:59 PM
Points: 9, Visits: 37
I tried but in some cases the string is to long and it pulls the # plus a few digits from the next field. I was under the impression that the DT_STR set the field length regardless of the items pulled from the substring and findstring. Am I wrong?
Post #1394586
Posted Monday, December 10, 2012 7:44 AM
UDP Broadcaster

UDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP Broadcaster

Group: General Forum Members
Last Login: Wednesday, May 15, 2013 8:20 AM
Points: 1,446, Visits: 1,883
I too, had assumed that you only had 1 delimiter - the #. Then came your response indicating 1 # and a bunch of tabs. And now that I've read the 2nd response you gave, I'm confused all over again, as you indicated that SSIS pulls the # into another field. Now I'm not sure what your delimiters are. Can you post a zip file with a couple sample records?

If this is a one-time use thing, I'd write a quick VBScript with ADO code to do it, and I'd even consider using that same code within Script Task inside of SSIS for other than one-time use. The reason is that you can use the Split function in VBScript to use any delimiter you need. However, getting SSIS to handle a varying delimiter is probably not going to work very well. One can either go with fixed length fields or use a delimiter, but not both at the same time. The table you insert into is the item that needs the fields to be "long enough".

Or have I missed something?





Steve
(aka sgmunson)

Weight Loss Tips
Post #1394627
Posted Monday, December 10, 2012 8:10 AM
Right there with Babe

Right there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with Babe

Group: General Forum Members
Last Login: Today @ 11:10 AM
Points: 740, Visits: 785
debrarowens (12/10/2012)
I tried but in some cases the string is too long and it pulls the # plus a few digits from the next field. I was under the impression that the DT_STR set the field length regardless of the items pulled from the substring and findstring. Am I wrong?

I think your Mid() calculations in your Derived column transform must be incorrect. What you cast those results to (whether WSTR or DT_STR with different lengths) doesn't change the contents of the flow variable (unless it's too short and you get truncation).

Add a data viewer after your Derived column transform -- see what the results of your derived columns are and step by step tweak these until you're only parsing out the piece you want.

HTH,
Rob
Post #1394644
Posted Monday, December 10, 2012 8:29 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, January 28, 2013 2:59 PM
Points: 9, Visits: 37
Really sorry about the confusion (I know better)!

Here is a data example. I replaced the tab with a pipe '|'.
The item in read is the issue. The other files are easly parsed out because of the delimiter and there size does not matter.

When I pull the third filed 2876400*65F00000 (setting DT_STR to 22 as that is the max lenght) I get 2876400*65F00000#CP215 even though the substring and findstring are pulling the correct lengths.

[color=red]TWA#2876400*65F00000#CP2152*CP2-100007392-1101N3#15790-1#16#D[/color]|C|CDF 100028896 1101N3|65|FSS0006994|CP0939*CP1|FSSOO3|MO|16|120611|120511|1636154<CR/LF>

Post #1394652
Posted Monday, December 10, 2012 9:42 AM
UDP Broadcaster

UDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP Broadcaster

Group: General Forum Members
Last Login: Wednesday, May 15, 2013 8:20 AM
Points: 1,446, Visits: 1,883
Take a look at the following VBScript. I took your data (removing the color directive and the <CR/LF> codes), and pasted it into a TEST_DATA.TXT file.

Dim fso, FH, filename, InLine, I, J
Set fso = WScript.CreateObject("Scripting.FileSystemObject")
filename = "C:\Users\c50522\Documents\TEST_DATA.TXT"
Set FH = fso.OpenTextFile(filename, 1)
WScript.Echo ""
Do Until FH.AtEndOfStream
InLine = FH.Readline
If InLine <> "" Then
Arr1 = Split(InLine, "|")
On Error Resume Next
Arr2 = Split(Arr1(0), "#")
On Error GoTo 0
WScript.Echo "==================================================="
For I = 0 to UBound(Arr2)
WScript.Echo CStr(I + 1) & " : " & Arr2(I)
Next
WScript.Echo "==================================================="
For J = 1 to UBound(Arr1)
WScript.Echo CStr(I + 1) & " : " & Arr1(J)
I = I + 1
Next
WScript.Echo "==================================================="
End If
Loop
FH.Close
Set FH = Nothing
Set fso = Nothing
WScript.Quit

The output from this is designed to show the result of first using the pipe delimiter to do a split on the entire text line, followed by doing a split on the very first column from the first split, but using the # as the delimiter for that. I output lines of equal signs to show where the change in delimiter occurs. In theory, you could do the same exact thing using Derived Columns. Just do it twice. First, use the pipe to delimit, then use the #.

===================================================
1 : TWA
2 : 2876400*65F00000
3 : CP2152*CP2-100007392-1101N3
4 : 15790-1
5 : 16
6 : D
===================================================
7 : C
8 : CDF 100028896 1101N3
9 : 65
10 : FSS0006994
11 : CP0939*CP1
12 : FSSOO3
13 : MO
14 : 16
15 : 120611
16 : 120511
17 : 1636154
===================================================



Steve
(aka sgmunson)

Weight Loss Tips
Post #1394686
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse