April 7, 2008 at 9:00 am
Help, please.
I have a derived column task which is failing on several rows. I'm trying to parse a FirstName field into FirstName and MiddleInitial. I borrowed some expressions from another developer to do so, but for some reason, my code fails when his doesn't. I outputed the failed rows and found that when the FirstName is NULL or 1 letter only, the task will fail on those rows.
Here's my format. FirstName is varchar(25) in the source database. So "Molly Y" should parse out to "Molly" and "Y", but "C" should just parse out to "C" and "Charles" should parse out to "Charles" with no middle initial. From the OLE DB Source, I take a derived column task to split FirstName into FirstName_Parse1 (the "real" firstName) and FirstName_Parse2 (the middle initial).
Expression for FirstName_Parse1 is SUBSTRING(FirstName,LEN(FirstName) - 1,1) == " " ? SUBSTRING(FirstName,1,LEN(FirstName) - 1) : FirstName
Expression for FirstName_Parse2 is SUBSTRING(FirstName,LEN(FirstName) - 1,1) == " " && RIGHT(FirstName,1) != " " ? RIGHT(FirstName,1) : " "
Now, I'm not used to working with expressions prhased like this, so maybe I'm misunderstanding what they're doing.
What do I need to do to these expressions to make the MiddleInitial part fill in with a blank or NULL if there is only a 1 letter FirstName?
And what do I need to do to account for a completely NULL or blank FirstName? I don't mind leaving FirstName blank or NULL. I just need the job not to fail because of the blank or NULL. Does this make sense?
Your thoughts and any assistance is appreciated in advance.
April 7, 2008 at 9:39 am
Ok, I have not tested this, but I would tend to build a more standard parse with a space as your delimiter. This would help you handle some bad situations you will run into.
For the first name, I would have something like this:
FINDSTRING(TRIM(FirstName)," ",1) > 0 ? SUBSTRING(TRIM(FirstName),1,FINDSTRING(TRIM(FirstName)," ",1)-1) : TRIM(FirstName)
Basically:
If there is a space
Give me everything up to the space
Else
Give me everything
For the middle initial or middle name you would have something like:
FINDSTRING(TRIM(FirstName)," ",1) > 0 ? SUBSTRING(TRIM(FirstName),FINDSTRING(TRIM(FirstName)," ",1)+1,LEN(FirstName)) : ""
If there is a space
Give me everything after the space
Else
Give me an empty string
Now, these may be off a bit, but hopefully they will give you the right idea.
These may error on a NULL still - if they do, I would handle NULL before the derived column either through a conditional split or another derived column that turns NULL into empty string.
April 7, 2008 at 10:24 am
Michael,
As much as I would love to do as you've just listed, I can't. I have things in my data such as FirstName of "Ace Insurance" and LastName of "Inc.". The word Insurance is obviously much more than 1 character but comes after a space. And I can't increase the size of the middle initial field because I'm merging two datasets, one from a mainframe dataset which already has a 1 character middle initial, and another one from my sql server dataset which doesn't have a separate middle initial.
Urgh.... If only my life were easy...
April 7, 2008 at 10:26 am
BTW, I should add that I just got a look at the other developer's package and he has everything set to "Ignore Failure" on his package so he doesn't have to deal with this error...
This is frustrating. I want a good solution. Not one that adds bad data into the mix.
April 7, 2008 at 10:29 am
Ok, make that first IF a bit more specific:
FINDSTRING(REVERSE(TRIM(FirstName))," ",1) = 2 && LEN(TRIM(FirstName)) > 2 ? SUBSTRING(TRIM(FirstName),1,FINDSTRING(TRIM(FirstName)," ",1)-1) : TRIM(FirstName)
Of course, you may want to make that first IF conditional part of a conditional split - splitting the data set so you can avoid doing the remaining string operations on some of your records may improve performance.
April 7, 2008 at 10:33 am
I use the "ignore failure" option now and then for this type of thing also - string operations like this will always find some bad data that you could not anticipate. Sometimes generating a NULL value and allowing good data to pass is a good idea.
I do like to use conditional splits to simplify complicated string operations - they have proven to help performance a lot for me in the past. You also have some limits on the complexity and length of a function in a derived column component.
April 7, 2008 at 11:29 am
Michael,
You're BRILLIANT!!
It took me a little more work at first, but given the way the task is executing right now, I really really like your multiple conditional split idea.
I've got 3 splits.
1) ISNULL(LTRIM(RTRIM(FirstName))) == FALSE && LEN(LTRIM(RTRIM(FirstName))) == 1
2) ISNULL(LTRIM(RTRIM(FirstName))) == FALSE && LEN(LTRIM(RTRIM(FirstName))) > 1
3) ISNULL(LTRIM(RTRIM(FirstName))) == TRUE
On condition 3, I use a derived column to fill in the values of "UNKNOWN" on the first & Last names, X for the middle initial and continue.
On Condition 2, I take the task directly to a merge with Condition 3.
On Condition 1, I do my FirstName parsing and truncation as needed. Then I merge the results with the merged set of Conditions 1&3, then I merge that with the Mainframe rows.
I haven't seen a failure yet. @=) Of course, I've just now jinxed myself by saying that. (hee).
Haven't tried your FINDSTRING() formula. Not sure I need it at this point, but I'll keep it in mind. Thank you ever so much for your suggestions & help. I'm incredibly grateful.
Viewing 7 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply