Split one Name Column into Two Colmns using SSIS

  • I am working with SQL Server 2012 and Visual Studio 2010 with Excel 2007. I am building a package that will pull information from a .CSV file and import it into a table in a DB that we are creating for an HRIS project.

    The .CSV file has the following fields starting in B9 and running through column J (row length will vary)

    | PersNo | IDNo | Name | PArea | OrgUnit | OrgName | UserID | EntryDate |

    The issue I have is that I need to take the Name column and split it into a FirstName and LastName columns. I have scoured the internet and found a number of forums that detail methods that use Derived Columns, Conditional Split, Script Component and MultiCast. None of them were helpful or fit my needs.

    The issue is complicated by the name arrangement. I have three different name types that can be displayed.

    John Smith

    John A Smith

    John Smith III

    The last two they will need to be split differently.

    If there is a middle initial they want it split as such.

    | FirstName | LastName |

    ------------------------

    | John | Smith A |

    i.e. Smith and the Middle Initial will be in the LastName column

    If they have a suffix then they will need to be split as such.

    | FirstName | LastName |

    -------------------------

    | John | Smith III |

    i.e. Smith and the suffix will be in the LastName column

    I need to know if there is a way that I can split this out in one package. Eventually this will be automated to run daily so I will also have to utilize a lookup I assume to filter out any duplicates and enter only new data.

    I appreciate anything information that anyone can provide.

  • Could you also get a middle initial and a suffix?

    Or more than one middle initial?

    Sorry to complicate matters, but this could impact how to go about solving it

  • Yes, there can ba a situation where there is a middle initial and a suffix.

    However, there will not be an issue with 2 middle initials. The system we use only allows one character for middle initial.

  • OK, then I think my approach would be to split up the string into its component parts. If you get 2 pieces you must have just first name, surname. If you get 4, it will be 1st name, initial, surname, suffix.

    If you get 3 and the second is a single character, I think it is safe to assume that is a middle initial. If not, that it is a surname so the last element is a suffix. When you know what each component is, reconstitute it in the format you require in the end

  • Thank you for the response.

    I actually ended up using a suggestion from another forum that worked out really well.

    The moderator there helped me to put together a Script Component that took into account all factors.

    public override void Input0_ProcessInputRow(Input0Buffer Row)

    {

    const string pFirstName = @"^[A-Z]([-']?[a-z]+)*";

    const string pSuffix = @"((Jr|Sr|I|V|X)( ?))*";

    const string pInitial = @"(?<=\s)[A-Z](?=\s)";

    const string pLastName = @"(?!(?:Jr|Sr|I|V|X|^))([A-Z][-'\s]?[a-z]+)";

    string fullName = Row.Name.ToString();

    string firstName = Regex.Match(fullName, pFirstName).Value;

    string suffix = Regex.Match(fullName, pSuffix).Value;

    string initial = Regex.Match(fullName, pInitial).Value;

    string lastName = Regex.Match(fullName, pLastName).Value;

    if (!string.IsNullOrEmpty(initial))

    lastName += " " + initial;

    if (!string.IsNullOrEmpty(suffix))

    lastName += " " + suffix;

    Row.FirstName = firstName;

    Row.LastName = lastName;

    }

    I figured I would share it in hopes of it assisting others that may have a similar situation.

  • It'll be interesting what you do with last names like "van der waals" and first names like "Mary Jo" as well as people who have alphabet-soup after their last names and titles before their first names.

    My recommendation would be to forget about splitting names especially since most people end up doing "fuzzy" lookups for names anyway. Unless you have an application that has people populate separate on-screen fields to begin with, trying to splits on names can be terribly frustrating.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply