Parsing the name field with T-SQL

  • I have a routine in Excel that will parse out a column with a person’s full name.  I need to break the column up into three separate columns with T-SQL.  Here are the scripts that I am using:

    This section describes several worksheet functions you can use to split full names into the first and last name components. 

    To return the last name of the full name in A2, use the following formula.

    =LEFT(A1,IF(ISERROR(FIND(" ",A1,1)),LEN(A1),FIND(" ",A1,1)-1))

    To return the first name of the full name in A2, use the following formula.

    =TRIM(IF(ISERROR(FIND(" ",A1,1)),A1,MID(A1,FIND(" ",A1,1)+1,

    IF(ISERROR(FIND(" ",A1,FIND(" ",A1,1)+2)),LEN(A1),

    FIND(" ",A1,FIND(" ",A1,1)+2))-FIND(" ",A1,1))))

    To return the middle name of the full name in A2, use the following formula.

    =TRIM(RIGHT(A1,LEN(A1)-IF(ISERROR(FIND(" ",A1,

    FIND(" ",A1,FIND(" ",A1,1)+2))),LEN(A1),

    FIND(" ",A1,FIND(" ",A1,FIND(" ",A1,1)+2))-1)))

     

    Any suggestions would be appreciated.

     

  • Have you searched the script section here already. This is a FAQ, so very likely you find a generic solution there.

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • Yes.  I have found a few UDF that will split the field but I need something like this:

    SELECT SUBSTRING(Col001,1,CHARINDEX(' ',Col001)) AS LName,

    SUBSTRING(Col001,CHARINDEX(' ',Col001)+1, CHARINDEX(' ',Col001,CHARINDEX(' ',Col001)+1)-CHARINDEX(' ',Col001)+20) AS FName,

    REVERSE(SUBSTRING(REVERSE(Col001),1,CHARINDEX('',REVERSE(Col001)))) AS MName

    FROM table

    I have not gotten this to work properly.  Do you think this can work?

     

  • Can you post some sample data?

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • There might be a script to help in the scripts section as well. Not to toot my horn, but I wrote some articles, all entitled Tame Those Strings (search on that) about string manipulation. Basically the IFs are replaced by CASE statements, the FIND with CHARINDEX, RTRIM or LTRIM replace TRIM.

     

  • Raw Data

    Col001

    AABY TAMARA J

    AADNESEN MARJORIE

    AARANT MAUREEN R

    AARON MICHELLEE

    AARON PATRICIA J

    ABBA DAVID W

    ABBATE ANTHONY P

    SELECT SUBSTRING(Col001,1,CHARINDEX(' ',Col001)) AS LName,

    SUBSTRING(Col001,CHARINDEX(' ',Col001)+1, CHARINDEX(' ',Col001,CHARINDEX(' ',Col001)+1)-CHARINDEX(' ',Col001)+20) AS FName,

    REVERSE(SUBSTRING(REVERSE(Col001),1,CHARINDEX('',REVERSE(Col001)))) AS MName

    FROM table

    Gives me this:

    LName       FName           MName

    AABY         TAMARA J 

    AADNESEN  MARJORIE 

    AARANT      MAUREEN R 

    AARON        MICHELLEE 

    AARON        PATRICIA J 

    ABBA          DAVID W 

    ABBATE       ANTHONY P 

    The AS MName is what is killing me.  How can I get the MName field populated?  I have looked at the Tame Those Strings articles but I am still lost.

     

  • Just a question. You are aware that there is a built-in function in Excel that can do this quite nicely? In English I suspect it to be called "Text in columns" and it can be found in the Data menu in the menu bar.

    Here's the quickely recorded Excel macro

    Sub Makro1()

    '

    ' Makro1 Makro

    ' Makro am 29.12.2004 von Frank Kalis aufgezeichnet

    '

    '

        Selection.TextToColumns Destination:=Range("A1"), DataType:=xlDelimited, _

            TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=True, Tab:=True, _

            Semicolon:=False, Comma:=False, Space:=True, Other:=False, FieldInfo _

            :=Array(Array(1, 1), Array(2, 1), Array(3, 1))

    End Sub

    Or does it need to be in T-SQL?

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • Yes.  I do alot of my data scrubbing in Excel but Excel is limited to 65,000 lines.  I need to do a couple of million records.

  • Umpf, yes that's really a serious limitation in Excel. Will see what I can dig out this evening. However, it certainly pays to have a close look at Steve's article anyway.

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • Thanks!  This seems to work also for the LName but I may be making this harder than it is:

    SELECT SUBSTRING(Col001,1,LEN(SUBSTRING(Col001,1,CHARINDEX(' ',Col001)))) AS LName FROM table

  • SELECT

     LName =

      CASE

       WHEN PARSENAME(REPLACE(Col001, ' ', '.'), 3)

       IS NULL THEN PARSENAME(REPLACE(Col001, ' ', '.'), 2)

       ELSE PARSENAME(REPLACE(Col001, ' ', '.'), 3)

      END,

     FName =

      CASE

       WHEN PARSENAME(REPLACE(Col001, ' ', '.'), 3)

       IS NULL THEN PARSENAME(REPLACE(Col001, ' ', '.'), 1)

       ELSE PARSENAME(REPLACE(Col001, ' ', '.'), 2)

      END,

     MName =

      CASE

       WHEN PARSENAME(REPLACE(Col001, ' ', '.'), 3) IS NULL THEN ''

       ELSE PARSENAME(REPLACE(Col001, ' ', '.'), 1)

      END

    FROM table

    This seems to work also but I still am not getting the correct results.

     

  • Based on your sample data, it really seems to work with the exception of names like 'VAN HALEN EDDIE'. How many row are we talking about? Is this a one-time action?

    Btw, have you seen one of the recent articles on the homepage here?

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • SELECT

     Column1 =

      CASE

       WHEN PARSENAME(REPLACE(Col001, ' ', '.'), 4) IS NULL AND PARSENAME(REPLACE(Col001, ' ', '.'), 3) IS NULL THEN PARSENAME(REPLACE(Col001, ' ', '.'), 2)

       WHEN PARSENAME(REPLACE(Col001, ' ', '.'), 4) IS NULL THEN PARSENAME(REPLACE(Col001, ' ', '.'), 3)

       ELSE PARSENAME(REPLACE(Col001, ' ', '.'), 4)

      END,

     Column2 =

      CASE

       WHEN PARSENAME(REPLACE(Col001, ' ', '.'), 4) IS NULL AND PARSENAME(REPLACE(Col001, ' ', '.'), 3) IS NULL THEN PARSENAME(REPLACE(Col001, ' ', '.'), 1)

       WHEN PARSENAME(REPLACE(Col001, ' ', '.'), 4) IS NULL THEN PARSENAME(REPLACE(Col001, ' ', '.'), 2)

       ELSE PARSENAME(REPLACE(Col001, ' ', '.'), 3)

      END,

     Column3 =

      CASE

       WHEN PARSENAME(REPLACE(Col001, ' ', '.'), 4) IS NULL AND PARSENAME(REPLACE(Col001, ' ', '.'), 3) IS NULL THEN ''

       WHEN PARSENAME(REPLACE(Col001, ' ', '.'), 4) IS NULL THEN PARSENAME(REPLACE(Col001, ' ', '.'), 1)

       ELSE PARSENAME(REPLACE(Col001, ' ', '.'), 2)

      END,

     Column4 =

      CASE

       WHEN PARSENAME(REPLACE(Col001, ' ', '.'), 4) IS NULL AND PARSENAME(REPLACE(Col001, ' ', '.'), 3) IS NULL THEN ''

       WHEN PARSENAME(REPLACE(Col001, ' ', '.'), 4) IS NULL THEN ''

       ELSE PARSENAME(REPLACE(Col001, ' ', '.'), 1)

      END

    FROM table

     

    Gives me this:

    A S CONSTRUCTION INC

    A S CONSTRUCTION INC

    A S CONSTRUCTION INC

    AABY TAMARA J 

    AADNESEN MARJORIE  

    AARANT MAUREEN R 

    AARON MICHELLEE  

    AARON PATRICIA J 

    ABBA DAVID W 

    ABBATE ANTHONY P 

    ABBOTT CHARLES C 

    ABBOTT FRED  

    ABBOTT JAMES P 

    ABBOTT JULIA L 

    ABBOTT KEITH E 

    ABBOTT MIKE E 

    NULL NULL  

    As you can see, the elements are broken up into their own column but I can only assign 4.  The last record will show up NULL NULL.  This is going to be used constantly.  I have millions of records I scrub threw all the time. 

  • Sorry.  I guess the code tags did not work.  By the way, the last record has 5 elements.

  • Yuk, while PARSENAME is really handy, you can split only up to 4 parts. As your possible outcomes ranges from 2 to apparently 5 parts, I suspect every T-SQL solution to be some kind of ugly. I think you can do this with DTS, but for that kind of question I'm the wrong one, since I don't use DTS. What about doing the scrubbing with some scripting language outside SQL Server and then import? Or use Access with its text import functionality (which is very similar to Excel) as an intermediate stage? Getting "better" raw data isn't possible, right?

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply