Suggested Approaches Appreciated

  • First off I want to say that I know I'll come up with something on my own, I'm just looking for other suggestions and approaches to this requirement.

    Here's the sample data and required results...

    DECLARE @sampleData TABLE

    (personID INT

    ,fullName VARCHAR(64))

    INSERT @sampleData

    SELECT 999888899, 'MIKE SMITH^^^^' UNION ALL

    SELECT 123456789, 'SELBURG JASON L' UNION ALL

    SELECT 377775014, 'GEORGE W TAKEYAMO' UNION ALL

    SELECT 354517002, 'WAYNE A YADA^^^^' UNION ALL

    SELECT 115455522, 'DONNA M AABERG^^^^' UNION ALL

    SELECT 349555554, 'ECKERLIND^FRANCES^C^^' UNION ALL

    SELECT 351874470, 'YABOVICH^LISA^M^^' UNION ALL

    SELECT 351874470, 'JONES^STEVE^^^' UNION ALL

    SELECT 354891162, 'YALUNG^JOSEFINA^Y^^'

    /*

    Rules:

    Output should be in the format LASTNAME^FIRSTNAME^MIDDLEINITIAL^^

    The carrot '^' splits each word

    A total of four '^' symbols, so if no middle initial then LASTNAME^FIRSTNAME^^^

    I understand that the biggest hurdle is determining whether or not a record

    with no middle initial is FIRST-LAST or LAST-FIRST, but let's ignore that issue for now.

    Assume that there will not be any single word names, everyone has at least a first and last name

    Expected result

    'SMITH^MIKE^^^'

    'SELBURG^JASON^L^^'

    'TAKEYAMO^GEORGE^W^^'

    'YADA^WAYNE^A^^'

    'AABERG^DONNA^M^^'

    'ECKERLIND^FRANCES^C^^'

    'YABOVICH^LISA^M^^'

    'JONES^STEVE^^^'

    'YALUNG^JOSEFINA^Y^^'

    */

    Any suggestions are appreciated, and I'm specifically interested in the SET based approach. 😎

    NOTE: This is a third part application and the fact that the data isn't normalized is beyond my control and CANT BE CHANGED. So please don't reply telling me that a correct design in the first place would avoid this problem... DUH! 😀

    ______________________________________________________________________

    Personal Motto: Why push the envelope when you can just open it?

    If you follow the direction given HERE[/url] you'll likely increase the number and quality of responses you get to your question.

    Jason L. Selburg
  • Change the design - just kidding.

    You are in trouble - mostly because of the issue you presented. Even in your sample data, MIKE SMITH and STEVE JONES are impossible to sort out. This will be an especially big problem when you get to someone's name like mine (MICHAEL EARL or EARL MICHAEL - could go either way).

    Here is a shot at an "intelligent" parse that really looks for the middle initial position to get an assumption about the data format.

    I used sub-queries because I did not want to fight so much with the nested functions.

    [font="Courier New"]SELECT

    PersonID

    , FullString

    , CASE WHEN LEN(Part2) = 1 THEN Part1 WHEN LEN(Part3) = 1 THEN Part2 ELSE Part1 END AS FirstName

    , CASE WHEN LEN(Part2) = 1 THEN Part2 WHEN LEN(Part3) = 1 THEN Part3 ELSE '' END AS MiddleInit

    , CASE WHEN LEN(Part2) = 1 THEN Part3 WHEN LEN(Part3) <> 1 THEN Part3 ELSE Part1 END AS LastName

    FROM (

    SELECT

    PersonID

    , FullString

    , Part1

    , REPLACE(REPLACE(FullString,Part1 + ' ',''),' ' + Part3,'') AS Part2

    , Part3

    FROM (

    SELECT

    PersonID

    , LTRIM(RTRIM(REPLACE(FullName,'^',' '))) AS FullString

    , LEFT(LTRIM(RTRIM(REPLACE(FullName,'^',' '))),CHARINDEX(' ',LTRIM(RTRIM(REPLACE(FullName,'^',' '))))-1) AS Part1

    , REVERSE(LEFT(REVERSE(LTRIM(RTRIM(REPLACE(FullName,'^',' ')))),CHARINDEX(' ',REVERSE(LTRIM(RTRIM(REPLACE(FullName,'^',' ')))))-1)) AS Part3

    FROM

    #sampleData

    ) X

    ) Y[/font]

  • Nice. I'll give it a whirl.

    Any more takers out there?

    ______________________________________________________________________

    Personal Motto: Why push the envelope when you can just open it?

    If you follow the direction given HERE[/url] you'll likely increase the number and quality of responses you get to your question.

    Jason L. Selburg
  • Jason - have you considered a CLR function? You could run a split on BOTH ^ and (space), and then put it back together from there, or a regex replace (one of the things SQLCLR is actually fairly good at).

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt,

    Thanks for the suggestion, but I left out one VERY important point. This needs to work on 2000 as well. I probably should have posted in that forum :hehe:

    Anywho, I'm going with Michael's solution.

    Thanks

    ______________________________________________________________________

    Personal Motto: Why push the envelope when you can just open it?

    If you follow the direction given HERE[/url] you'll likely increase the number and quality of responses you get to your question.

    Jason L. Selburg
  • So you need to change the design AND upgrade your server...

    Is this a one-time update, something you intend to do regularly in batch, or something you need to really happen real-time?

    If this needs to happen in batch, you could use SSIS. A data flow and a script task would give you a lot of options. Regardless, I think the problem with not being able to figure out which is the first and last name and then issues with dutch names that have spaces in them are going to tank whatever you write. At the end of the day, someone is going to have to look at the records to really be sure they are correct - and even then, there may be no way to tell.

  • Michael Earl (6/18/2008)


    So you need to change the design AND upgrade your server...

    Is this a one-time update, something you intend to do regularly in batch, or something you need to really happen real-time?

    This is a one time update, but needs to be done over the next 6 months at over 150 client sites in conjunction with another software update. Each site has a different version of SQL from 2000 on. :w00t:

    Regardless, I think the problem with not being able to figure out which is the first and last name and then issues with dutch names that have spaces in them are going to tank whatever you write. At the end of the day, someone is going to have to look at the records to really be sure they are correct - and even then, there may be no way to tell.

    I don't think that issue (first/last or last/first) is actually going to be a problem. I was just being proactive in my description. The data I've looked into so far doesn't have them mixed together.

    ______________________________________________________________________

    Personal Motto: Why push the envelope when you can just open it?

    If you follow the direction given HERE[/url] you'll likely increase the number and quality of responses you get to your question.

    Jason L. Selburg
  • Just curious... do you have any names like the following example?

    Dr. and Mrs. J R van de Camp Rosenburg V MD. PHD. Esquire

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Not so far!

    $#&&@%....! ... Jeff, you just had to bring that up didn't you!

    ______________________________________________________________________

    Personal Motto: Why push the envelope when you can just open it?

    If you follow the direction given HERE[/url] you'll likely increase the number and quality of responses you get to your question.

    Jason L. Selburg
  • I have worked for a couple of mailing houses.

    You would not believe how many people have names like that.

  • Not that it matters but carrot is what, in a bunch, BunnyRabbit on Captain Kangaroo used to put up on the counter all the time

    I believe the ^ (capital 6) is a caret

    carat (or karat?) is a gold weight

  • 😛

    ______________________________________________________________________

    Personal Motto: Why push the envelope when you can just open it?

    If you follow the direction given HERE[/url] you'll likely increase the number and quality of responses you get to your question.

    Jason L. Selburg
  • Well, I WAS going to suggest having a table of the first names to deal with that issue, but any example beyond the original sample data would break the idea, and I really like the solution that was chosen, so in addition to being a day late (sort of), I'm a dollar short as well. Oh well... maybe next time...

    Steve

    (aka smunson)

    :):):)

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Seggerman (6/19/2008)


    Not that it matters but carrot is what, in a bunch, BunnyRabbit on Captain Kangaroo used to put up on the counter all the time

    I believe the ^ (capital 6) is a caret

    carat (or karat?) is a gold weight

    Like you said... "not that it matters", but the "^" sign (caret) has a lot of different meanings depending on where it is used and is sometimes referred to as a "circumflex".

    http://www.merriam-webster.com/dictionary/caret

    http://www.merriam-webster.com/dictionary/circumflex (see the NOUN version)

    http://en.wikipedia.org/wiki/Caret

    http://en.wikipedia.org/wiki/Circumflex

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply