Suggested Approaches Appreciated

Question

Suggested Approaches Appreciated

Jason Selburg

SSC-Insane

Points: 24641
More actions
June 17, 2008 at 9:49 am

#94207

First off I want to say that I know I'll come up with something on my own, I'm just looking for other suggestions and approaches to this requirement.
Here's the sample data and required results...
DECLARE @sampleData TABLE
(personID INT
,fullName VARCHAR(64))
INSERT @sampleData
SELECT 999888899, 'MIKE SMITH^^^^' UNION ALL
SELECT 123456789, 'SELBURG JASON L' UNION ALL
SELECT 377775014, 'GEORGE W TAKEYAMO' UNION ALL
SELECT 354517002, 'WAYNE A YADA^^^^' UNION ALL
SELECT 115455522, 'DONNA M AABERG^^^^' UNION ALL
SELECT 349555554, 'ECKERLIND^FRANCES^C^^' UNION ALL
SELECT 351874470, 'YABOVICH^LISA^M^^' UNION ALL
SELECT 351874470, 'JONES^STEVE^^^' UNION ALL
SELECT 354891162, 'YALUNG^JOSEFINA^Y^^'
/*
Rules:
Output should be in the format LASTNAME^FIRSTNAME^MIDDLEINITIAL^^
The carrot '^' splits each word
A total of four '^' symbols, so if no middle initial then LASTNAME^FIRSTNAME^^^
I understand that the biggest hurdle is determining whether or not a record
with no middle initial is FIRST-LAST or LAST-FIRST, but let's ignore that issue for now.
Assume that there will not be any single word names, everyone has at least a first and last name
Expected result
'SMITH^MIKE^^^'
'SELBURG^JASON^L^^'
'TAKEYAMO^GEORGE^W^^'
'YADA^WAYNE^A^^'
'AABERG^DONNA^M^^'
'ECKERLIND^FRANCES^C^^'
'YABOVICH^LISA^M^^'
'JONES^STEVE^^^'
'YALUNG^JOSEFINA^Y^^'
*/
Any suggestions are appreciated, and I'm specifically interested in the SET based approach. 😎
NOTE: This is a third part application and the fact that the data isn't normalized is beyond my control and CANT BE CHANGED. So please don't reply telling me that a correct design in the first place would avoid this problem... DUH! 😀
______________________________________________________________________
Personal Motto: Why push the envelope when you can just open it?
If you follow the direction given HERE[/url] you'll likely increase the number and quality of responses you get to your question.
Jason L. Selburg

Viewing 14 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply

Michael Earl-395764 SSC Guru Points: 53873 More actions · Answer 1

Change the design - just kidding.

You are in trouble - mostly because of the issue you presented. Even in your sample data, MIKE SMITH and STEVE JONES are impossible to sort out. This will be an especially big problem when you get to someone's name like mine (MICHAEL EARL or EARL MICHAEL - could go either way).

Here is a shot at an "intelligent" parse that really looks for the middle initial position to get an assumption about the data format.

I used sub-queries because I did not want to fight so much with the nested functions.

[font="Courier New"]SELECT

PersonID

, FullString

, CASE WHEN LEN(Part2) = 1 THEN Part1 WHEN LEN(Part3) = 1 THEN Part2 ELSE Part1 END AS FirstName

, CASE WHEN LEN(Part2) = 1 THEN Part2 WHEN LEN(Part3) = 1 THEN Part3 ELSE '' END AS MiddleInit

, CASE WHEN LEN(Part2) = 1 THEN Part3 WHEN LEN(Part3) <> 1 THEN Part3 ELSE Part1 END AS LastName

FROM (

SELECT

PersonID

, FullString

, Part1

, REPLACE(REPLACE(FullString,Part1 + ' ',''),' ' + Part3,'') AS Part2

, Part3

FROM (

SELECT

PersonID

, LTRIM(RTRIM(REPLACE(FullName,'^',' '))) AS FullString

, LEFT(LTRIM(RTRIM(REPLACE(FullName,'^',' '))),CHARINDEX(' ',LTRIM(RTRIM(REPLACE(FullName,'^',' '))))-1) AS Part1

, REVERSE(LEFT(REVERSE(LTRIM(RTRIM(REPLACE(FullName,'^',' ')))),CHARINDEX(' ',REVERSE(LTRIM(RTRIM(REPLACE(FullName,'^',' ')))))-1)) AS Part3

FROM

#sampleData

) X

) Y[/font]

Jason Selburg SSC-Insane Points: 24641 More actions · Answer 2

Nice. I'll give it a whirl.

Any more takers out there?

______________________________________________________________________

Personal Motto: Why push the envelope when you can just open it?

If you follow the direction given HERE[/url] you'll likely increase the number and quality of responses you get to your question.

Jason L. Selburg

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 3

Jason - have you considered a CLR function? You could run a split on BOTH ^ and (space), and then put it back together from there, or a regex replace (one of the things SQLCLR is actually fairly good at).

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Jason Selburg SSC-Insane Points: 24641 More actions · Answer 4

Matt,

Thanks for the suggestion, but I left out one VERY important point. This needs to work on 2000 as well. I probably should have posted in that forum :hehe:

Anywho, I'm going with Michael's solution.

Thanks

______________________________________________________________________

Personal Motto: Why push the envelope when you can just open it?

If you follow the direction given HERE[/url] you'll likely increase the number and quality of responses you get to your question.

Jason L. Selburg

Michael Earl-395764 SSC Guru Points: 53873 More actions · Answer 5

So you need to change the design AND upgrade your server...

Is this a one-time update, something you intend to do regularly in batch, or something you need to really happen real-time?

If this needs to happen in batch, you could use SSIS. A data flow and a script task would give you a lot of options. Regardless, I think the problem with not being able to figure out which is the first and last name and then issues with dutch names that have spaces in them are going to tank whatever you write. At the end of the day, someone is going to have to look at the records to really be sure they are correct - and even then, there may be no way to tell.

Jason Selburg SSC-Insane Points: 24641 More actions · Answer 6

Michael Earl (6/18/2008)
So you need to change the design AND upgrade your server...
Is this a one-time update, something you intend to do regularly in batch, or something you need to really happen real-time?

This is a one time update, but needs to be done over the next 6 months at over 150 client sites in conjunction with another software update. Each site has a different version of SQL from 2000 on. :w00t:

Regardless, I think the problem with not being able to figure out which is the first and last name and then issues with dutch names that have spaces in them are going to tank whatever you write. At the end of the day, someone is going to have to look at the records to really be sure they are correct - and even then, there may be no way to tell.

I don't think that issue (first/last or last/first) is actually going to be a problem. I was just being proactive in my description. The data I've looked into so far doesn't have them mixed together.

______________________________________________________________________

Personal Motto: Why push the envelope when you can just open it?

If you follow the direction given HERE[/url] you'll likely increase the number and quality of responses you get to your question.

Jason L. Selburg

Jeff Moden SSC Guru Points: 1004686 More actions · Answer 7

Just curious... do you have any names like the following example?

Dr. and Mrs. J R van de Camp Rosenburg V MD. PHD. Esquire

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jason Selburg SSC-Insane Points: 24641 More actions · Answer 8

Not so far!

$#&&@%....! ... Jeff, you just had to bring that up didn't you!

______________________________________________________________________

Personal Motto: Why push the envelope when you can just open it?

If you follow the direction given HERE[/url] you'll likely increase the number and quality of responses you get to your question.

Jason L. Selburg

Michael Earl-395764 SSC Guru Points: 53873 More actions · Answer 9

I have worked for a couple of mailing houses.

You would not believe how many people have names like that.

Seggerman-675349 Hall of Fame Points: 3563 More actions · Answer 10

Not that it matters but carrot is what, in a bunch, BunnyRabbit on Captain Kangaroo used to put up on the counter all the time

I believe the ^ (capital 6) is a caret

carat (or karat?) is a gold weight

Jason Selburg SSC-Insane Points: 24641 More actions · Answer 11

😛

______________________________________________________________________

Personal Motto: Why push the envelope when you can just open it?

If you follow the direction given HERE[/url] you'll likely increase the number and quality of responses you get to your question.

Jason L. Selburg

sgmunson SSC Guru Points: 110651 More actions · Answer 12

Well, I WAS going to suggest having a table of the first names to deal with that issue, but any example beyond the original sample data would break the idea, and I really like the solution that was chosen, so in addition to being a day late (sort of), I'm a dollar short as well. Oh well... maybe next time...

Steve

(aka smunson)

:):):)

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

Jeff Moden SSC Guru Points: 1004686 More actions · Answer 13

Seggerman (6/19/2008)
Not that it matters but carrot is what, in a bunch, BunnyRabbit on Captain Kangaroo used to put up on the counter all the time
I believe the ^ (capital 6) is a caret
carat (or karat?) is a gold weight

Like you said... "not that it matters", but the "^" sign (caret) has a lot of different meanings depending on where it is used and is sometimes referred to as a "circumflex".

http://www.merriam-webster.com/dictionary/caret

http://www.merriam-webster.com/dictionary/circumflex (see the NOUN version)

http://en.wikipedia.org/wiki/Caret

http://en.wikipedia.org/wiki/Circumflex

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)