Parsing the name field with T-SQL

Question

Parsing the name field with T-SQL

Viewing 5 posts - 16 through 19 (of 19 total)

You must be logged in to reply to this topic. Login to reply

achavez Old Hand Points: 359 More actions · Answer 1

I must agree with Frank, you might want to consider another tool. When I used to work in junk mail, er...Direct Marketing...we'd get millions of names a week to process and used a DOS based package (hey, what works works) to do parsing and genderizing, weeding out companies and duplicate removal, Name (Bill Jones, William Jones, Billy Jones). I can't remember the name of that one, but look around, they're out there.

http://www.semaphorecorp.com/nc/namechop.html

http://www.infoplan.com.au/nameparser/

http://www.guysoftware.com/parserat.htm

-------
at us, very deafly, a most stares
collosal hoax of clocks and calendars

eecummings

cgunner Ten Centuries Points: 1118 More actions · Answer 2

Thanks for the links. I will check them out. I got a script from SQLTeam that works:

Select

[LName] =

case

when charindex(' ', Col001) = 0 then Col001

Else subString(Col001, 1, charindex(' ', Col001))

End

,[FName] =

case

when charindex(' ', Col001) = 0 then null

when charindex(' ', Col001, charindex(' ', Col001)+1) = 0 then subString(Col001, charindex(' ', Col001)+1, 50)

else subString(Col001, charindex(' ', Col001)+1, charindex(' ', Col001, charindex(' ', Col001)+1) - charindex(' ', Col001))

end

,[MName] = --this leaves everything after 2nd space combined ie: R LEE H

case

when charindex(' ', Col001) = 0 then null

when charindex(' ', Col001, charindex(' ', Col001)+1) = 0 then null

else convert(varChar(25), substring(Col001, charindex(' ', Col001, charindex(' ',Col001)+1)+1, 50))

end

From table

Frank Kalis SSC Guru Points: 111183 More actions · Answer 3

Yes, I have seen that. It's exactly what I meant with ugly T-SQL solution above.

--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

cgunner Ten Centuries Points: 1118 More actions · Answer 4

I hope Yukon will offer more. This seems like it would be a common problem.

Bob Monahon Ten Centuries Points: 1282 More actions · Answer 5

Hello cgrunner,

The parsing-names problem has existed since names have been automated. I don't think it's susceptible to a technology quick-fix. Typical name data may include: honorific titles (DR., etc.), hyphenated last names that may or may not include the hyphen, generation designations (SR, JR, III, etc.), two persons included in the name (SMITH BOB AND JANE), etc. etc.

Depending on how the names are used, and the business impact of a bad name, the best solution I've seen is to apply a simple formatting rule, followed by a manual review of potential bad names.

You seem to be reasonably sure that the sequence of names will be {last, first, middle}. In that case, I'd suggest a procedure like this:

A. Run all names through the script you got from SQLTeam. It looks like that will work adequately for most names.

B. Identify potential problem names and route these for manual review. As as starter, these probably should include:

- blank first name

- imbedded spaces found in the middle name

If your organization can implement such a procedure, then a good data element to add to your parsed-names table might be "reviewed_by" and "reviewed_ts". Potential bad names that have been reviewed can be excluded from subsequent bad-names lists.

Good luck

Regards,

Bob Monahon