Help with implementing phonetic algorithm
Posted Wednesday, January 23, 2013 10:54 AM


SSC-Addicted

Please consider the following example:


create table #Names (Forename nvarchar(50), Surname nvarchar(50), PhoneticNameKey nvarchar(20))

insert into #Names(Forename, Surname)
select 'JOSE', 'ANTINORI'

select * from #Names

;with cte as
(
select
substring(Surname, 1, 1) as Chars,
stuff(Surname, 1, 1, '') as Surname,
1 as RowID
from #Names
union all
select
substring(Surname, 1, 1) as Chars,
stuff(Surname, 1, 1, '') as data,
RowID + 1 as RowID
from cte
where len(Surname) > 0
)

select RowID, Chars into #StringInTable
from cte
order by RowID

select * from #StringInTable

The idea here is to perform a series of replaces on the characters to produce a phonetic key. So I started off with something like this:

select case chars
when 'A' then 'y'
when 'B' then 'b'
when 'C' then 'k'
when 'D' then 'd'
when 'E' then 'y'
when 'F' then 'f'
when 'G' then 'g'
when 'H' then 'h'
when 'I' then 'y'
when 'J' then 'j'
when 'K' then 'k' -- but if K is followed by N then should become n
when 'L' then 'l'
when 'M' then 'm'
when 'N' then 'n' -- if N is followed by I or T then set to m
when 'O' then 'y'
end

from #StringInTable

But as you can see, for some characters, I need to check the next character to decide on the phonetic character to use.

Can someone help me with this?

Thanks in advance.


-----------------------------------
http://www.SQL4n00bs.com
Post #1410712
Posted Wednesday, January 23, 2013 12:37 PM


SSCrazy Eights

What a completely bizarre requirement. However, you did provide very easy-to-consume data! A solution here is not too bad: add the next character to your CTE as an extra column, and each row then has access to the character that follows it.

if object_id('tempdb..#Names') is not null
drop table #Names

if object_id('tempdb..#StringInTable') is not null
drop table #StringInTable

create table #Names (Forename nvarchar(50), Surname nvarchar(50), PhoneticNameKey nvarchar(20))

insert into #Names(Forename, Surname)
select 'JOSE', 'ANTINORI' union all
select 'TEST', 'Knuckle' union all
select 'ITest', 'Nint'

select * from #Names

;with cte as
(
select
substring(Surname, 1, 1) as Chars,
substring(Surname, 2, 1) as Char2,
stuff(Surname, 1, 1, '') as Surname,
1 as RowID
from #Names
union all
select
substring(Surname, 1, 1) as Chars,
substring(Surname, 2, 1) as Char2,
stuff(Surname, 1, 1, '') as data,
RowID + 1 as RowID
from cte
where len(Surname) > 0
)

select RowID, Chars, Char2 into #StringInTable
from cte
order by RowID

select * from #StringInTable

select chars, char2, case chars
when 'A' then 'y'
when 'B' then 'b'
when 'C' then 'k'
when 'D' then 'd'
when 'E' then 'y'
when 'F' then 'f'
when 'G' then 'g'
when 'H' then 'h'
when 'I' then 'y'
when 'J' then 'j'
when 'K' then case when char2 = 'N' then 'n' else 'k' end -- but if K is followed by N then should become n
when 'L' then 'l'
when 'M' then 'm'
when 'N' then case when char2 in ('I', 'T') then 'm' else 'n' end -- if N is followed by I or T then set to m
when 'O' then 'y'
end

from #StringInTable
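
A minimal sketch, not from the posts above, of one way to turn the per-character rows back into one key per surname. It assumes a hypothetical #Names2 table with an added NameID column, a hypothetical Tally CTE built from sys.all_columns, and SQL Server 2008 or later. Only the partial A to O rules shown in this thread are used, so any other character yields NULL and simply drops out of the key.

```sql
-- Sketch only. #Names2, NameID and the Tally CTE are hypothetical additions.
if object_id('tempdb..#Names2') is not null
    drop table #Names2;

create table #Names2
(
    NameID   int identity(1,1) primary key,
    Forename nvarchar(50),
    Surname  nvarchar(50)
);

insert into #Names2 (Forename, Surname)
select 'JOSE', 'ANTINORI' union all
select 'TEST', 'Knuckle' union all
select 'ITest', 'Nint';

;with Tally (N) as
(
    -- 50 sequential numbers, enough for an nvarchar(50) surname
    select top (50) row_number() over (order by (select null))
    from sys.all_columns
)
select n.NameID, n.Forename, n.Surname,
       PhoneticNameKey =
       (
           -- the CASE has no column alias, so FOR XML PATH('') concatenates its values
           select case c.Chars
                      when 'A' then 'y'
                      when 'B' then 'b'
                      when 'C' then 'k'
                      when 'D' then 'd'
                      when 'E' then 'y'
                      when 'F' then 'f'
                      when 'G' then 'g'
                      when 'H' then 'h'
                      when 'I' then 'y'
                      when 'J' then 'j'
                      when 'K' then case when c.Char2 = 'N' then 'n' else 'k' end
                      when 'L' then 'l'
                      when 'M' then 'm'
                      when 'N' then case when c.Char2 in ('I', 'T') then 'm' else 'n' end
                      when 'O' then 'y'
                  end
           from Tally t
           cross apply (select substring(n.Surname, t.N, 1)     as Chars,
                               substring(n.Surname, t.N + 1, 1) as Char2) c
           where t.N <= len(n.Surname)
           order by t.N            -- keep the characters in their original order
           for xml path(''), type
       ).value('.', 'nvarchar(50)')
from #Names2 n
order by n.NameID;
```

Because each surname is split inside its own correlated subquery, characters from different names never share a RowID, which the single #StringInTable approach does not guarantee once #Names holds more than one row.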



_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Post #1410753
Posted Wednesday, January 23, 2013 1:00 PM
SSC Eights!

How are you doing, Abu Dina?

As luck would have it, one or two of us have been experimenting with exactly this requirement here on this thread, which will continue to run for a day or two. Enjoy, and pick your favourite from the mix. Each method has something different to offer (or bitch about, depending on your POV).
The fastest method by far, and it seems a good fit for your requirement, is the iTVF in which a table variable is hard-coded with the find and replace characters. If it's not in that thread (it doesn't work with the test harness), I'll post it tomorrow.

Cheers

ChrisM



Low-hanging fruit picker and defender of the moggies





For better assistance in answering your questions, please read this.




Understanding and using APPLY, (I) and (II) Paul White

Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Post #1410766
Posted Wednesday, January 23, 2013 1:40 PM


SSC-Addicted

Haha Sean..... I stopped at O as I didn't want to give away how other keys are created. This is not as bizarre as you think. A lot of work has gone into this. And when it's finished it will help me a lot.

Not tried your solution but will do first thing tomorrow morning.

Cheers for the effort.



Post #1410780
Posted Wednesday, January 23, 2013 1:44 PM


SSC-Addicted

Hi Chris hope you're well!

Thanks for the thread. I shall go and have a thorough read in a mo.

I'm really grateful you brought iTVFs to my attention last year.
I've been using them with great results, and when this is finished it will end up wrapped inside an iTVF.

BTW, do you find my requirement as bizarre as Sean thinks? lol... I'm beginning to worry now...
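
For readers who have not met the term, an iTVF is an inline table-valued function: a single SELECT wrapped in a function and called with CROSS APPLY. The following is a minimal sketch of that wrapping, not the finished function. The name dbo.PhoneticKey is hypothetical, only the K and N rules from the first post are included, and it assumes that a two-character rule replaces both characters.

```sql
-- Sketch only. dbo.PhoneticKey is a hypothetical name.
-- GO is the batch separator used by SSMS/sqlcmd; CREATE FUNCTION must start its own batch.
if object_id('dbo.PhoneticKey') is not null
    drop function dbo.PhoneticKey;
go

create function dbo.PhoneticKey (@Surname nvarchar(50))
returns table
with schemabinding
as
return
    select PhoneticNameKey =
           -- two-character rules are applied before the single-character ones;
           -- the binary collation keeps lowercase output from being replaced again
           replace(replace(replace(replace(replace(
               upper(@Surname) collate Latin1_General_BIN,
               'KN', 'n'), 'K', 'k'), 'NI', 'm'), 'NT', 'm'), 'N', 'n');
go

-- Usage against the #Names table from the first post
select n.Forename, n.Surname, k.PhoneticNameKey
from #Names n
cross apply dbo.PhoneticKey(n.Surname) k;
```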


Post #1410783
Posted Wednesday, January 23, 2013 2:05 PM
SSC Eights!



Nah mate I've seen this before, it's a method used by some of the professional matching packages. MatchIT, IIRC.

BTW here's probably the "right way" to do what you want:

SELECT Forename, Surname, x.PhoneticNameKey
FROM #Names
CROSS APPLY (
SELECT PhoneticNameKey =
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
Surname COLLATE Latin1_General_BIN,
'A','y'),'B','b'),'C','k'),'D','d'),'E','y'),'F','f'),'G','g'),'H','h'),'NI','m')
,'I','y'),'J','j'),'KN','n'),'K','k'),'L','l'),'M','m'),'NT','m'),'N','n'),'O','y')
) x
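
Two details of the nested-REPLACE version are worth checking against your rules. The sketch below is an illustration on made-up sample values, not part of the posts. With a binary collation the match is case-sensitive, so mixed-case surnames need UPPER() first. Also, a two-character rule such as 'KN' consumes both characters, whereas the per-character CASE version earlier in the thread emits one output per input character (K followed by N gives 'nn').

```sql
-- Sketch only: the sample values and column names are illustrative.
select v.Surname,
       WithoutUpper = replace(replace(
                          v.Surname collate Latin1_General_BIN,
                          'KN', 'n'), 'K', 'k'),
       WithUpper    = replace(replace(
                          upper(v.Surname) collate Latin1_General_BIN,
                          'KN', 'n'), 'K', 'k')
from (values (N'KNUCKLE'), (N'Knuckle')) as v (Surname);

-- Surname   WithoutUpper   WithUpper
-- KNUCKLE   nUCkLE         nUCkLE
-- Knuckle   knuckle        nUCkLE
```

In the second row the 'KN' rule never fires without UPPER(), because the binary collation does not treat 'Kn' as a match for 'KN'.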





Post #1410791
Posted Wednesday, January 23, 2013 2:53 PM


SSC-Addicted

Lol Chris you scare me.... And embarrass me at the same time.

Simple solution....

This is another addition to the various phonetic algorithm implementations I have. Always hoping for better match keys lol!

Thanks.


Post #1410806
Posted Thursday, January 24, 2013 8:57 AM


SSC-Addicted

I've been trying all morning to get a set of nested REPLACEs to work, but there are too many rules to apply.

I may have to go down the CLR route for this.
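
Before reaching for CLR, one option that may be worth a try is to keep the rules as data rather than as nested function calls. This is a minimal sketch under assumptions: the @Rules table variable, its column names and the sample value are hypothetical, and only the K and N rules quoted earlier in the thread are loaded. A rule with a NextChar wins over the general rule for the same character.

```sql
-- Sketch only. @Rules, its columns and @Surname are hypothetical.
declare @Rules table
(
    Chars       nchar(1)    not null,
    NextChar    nchar(1)    null,      -- NULL = applies whatever follows
    Replacement nvarchar(2) not null
);

insert into @Rules (Chars, NextChar, Replacement) values
    ('K', null, 'k'), ('K', 'N', 'n'),
    ('N', null, 'n'), ('N', 'I', 'm'), ('N', 'T', 'm');

declare @Surname nvarchar(50) = N'KNIT';

;with Tally (N) as
(
    select top (50) row_number() over (order by (select null))
    from sys.all_columns
)
select t.N as RowID, c.Chars, c.Char2, m.Replacement
from Tally t
cross apply (select substring(@Surname, t.N, 1)     as Chars,
                    substring(@Surname, t.N + 1, 1) as Char2) c
outer apply
(
    -- prefer the rule that names the next character, fall back to the general one
    select top (1) ru.Replacement
    from @Rules ru
    where ru.Chars = c.Chars
      and (ru.NextChar = c.Char2 or ru.NextChar is null)
    order by case when ru.NextChar is null then 1 else 0 end
) m
where t.N <= len(@Surname)
order by t.N;
```

Adding a rule then means adding a row, and characters with no rule come back with a NULL Replacement so they are easy to spot.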


Post #1411189
Posted Thursday, January 24, 2013 3:19 PM
SSC Eights!



Hey geezer!
I'll tell you what's scary - it's taken 20 years for me to be able to figure out an answer to your question on the second shot, but there are folks at almost every gig I go to, who can do better, after only three years' playing with SQL. There are some very talented players on the field and we all chase the same jobs...



Post #1411366