Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Help with implemting phonetic algorithm Expand / Collapse
Author
Message
Posted Wednesday, January 23, 2013 10:54 AM


Say Hey Kid

Say Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey Kid

Group: General Forum Members
Last Login: Friday, October 31, 2014 10:27 AM
Points: 708, Visits: 3,293
Please consider the following example:


create table #Names (Forename nvarchar(50), Surname nvarchar(50), PhoneticNameKey nvarchar(20))

insert into #Names(Forename, Surname)
select 'JOSE', 'ANTINORI'

select * from #Names

;with cte as
(
select
substring(Surname, 1, 1) as Chars,
stuff(Surname, 1, 1, '') as Surname,
1 as RowID
from #Names
union all
select
substring(Surname, 1, 1) as Chars,
stuff(Surname, 1, 1, '') as data,
RowID + 1 as RowID
from cte
where len(Surname) > 0
)

select RowID, Chars into #StringInTable
from cte
order by RowID

select * from #StringInTable

The idea here is to perform a series of replaces on the characters to produce a phonetic key. So I started off with something like this:

select case chars
when 'A' then 'y'
when 'B' then 'b'
when 'C' then 'k'
when 'D' then 'd'
when 'E' then 'y'
when 'F' then 'f'
when 'G' then 'g'
when 'H' then 'h'
when 'I' then 'y'
when 'J' then 'j'
when 'K' then 'k' -- but if K is followed by N then should become n
when 'L' then 'l'
when 'M' then 'm'
when 'N' then 'n' -- if N is followed by I or T then set to m
when 'O' then 'y'
end

from #StringInTable

But as you can see, for some characters, I need to check the next character to decide on the phonetic character to use.

Can someone help me with this?

Thanks in advance.


---------------------------------------------------------


It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens

Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
Post #1410712
Posted Wednesday, January 23, 2013 12:37 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: 2 days ago @ 2:05 PM
Points: 13,327, Visits: 12,820
What a completely bizarre requirement. However you did provide very easy to consume data!!! A solution here is not really too bad. Just add the next character into your CTE and you then have access to the "next" character.

if object_id('tempdb..#Names') is not null
drop table #Names

if object_id('tempdb..#StringInTable') is not null
drop table #StringInTable

create table #Names (Forename nvarchar(50), Surname nvarchar(50), PhoneticNameKey nvarchar(20))

insert into #Names(Forename, Surname)
select 'JOSE', 'ANTINORI' union all
select 'TEST', 'Knuckle' union all
select 'ITest', 'Nint'

select * from #Names

;with cte as
(
select
substring(Surname, 1, 1) as Chars,
substring(Surname, 2, 1) as Char2,
stuff(Surname, 1, 1, '') as Surname,
1 as RowID
from #Names
union all
select
substring(Surname, 1, 1) as Chars,
substring(Surname, 2, 1) as Char2,
stuff(Surname, 1, 1, '') as data,
RowID + 1 as RowID
from cte
where len(Surname) > 0
)

select RowID, Chars, Char2 into #StringInTable
from cte
order by RowID

select * from #StringInTable

select chars, char2, case chars
when 'A' then 'y'
when 'B' then 'b'
when 'C' then 'k'
when 'D' then 'd'
when 'E' then 'y'
when 'F' then 'f'
when 'G' then 'g'
when 'H' then 'h'
when 'I' then 'y'
when 'J' then 'j'
when 'K' then case when char2 = 'N' then 'n' else 'k' end -- but if K is followed by N then should become n
when 'L' then 'l'
when 'M' then 'm'
when 'N' then case when char2 in ('I', 'T') then 'm' else 'n' end -- if N is followed by I or T then set to m
when 'O' then 'y'
end

from #StringInTable



_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Post #1410753
Posted Wednesday, January 23, 2013 1:00 PM
Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Yesterday @ 2:07 AM
Points: 1,037, Visits: 7,025
How are you doing, Abu Dina?

As luck would have it, one or two of us have been experimenting with exactly this requirement here on this thread, which will continue to run for a day or two. Enjoy, and pick your favourite from the mix. Each method has something different to offer (or bitch about, depending on your POV).
The fastest method by far - and it seems a good fit to your requirement - is the iTVF in which a table variable is hard coded with the find and replace characters. If it's not there (it doesn't work with the test harness), I'll post it tomorrow.

Cheers

ChrisM



Low-hanging fruit picker and defender of the moggies





For better assistance in answering your questions, please read this.




Understanding and using APPLY, (I) and (II) Paul White

Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Post #1410766
Posted Wednesday, January 23, 2013 1:40 PM


Say Hey Kid

Say Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey Kid

Group: General Forum Members
Last Login: Friday, October 31, 2014 10:27 AM
Points: 708, Visits: 3,293
Haha Sean..... I stopped at O as I didn't want to give away how other keys are created. This is not as bizarre as you think. A lot of work has gone into this. And when it's finished it will help me a lot.

Not tried your solution but will do first thing tomorrow morning.

Cheers for the effort.



---------------------------------------------------------


It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens

Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
Post #1410780
Posted Wednesday, January 23, 2013 1:44 PM


Say Hey Kid

Say Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey Kid

Group: General Forum Members
Last Login: Friday, October 31, 2014 10:27 AM
Points: 708, Visits: 3,293
Hi Chris hope you're well!

Thanks for the thread. I shall go and have a thorough read in a mo.

I'm really grateful you brought my attention to iTVF Last year.
I've been using it with great results and when this is finished it will end up wrapped inside an iTFV.

BTW, do you find my requirement as bizarre as Sean thinks lol.. I'm beginning to worry now...


---------------------------------------------------------


It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens

Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
Post #1410783
Posted Wednesday, January 23, 2013 2:05 PM
Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Yesterday @ 2:07 AM
Points: 1,037, Visits: 7,025
Abu Dina (1/23/2013)
Hi Chris hope you're well!

Thanks for the thread. I shall go and have a thorough read in a mo.

I'm really grateful you brought my attention to iTVF Last year.
I've been using it with great results and when this is finished it will end up wrapped inside an iTFV.

BTW, do you find my requirement as bizarre as Sean thinks lol.. I'm beginning to worry now...


Nah mate I've seen this before, it's a method used by some of the professional matching packages. MatchIT, IIRC.

BTW here's probably the "right way" to do what you want:

SELECT Forename, Surname, x.PhoneticNameKey
FROM #Names
CROSS APPLY (
SELECT PhoneticNameKey =
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
Surname COLLATE Latin1_General_BIN,
'A','y'),'B','b'),'C','k'),'D','d'),'E','y'),'F','f'),'G','g'),'H','h'),'NI','m')
,'I','y'),'J','j'),'KN','n'),'K','k'),'L','l'),'M','m'),'NT','m'),'N','n'),'O','y')
) x





Low-hanging fruit picker and defender of the moggies





For better assistance in answering your questions, please read this.




Understanding and using APPLY, (I) and (II) Paul White

Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Post #1410791
Posted Wednesday, January 23, 2013 2:53 PM


Say Hey Kid

Say Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey Kid

Group: General Forum Members
Last Login: Friday, October 31, 2014 10:27 AM
Points: 708, Visits: 3,293
Lol Chris you scare me.... And embarrass me at the same time.

Simple solution....

This is another addition to the various phonetic algorithm implementations I have. Always hoping for better match keys lol!

Thanks.


---------------------------------------------------------


It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens

Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
Post #1410806
Posted Thursday, January 24, 2013 8:57 AM


Say Hey Kid

Say Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey Kid

Group: General Forum Members
Last Login: Friday, October 31, 2014 10:27 AM
Points: 708, Visits: 3,293
I've been trying all morning to get a set of REPLACES to work but there are too many rules to apply.

I may have to go down the CLR route for this.


---------------------------------------------------------


It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens

Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
Post #1411189
Posted Thursday, January 24, 2013 3:19 PM
Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Yesterday @ 2:07 AM
Points: 1,037, Visits: 7,025
Abu Dina (1/23/2013)
Lol Chris you scare me.... And embarrass me at the same time.

Simple solution....

This is another addition to the various phonetic algorithm implementations I have. Always hoping for better match keys lol!

Thanks.


Hey geezer!
I'll tell you what's scary - it's taken 20 years for me to be able to figure out an answer to your question on the second shot, but there are folks at almost every gig I go to, who can do better, after only three years' playing with SQL. There are some very talented players on the field and we all chase the same jobs...



Low-hanging fruit picker and defender of the moggies





For better assistance in answering your questions, please read this.




Understanding and using APPLY, (I) and (II) Paul White

Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Post #1411366
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse