Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Help with implemting phonetic algorithm


Help with implemting phonetic algorithm

Author
Message
Abu Dina
Abu Dina
Right there with Babe
Right there with Babe (723 reputation)Right there with Babe (723 reputation)Right there with Babe (723 reputation)Right there with Babe (723 reputation)Right there with Babe (723 reputation)Right there with Babe (723 reputation)Right there with Babe (723 reputation)Right there with Babe (723 reputation)

Group: General Forum Members
Points: 723 Visits: 3323
Please consider the following example:


create table #Names (Forename nvarchar(50), Surname nvarchar(50), PhoneticNameKey nvarchar(20))

insert into #Names(Forename, Surname)
select 'JOSE', 'ANTINORI'

select * from #Names

;with cte as
(
select
substring(Surname, 1, 1) as Chars,
stuff(Surname, 1, 1, '') as Surname,
1 as RowID
from #Names
union all
select
substring(Surname, 1, 1) as Chars,
stuff(Surname, 1, 1, '') as data,
RowID + 1 as RowID
from cte
where len(Surname) > 0
)

select RowID, Chars into #StringInTable
from cte
order by RowID

select * from #StringInTable



The idea here is to perform a series of replaces on the characters to produce a phonetic key. So I started off with something like this:

select case chars
when 'A' then 'y'
when 'B' then 'b'
when 'C' then 'k'
when 'D' then 'd'
when 'E' then 'y'
when 'F' then 'f'
when 'G' then 'g'
when 'H' then 'h'
when 'I' then 'y'
when 'J' then 'j'
when 'K' then 'k' -- but if K is followed by N then should become n
when 'L' then 'l'
when 'M' then 'm'
when 'N' then 'n' -- if N is followed by I or T then set to m
when 'O' then 'y'
end

from #StringInTable



But as you can see, for some characters, I need to check the next character to decide on the phonetic character to use.

Can someone help me with this?

Thanks in advance.

---------------------------------------------------------


It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens

Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
Sean Lange
Sean Lange
SSCoach
SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)

Group: General Forum Members
Points: 16550 Visits: 17004
What a completely bizarre requirement. However you did provide very easy to consume data!!! A solution here is not really too bad. Just add the next character into your CTE and you then have access to the "next" character.


if object_id('tempdb..#Names') is not null
drop table #Names

if object_id('tempdb..#StringInTable') is not null
drop table #StringInTable

create table #Names (Forename nvarchar(50), Surname nvarchar(50), PhoneticNameKey nvarchar(20))

insert into #Names(Forename, Surname)
select 'JOSE', 'ANTINORI' union all
select 'TEST', 'Knuckle' union all
select 'ITest', 'Nint'

select * from #Names

;with cte as
(
select
substring(Surname, 1, 1) as Chars,
substring(Surname, 2, 1) as Char2,
stuff(Surname, 1, 1, '') as Surname,
1 as RowID
from #Names
union all
select
substring(Surname, 1, 1) as Chars,
substring(Surname, 2, 1) as Char2,
stuff(Surname, 1, 1, '') as data,
RowID + 1 as RowID
from cte
where len(Surname) > 0
)

select RowID, Chars, Char2 into #StringInTable
from cte
order by RowID

select * from #StringInTable

select chars, char2, case chars
when 'A' then 'y'
when 'B' then 'b'
when 'C' then 'k'
when 'D' then 'd'
when 'E' then 'y'
when 'F' then 'f'
when 'G' then 'g'
when 'H' then 'h'
when 'I' then 'y'
when 'J' then 'j'
when 'K' then case when char2 = 'N' then 'n' else 'k' end -- but if K is followed by N then should become n
when 'L' then 'l'
when 'M' then 'm'
when 'N' then case when char2 in ('I', 'T') then 'm' else 'n' end -- if N is followed by I or T then set to m
when 'O' then 'y'
end

from #StringInTable



_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
ChrisM@home
ChrisM@home
Ten Centuries
Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)

Group: General Forum Members
Points: 1161 Visits: 9734
How are you doing, Abu Dina?

As luck would have it, one or two of us have been experimenting with exactly this requirement here on this thread, which will continue to run for a day or two. Enjoy, and pick your favourite from the mix. Each method has something different to offer (or bitch about, depending on your POV).
The fastest method by far - and it seems a good fit to your requirement - is the iTVF in which a table variable is hard coded with the find and replace characters. If it's not there (it doesn't work with the test harness), I'll post it tomorrow.

Cheers

ChrisM


Low-hanging fruit picker and defender of the moggies





For better assistance in answering your questions, please read this.




Understanding and using APPLY, (I) and (II) Paul White

Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Abu Dina
Abu Dina
Right there with Babe
Right there with Babe (723 reputation)Right there with Babe (723 reputation)Right there with Babe (723 reputation)Right there with Babe (723 reputation)Right there with Babe (723 reputation)Right there with Babe (723 reputation)Right there with Babe (723 reputation)Right there with Babe (723 reputation)

Group: General Forum Members
Points: 723 Visits: 3323
Haha Sean..... I stopped at O as I didn't want to give away how other keys are created. This is not as bizarre as you think. A lot of work has gone into this. And when it's finished it will help me a lot.

Not tried your solution but will do first thing tomorrow morning.

Cheers for the effort.

---------------------------------------------------------


It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens

Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
Abu Dina
Abu Dina
Right there with Babe
Right there with Babe (723 reputation)Right there with Babe (723 reputation)Right there with Babe (723 reputation)Right there with Babe (723 reputation)Right there with Babe (723 reputation)Right there with Babe (723 reputation)Right there with Babe (723 reputation)Right there with Babe (723 reputation)

Group: General Forum Members
Points: 723 Visits: 3323
Hi Chris hope you're well!

Thanks for the thread. I shall go and have a thorough read in a mo.

I'm really grateful you brought my attention to iTVF Last year.
I've been using it with great results and when this is finished it will end up wrapped inside an iTFV.

BTW, do you find my requirement as bizarre as Sean thinks lol.. I'm beginning to worry now...

---------------------------------------------------------


It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens

Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
ChrisM@home
ChrisM@home
Ten Centuries
Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)

Group: General Forum Members
Points: 1161 Visits: 9734
Abu Dina (1/23/2013)
Hi Chris hope you're well!

Thanks for the thread. I shall go and have a thorough read in a mo.

I'm really grateful you brought my attention to iTVF Last year.
I've been using it with great results and when this is finished it will end up wrapped inside an iTFV.

BTW, do you find my requirement as bizarre as Sean thinks lol.. I'm beginning to worry now...


Nah mate I've seen this before, it's a method used by some of the professional matching packages. MatchIT, IIRC.

BTW here's probably the "right way" to do what you want:

SELECT Forename, Surname, x.PhoneticNameKey
FROM #Names
CROSS APPLY (
SELECT PhoneticNameKey =
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
Surname COLLATE Latin1_General_BIN,
'A','y'),'B','b'),'C','k'),'D','d'),'E','y'),'F','f'),'G','g'),'H','h'),'NI','m')
,'I','y'),'J','j'),'KN','n'),'K','k'),'L','l'),'M','m'),'NT','m'),'N','n'),'O','y')
) x





Low-hanging fruit picker and defender of the moggies





For better assistance in answering your questions, please read this.




Understanding and using APPLY, (I) and (II) Paul White

Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Abu Dina
Abu Dina
Right there with Babe
Right there with Babe (723 reputation)Right there with Babe (723 reputation)Right there with Babe (723 reputation)Right there with Babe (723 reputation)Right there with Babe (723 reputation)Right there with Babe (723 reputation)Right there with Babe (723 reputation)Right there with Babe (723 reputation)

Group: General Forum Members
Points: 723 Visits: 3323
Lol Chris you scare me.... And embarrass me at the same time.

Simple solution....

This is another addition to the various phonetic algorithm implementations I have. Always hoping for better match keys lol!

Thanks.

---------------------------------------------------------


It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens

Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
Abu Dina
Abu Dina
Right there with Babe
Right there with Babe (723 reputation)Right there with Babe (723 reputation)Right there with Babe (723 reputation)Right there with Babe (723 reputation)Right there with Babe (723 reputation)Right there with Babe (723 reputation)Right there with Babe (723 reputation)Right there with Babe (723 reputation)

Group: General Forum Members
Points: 723 Visits: 3323
I've been trying all morning to get a set of REPLACES to work but there are too many rules to apply.

I may have to go down the CLR route for this.

---------------------------------------------------------


It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens

Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
ChrisM@home
ChrisM@home
Ten Centuries
Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)

Group: General Forum Members
Points: 1161 Visits: 9734
Abu Dina (1/23/2013)
Lol Chris you scare me.... And embarrass me at the same time.

Simple solution....

This is another addition to the various phonetic algorithm implementations I have. Always hoping for better match keys lol!

Thanks.


Hey geezer!
I'll tell you what's scary - it's taken 20 years for me to be able to figure out an answer to your question on the second shot, but there are folks at almost every gig I go to, who can do better, after only three years' playing with SQL. There are some very talented players on the field and we all chase the same jobs...


Low-hanging fruit picker and defender of the moggies





For better assistance in answering your questions, please read this.




Understanding and using APPLY, (I) and (II) Paul White

Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search