Help with implemting phonetic algorithm

  • Please consider the following example:

    create table #Names (Forename nvarchar(50), Surname nvarchar(50), PhoneticNameKey nvarchar(20))

    insert into #Names(Forename, Surname)

    select 'JOSE', 'ANTINORI'

    select * from #Names

    ;with cte as

    (

    select

    substring(Surname, 1, 1) as Chars,

    stuff(Surname, 1, 1, '') as Surname,

    1 as RowID

    from #Names

    union all

    select

    substring(Surname, 1, 1) as Chars,

    stuff(Surname, 1, 1, '') as data,

    RowID + 1 as RowID

    from cte

    where len(Surname) > 0

    )

    select RowID, Chars into #StringInTable

    from cte

    order by RowID

    select * from #StringInTable

    The idea here is to perform a series of replaces on the characters to produce a phonetic key. So I started off with something like this:

    select case chars

    when 'A' then 'y'

    when 'B' then 'b'

    when 'C' then 'k'

    when 'D' then 'd'

    when 'E' then 'y'

    when 'F' then 'f'

    when 'G' then 'g'

    when 'H' then 'h'

    when 'I' then 'y'

    when 'J' then 'j'

    when 'K' then 'k' -- but if K is followed by N then should become n

    when 'L' then 'l'

    when 'M' then 'm'

    when 'N' then 'n' -- if N is followed by I or T then set to m

    when 'O' then 'y'

    end

    from #StringInTable

    But as you can see, for some characters, I need to check the next character to decide on the phonetic character to use.

    Can someone help me with this?

    Thanks in advance.

    ---------------------------------------------------------

    It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
    David Edwards - Media lens[/url]

    Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
    Howard Zinn

  • What a completely bizarre requirement. However you did provide very easy to consume data!!! A solution here is not really too bad. Just add the next character into your CTE and you then have access to the "next" character.

    if object_id('tempdb..#Names') is not null

    drop table #Names

    if object_id('tempdb..#StringInTable') is not null

    drop table #StringInTable

    create table #Names (Forename nvarchar(50), Surname nvarchar(50), PhoneticNameKey nvarchar(20))

    insert into #Names(Forename, Surname)

    select 'JOSE', 'ANTINORI' union all

    select 'TEST', 'Knuckle' union all

    select 'ITest', 'Nint'

    select * from #Names

    ;with cte as

    (

    select

    substring(Surname, 1, 1) as Chars,

    substring(Surname, 2, 1) as Char2,

    stuff(Surname, 1, 1, '') as Surname,

    1 as RowID

    from #Names

    union all

    select

    substring(Surname, 1, 1) as Chars,

    substring(Surname, 2, 1) as Char2,

    stuff(Surname, 1, 1, '') as data,

    RowID + 1 as RowID

    from cte

    where len(Surname) > 0

    )

    select RowID, Chars, Char2 into #StringInTable

    from cte

    order by RowID

    select * from #StringInTable

    select chars, char2, case chars

    when 'A' then 'y'

    when 'B' then 'b'

    when 'C' then 'k'

    when 'D' then 'd'

    when 'E' then 'y'

    when 'F' then 'f'

    when 'G' then 'g'

    when 'H' then 'h'

    when 'I' then 'y'

    when 'J' then 'j'

    when 'K' then case when char2 = 'N' then 'n' else 'k' end -- but if K is followed by N then should become n

    when 'L' then 'l'

    when 'M' then 'm'

    when 'N' then case when char2 in ('I', 'T') then 'm' else 'n' end -- if N is followed by I or T then set to m

    when 'O' then 'y'

    end

    from #StringInTable

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • How are you doing, Abu Dina?

    As luck would have it, one or two of us have been experimenting with exactly this requirement here on this thread, which will continue to run for a day or two. Enjoy, and pick your favourite from the mix. Each method has something different to offer (or bitch about, depending on your POV).

    The fastest method by far - and it seems a good fit to your requirement - is the iTVF in which a table variable is hard coded with the find and replace characters. If it's not there (it doesn't work with the test harness), I'll post it tomorrow.

    Cheers

    ChrisM


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • Haha Sean..... I stopped at O as I didn't want to give away how other keys are created. This is not as bizarre as you think. A lot of work has gone into this. And when it's finished it will help me a lot.

    Not tried your solution but will do first thing tomorrow morning.

    Cheers for the effort.

    ---------------------------------------------------------

    It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
    David Edwards - Media lens[/url]

    Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
    Howard Zinn

  • Hi Chris hope you're well!

    Thanks for the thread. I shall go and have a thorough read in a mo.

    I'm really grateful you brought my attention to iTVF Last year.

    I've been using it with great results and when this is finished it will end up wrapped inside an iTFV.

    BTW, do you find my requirement as bizarre as Sean thinks lol.. I'm beginning to worry now...

    ---------------------------------------------------------

    It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
    David Edwards - Media lens[/url]

    Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
    Howard Zinn

  • Abu Dina (1/23/2013)


    Hi Chris hope you're well!

    Thanks for the thread. I shall go and have a thorough read in a mo.

    I'm really grateful you brought my attention to iTVF Last year.

    I've been using it with great results and when this is finished it will end up wrapped inside an iTFV.

    BTW, do you find my requirement as bizarre as Sean thinks lol.. I'm beginning to worry now...

    Nah mate I've seen this before, it's a method used by some of the professional matching packages. MatchIT, IIRC.

    BTW here's probably the "right way" to do what you want:

    SELECT Forename, Surname, x.PhoneticNameKey

    FROM #Names

    CROSS APPLY (

    SELECT PhoneticNameKey =

    REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(

    REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(

    Surname COLLATE Latin1_General_BIN,

    'A','y'),'B','b'),'C','k'),'D','d'),'E','y'),'F','f'),'G','g'),'H','h'),'NI','m')

    ,'I','y'),'J','j'),'KN','n'),'K','k'),'L','l'),'M','m'),'NT','m'),'N','n'),'O','y')

    ) x


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • Lol Chris you scare me.... And embarrass me at the same time.

    Simple solution....

    This is another addition to the various phonetic algorithm implementations I have. Always hoping for better match keys lol!

    Thanks.

    ---------------------------------------------------------

    It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
    David Edwards - Media lens[/url]

    Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
    Howard Zinn

  • I've been trying all morning to get a set of REPLACES to work but there are too many rules to apply.

    I may have to go down the CLR route for this.

    ---------------------------------------------------------

    It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
    David Edwards - Media lens[/url]

    Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
    Howard Zinn

  • Abu Dina (1/23/2013)


    Lol Chris you scare me.... And embarrass me at the same time.

    Simple solution....

    This is another addition to the various phonetic algorithm implementations I have. Always hoping for better match keys lol!

    Thanks.

    Hey geezer!

    I'll tell you what's scary - it's taken 20 years for me to be able to figure out an answer to your question on the second shot, but there are folks at almost every gig I go to, who can do better, after only three years' playing with SQL. There are some very talented players on the field and we all chase the same jobs...


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply