MetaPhone - Implementing

  • I can find many articles on MetaPhone, but nothing that is understandable on how to implement it and use in Query Analyzer

    I need to use this type of function to de-dupe my database.  SoundEx is very bad.  Any suggestions or resources you can recommend would be much appreciated!

     

    Thanks

  • first google hit...

    http://www.windowsitpro.com/Article/ArticleID/26094/26094.html?Ad=1

    alternatively, just wrap Lawrence Phillips C++ version

  • I'm not sure how "understandable" this will be, as the algorithm is very complex, but here's an implementation I found on the net some time ago.  (The ) are close parenthesis -- How do you turn off the smilie substitution, or format code?)


    declare @ret char(20)

    exec @ret = doublemetaphone 'rachael'

    print @ret

    exec @ret = doublemetaphone 'rachel'

    print @ret

    exec @ret = doublemetaphone 'richelle'

    print @ret

    /* Metaphone */

    --**************************************

    --    

    -- Name: dbo.DoubleMetaPhone

    -- Description:Creates a better phonetic

    --      matching key than Soundex.

    -- By: Keith Henry

    --

    -- Inputs: proper name to generate key

    --

    -- Returns: char(10) string containing two

    --     char(5) keys

    -- The first 5 chars are the most common pronouciation, AND the last 5 are the next most common.

    --

    -- Assumes: This is based off a c++ article that described a better phonetic algorithm than soundex.

    -- I re-wrote the algorithm IN t-SQL (the languages are too different to simply translate) so that

    -- I could USE it WITH our UK Market File.  Soundex IS OK, but it is OVER 100 years old now AND it

    -- misses an awful lot OF names that should match.

    --

    -- Side Effects: More load than soundex, but still much better.

    --

    -- This code is copyrighted and has-- limited warranties.Please see http://

    --     http://www.Planet-Source-Code.com/vb/scripts/Sh

    --     owCode.asp?txtCodeId=519&lngWId=5--for details.--**************************************

    --    

    ALTER FUNCTION dbo.DoubleMetaPhone (@str varchar(70))

    RETURNS char(20)

    AS

        BEGIN

         

         /*#########################################################################

         

         DOUBLE Metaphone Phonetic Matching Function

         

         This reduces word TO approximate phonetic string. This is deliberately

         NOT a direct phonetic

         

         Based OFF original C++ code AND algorithm BY

              Lawrence Philips (lphilips_AT_verity.com)

         

         Published IN the C/C++ Users Journal:

          http://www.cuj.com/articles/2000/0006/0006d/0006d.htm?topic=articles

         

         Original Metaphone presented in article in "Computer Language" in 1990.

         

         Reduces alphabet TO

         

           The 14 constonant sounds:

           "sh""p"or"b" "th"

           | | |

           X S K J T F H L M N P R 0 W

         

           DROP vowels EXCEPT at the beginning

         

         Produces a char(10) string. The left(@result,5) gives the most common

         pronouciation, right(@result,5) gives the commonest alternate.

         

         

         Translated INTO t-SQL BY Keith Henry (keithh@lbm-solutions.com)

         

         #########################################################################*/

         DECLARE @original  varchar(70),

          @primary varchar(70),

          @secondary  varchar(70),

          @length  int,

          @last   int,

          @current int,

          @strcur1 char(1) ,

          @strnext1  char(1) ,

          @strprev1 char(1),

          @SlavoGermanic  bit

         SET @SlavoGermanic = 0

         SET @primary   = ''

         SET @secondary  = ''

         SET @current   = 1

         SET @length  = len(@str)

         SET @Last   = @length

         SET @original   = lTrim(isnull(@str,'')) + ' '

         

         SET @original   = upper(@original)

         IF patindex('%[WK]%',@str) + charindex('CZ',@str) + charindex('WITZ',@str) <> 0

          SET @SlavoGermanic = 1

         -- skip this at beginning OF word

         IF substring(@original, 1, 2) in ('GN', 'KN', 'PN', 'WR', 'PS')

           SET @current = @current + 1

         -- Initial 'X' IS pronounced 'Z' e.g. 'Xavier'

         IF substring(@original, 1, 1) = 'X'

         BEGIN

          SET @primary = @primary + 'S'-- 'Z' maps TO 'S'

          SET @secondary = @secondary + 'S'

          SET @current = @current + 1

         END

         IF substring(@original, 1, 1) in ('A', 'E', 'I', 'O', 'U', 'Y')

         BEGIN

          SET @primary = @primary + 'A' -- ALL init vowels now map TO 'A'

          SET @secondary = @secondary + 'A'

          SET @current = @current + 1

         END

         WHILE @current <= @length

         BEGIN

          IF len(@primary) >= 10 BREAK

         

          SET @strcur1 = substring(@original, @current, 1)

          SET @strnext1 = substring(@original, (@current + 1), 1)

          SET @strprev1 = substring(@original, (@current - 1), 1)

          

          IF @strcur1 IN ('A', 'E', 'I', 'O', 'U', 'Y', ' ', '''', '-')

           SET @current = @current + 1

          ELSE

          

          IF @strcur1 = 'B'   -- '-mb', e.g. 'dumb', already skipped OVER...

          BEGIN

           SET @primary = @primary + 'P'

           SET @secondary = @secondary + 'P'

           

           IF @strnext1 = 'B'

            SET @current = @current + 2

           ELSE

            SET @current = @current + 1

          END

          ELSE

         

          IF @strcur1 = 'Ç'

          BEGIN

           SET @primary = @primary + 'S'

           SET @secondary = @secondary + 'S'

           SET @current = @current + 1

          END

          ELSE

         

          IF @strcur1 = 'C'

          BEGIN 

           IF @strnext1 = 'H'

           BEGIN  

          

            IF substring(@original, @current, 4) = 'CHIA' -- italian 'chianti'

            BEGIN

             SET @primary = @primary + 'K'

             SET @secondary = @secondary + 'K'

            END

            ELSE

            BEGIN

             IF @current > 1 -- find 'michael'

              AND substring(@original, @current, 4) = 'CHAE'

             BEGIN

               SET @primary = @primary + 'K'

               SET @secondary = @secondary + 'X'

             END

             ELSE

             BEGIN

              IF @current = 1  -- greek roots e.g. 'chemistry', 'chorus'

               AND (substring(@original, @current + 1, 5) in ('HARAC', 'HARIS')

                OR substring(@original, @current + 1, 3) in ('HOR', 'HYM', 'HIA', 'HEM')

              &nbsp

               AND substring(@original, 1, 5) <> 'CHORE'

              BEGIN

               SET @primary = @primary + 'K'

               SET @secondary = @secondary + 'K'

              END

              ELSE

              BEGIN

               IF  ( substring(@original, 0, 4) in ('VAN ', 'VON ') -- germanic, greek, or otherwise 'ch' FOR 'kh' sound

                 OR substring(@original, 0, 3) = 'SCH'

               &nbsp

                OR substring(@original, @current - 2, 6) in ('ORCHES', 'ARCHIT', 'ORCHID') -- 'architect' but NOT 'arch', orchestra', 'orchid'

                OR substring(@original, @current + 2, 1) in ('T', 'S')

                OR  ( ( @strprev1 IN ('A','O','U','E')

                   OR @current = 1

                 &nbsp

                 AND substring(@original, @current + 2, 1) in ('L','R','N','M','B','H','F','V','W',' ') -- e.g. 'wachtler', 'weschsler', but NOT 'tichner'

               &nbsp

               BEGIN

                SET @primary = @primary + 'K'

                SET @secondary = @secondary + 'K'

               END

               ELSE

               BEGIN

                IF (@current > 1)

                BEGIN

                 IF substring(@original, 1, 2) = 'MC' -- e.g. 'McHugh'

                 BEGIN

                  SET @primary = @primary + 'K'

                  SET @secondary = @secondary + 'K'

                 END

                 ELSE

                 BEGIN

                  SET @primary = @primary + 'X'

                  SET @secondary = @secondary + 'K'

                 END

                END

                ELSE

                BEGIN

                 SET @primary = @primary + 'X'

                 SET @secondary = @secondary + 'X'

                END

               END

              END

             END

            END

            SET @current = @current + 2

           END --ch logic

           ELSE

           BEGIN

            IF @strnext1 = 'C' -- DOUBLE 'C', but NOT McClellan'

             AND not(@current = 1

               AND substring(@original, 1, 1) = 'M'

             &nbsp

            BEGIN

             IF substring(@original, @current + 2, 1) in ('I','E','H') -- 'bellocchio' but NOT 'bacchus'

              AND substring(@original, @current + 2, 2) <> 'HU'

             BEGIN

              IF ( @current = 2 -- 'accident', 'accede', 'succeed'

                AND @strprev1 = 'A'

              &nbsp

               OR substring(@original, @current - 1, 5) in ('UCCEE', 'UCCES')

              BEGIN

               SET @primary = @primary + 'KS'

               SET @secondary = @secondary + 'KS'

              END

              ELSE

              BEGIN -- 'bacci', 'bertucci', other italian

               SET @primary = @primary + 'X'

               SET @secondary = @secondary + 'X'

               -- e.g. 'focaccia' IF substring(@original, @current, 4) = 'CCIA' 

              END

              SET @current = @current + 3

             END

             ELSE

             BEGIN

              SET @primary = @primary + 'K' -- Pierce's RULE

              SET @secondary = @secondary + 'K'

              SET @current = @current + 2

             END

            END

            ELSE

            BEGIN

             IF @strnext1 IN ('K','G','Q')

             BEGIN

              SET @primary = @primary + 'K'

              SET @secondary = @secondary + 'K'

              SET @current = @current + 2

             END

             ELSE

             BEGIN

              IF @strnext1 IN ('I','E','Y')

              BEGIN

               IF substring(@original, @current, 3) in ('CIO','CIE','CIA') -- italian vs. english

               BEGIN

                SET @primary = @primary + 'S'

                SET @secondary = @secondary + 'X'

               END

               ELSE

               BEGIN

                SET @primary = @primary + 'S'

                SET @secondary = @secondary + 'S'

               END

               SET @current = @current + 2

              END

              ELSE

              BEGIN

               IF @strnext1 = 'Z' -- e.g. 'czerny'

                AND substring(@original, @current -2, 4) <> 'WICZ'

               BEGIN

                SET @primary = @primary + 'S'

                SET @secondary = @secondary + 'X'

                SET @current = @current + 2

               END

               ELSE

               BEGIN

                IF @current > 2 -- various gremanic

                 AND substring(@original, @current - 2,1) NOT in ('A', 'E', 'I', 'O', 'U', 'Y')

                 AND substring(@original, @current - 1, 3) = 'ACH'

                 AND ((substring(@original, @current + 2, 1) <> 'I')

                  AND ((substring(@original, @current + 2, 1) <> 'E')

                   OR substring(@original, @current - 2, 6) in ('BACHER', 'MACHER')

                 &nbsp

                &nbsp

                BEGIN

                 SET @primary = @primary + 'K'

                 SET @secondary = @secondary + 'K'

                 SET @current = @current + 2

                END

                ELSE

                BEGIN

                 IF @current = 1 -- special CASE 'caesar'

                  AND substring(@original, @current, 6) = 'CAESAR'

                 

                 BEGIN

                  SET @primary = @primary + 'S'

                  SET @secondary = @secondary + 'S'

                  SET @current = @current + 2

                 END

                 ELSE

                 BEGIN -- final ELSE

                  SET @primary = @primary + 'K'

                  SET @secondary = @secondary + 'K'

                 

                  IF substring(@original, @current + 1, 2) in (' C',' Q',' G') -- name sent in 'mac caffrey', 'mac gregor'

                   SET @current = @current + 3

                  ELSE

                    SET @current = @current + 1

                 END

                END

               END

              END

             END

            END

           END

          END

          ELSE

         

          IF @strcur1 = 'D'

          BEGIN

           IF @strnext1 = 'G'

           BEGIN

            IF substring(@original, @current + 2, 1) in ('I','E','Y')

            BEGIN

             SET @primary = @primary + 'J' -- e.g. 'edge'

             SET @secondary = @secondary + 'J'

             SET @current = @current + 3

            END

            ELSE

            BEGIN

             SET @primary = @primary + 'TK' -- e.g. 'edgar'

             SET @secondary = @secondary + 'TK'

             SET @current = @current + 2

            END

           END

           ELSE

           BEGIN

            IF substring(@original, @current, 2) in ('DT','DD')

            BEGIN

             SET @primary = @primary + 'T'

             SET @secondary = @secondary + 'T'

             SET @current = @current + 2

            END

            ELSE

            BEGIN

             SET @primary = @primary + 'T'

             SET @secondary = @secondary + 'T'

             SET @current = @current + 1

            END

           END

          END

          ELSE

         

          IF @strcur1 = 'F'

          BEGIN

           SET @primary = @primary + 'F'

           SET @secondary = @secondary + 'F'

           IF (@strnext1 = 'F')

            SET @current = @current + 2

           ELSE

            SET @current = @current + 1

          END

          ELSE

         

          IF @strcur1 = 'G'

          BEGIN

           IF (@strnext1 = 'H')

           BEGIN

            IF @current > 1

             AND @strprev1 NOT IN ('A', 'E', 'I', 'O', 'U', 'Y')

            BEGIN

             SET @primary = @primary + 'K'

             SET @secondary = @secondary + 'K'

            END

            ELSE

            BEGIN

           

             IF  not( (@current > 2 -- Parker's RULE (with SOME further refinements) - e.g. 'hugh'

                AND substring(@original, @current - 2, 1) in ('B','H','D')

              &nbsp -- e.g. 'bough'

               OR (@current > 3

                AND substring(@original, @current - 3, 1) in ('B','H','D')

              &nbsp -- e.g. 'broughton'

               OR (@current > 4

                AND substring(@original, @current - 4, 1) in ('B','H')

             &nbsp&nbsp

             BEGIN

              IF @current > 3  -- e.g. 'laugh', 'McLaughlin', 'cough', 'gough', 'rough', 'tough'

               AND @strprev1 = 'U'

               AND substring(@original, @current - 3, 1) in ('C','G','L','R','T')

              BEGIN

               SET @primary = @primary + 'F'

               SET @secondary = @secondary + 'F'

              END

              ELSE

              BEGIN

               IF @current > 1

                AND @strprev1 <> 'I'

               BEGIN

                SET @primary = @primary + 'K'

                SET @secondary = @secondary + 'K'

               END

               ELSE

               BEGIN

                IF (@current < 4)

                BEGIN

                 IF (@current = 1) -- 'ghislane', 'ghiradelli'

                 BEGIN

                  IF (substring(@original, @current + 2, 1) = 'I')

                  BEGIN

                   SET @primary = @primary + 'J'

                   SET @secondary = @secondary + 'J'

                  END

                  ELSE

                  BEGIN

                   SET @primary = @primary + 'K'

                   SET @secondary = @secondary + 'K'

                  END

                 END

                END

               END

              END

             END

            END

            SET @current = @current + 2

           END

           ELSE

           BEGIN

            IF (@strnext1 = 'N')

            BEGIN

             IF @current = 1

              AND substring(@original, 0,1) in ('A', 'E', 'I', 'O', 'U', 'Y')

              AND @SlavoGermanic = 0

             BEGIN

              SET @primary = @primary + 'KN'

              SET @secondary = @secondary + 'N'

             END

             ELSE

             BEGIN

              -- NOT e.g. 'cagney'

              IF substring(@original, @current + 2, 2) = 'EY'

               AND (@strnext1 <> 'Y')

               AND @SlavoGermanic = 0

              BEGIN

               SET @primary = @primary + 'N'

               SET @secondary = @secondary + 'KN'

              END

              ELSE

              BEGIN

               SET @primary = @primary + 'KN'

               SET @secondary = @secondary + 'KN'

              END

             END

             SET @current = @current + 2

            END

            ELSE

            BEGIN

             IF substring(@original, @current + 1, 2) = 'LI' -- 'tagliaro'

              AND @SlavoGermanic = 0

             BEGIN

              SET @primary = @primary + 'KL'

              SET @secondary = @secondary + 'L'

              SET @current = @current + 2

             END

             ELSE

             BEGIN

              IF @current = 1  -- -ges-, -gep-, -gel- at beginning

               AND (@strnext1 = 'Y'

                OR substring(@original, @current + 1, 2) in ('ES','EP','EB','EL','EY','IB','IL','IN','IE', 'EI','ER')

              &nbsp

              BEGIN

               SET @primary = @primary + 'K'

               SET @secondary = @secondary + 'J'

               SET @current = @current + 2

              END

              ELSE

              BEGIN

               IF (substring(@original, @current + 1, 2) = 'ER' -- -ger-, -gy-

                OR @strnext1 = 'Y'

               &nbsp

                 AND substring(@original, 1, 6) NOT in ('DANGER','RANGER','MANGER')

                 AND @strprev1 NOT IN ('E', 'I')

                 AND substring(@original, @current - 1, 3) NOT in ('RGY','OGY')

               BEGIN

                SET @primary = @primary + 'K'

                SET @secondary = @secondary + 'J'

                SET @current = @current + 2

               END

               ELSE

               BEGIN

                IF @strnext1 IN ('E','I','Y') -- italian e.g. 'biaggi'

                 OR substring(@original, @current -1, 4) in ('AGGI','OGGI')

                BEGIN

                 IF (substring(@original, 1, 4) in ('VAN ', 'VON ') -- obvious germanic

                  OR substring(@original, 1, 3) = 'SCH'

                 &nbsp

                  OR substring(@original, @current + 1, 2) = 'ET'

                 BEGIN

                  SET @primary = @primary + 'K'

                  SET @secondary = @secondary + 'K'

                 END

                 ELSE

                 BEGIN

                  -- always soft IF french ending

                  IF substring(@original, @current + 1, 4) = 'IER '

                  BEGIN

                   SET @primary = @primary + 'J'

                   SET @secondary = @secondary + 'J'

                  END

                  ELSE

                  BEGIN

                   SET @primary = @primary + 'J'

                   SET @secondary = @secondary + 'K'

                  END

                 END

                 SET @current = @current + 2

                END

                ELSE

                BEGIN -- other options exausted call it k sound

                 SET @primary = @primary + 'K'

                 SET @secondary = @secondary + 'K'

                 IF (@strnext1 = 'G')

                  SET @current = @current + 2

                 ELSE

                  SET @current = @current + 1

                END

               END

              END

             END

            END

           END

          END

          ELSE

         

          IF @strcur1 = 'H'

          BEGIN

           IF (@current = 1  -- ONLY keep if first & before vowel or btw. 2 vowels

             OR @strprev1 IN ('A', 'E', 'I', 'O', 'U', 'Y')

           &nbsp

            AND @strnext1 IN ('A', 'E', 'I', 'O', 'U', 'Y')

           BEGIN

            SET @primary = @primary + 'H'

            SET @secondary = @secondary + 'H'

            SET @current = @current + 2

           END

           ELSE

            SET @current = @current + 1

          END

          ELSE

         

          IF @strcur1 = 'J'

          BEGIN

           IF substring(@original, @current, 4) = 'JOSE' -- obvious spanish, 'jose', 'san jacinto'

            OR substring(@original, 1, 4) = 'SAN '

           BEGIN

            IF (@current = 1

             AND substring(@original, @current + 4, 1) = ' '

            &nbsp

             OR substring(@original, 1, 4) = 'SAN '

            BEGIN

             SET @primary = @primary + 'H'

             SET @secondary = @secondary + 'H'

            END

            ELSE

            BEGIN

             SET @primary = @primary + 'J'

             SET @secondary = @secondary + 'H'

            END

          

            SET @current = @current + 1

           END

           ELSE

           BEGIN

            IF @current = 1

            BEGIN

             SET @primary = @primary + 'J' -- Yankelovich/Jankelowicz

             SET @secondary = @secondary + 'A'

             SET @current = @current + 1

            END

            ELSE

            BEGIN

             IF @strprev1 IN ('A', 'E', 'I', 'O', 'U', 'Y') -- spanish pron. OF .e.g. 'bajador'

              AND @SlavoGermanic = 0

              AND @strnext1 IN ('A','O')

             BEGIN

              SET @primary = @primary + 'J'

              SET @secondary = @secondary + 'H'

              SET @current = @current + 1

             END

             ELSE

             BEGIN

              IF (@current = @Last)

              BEGIN

               SET @primary = @primary + 'J'

               SET @secondary = @secondary + ''

               SET @current = @current + 1

              END

              ELSE

              BEGIN

               IF @strnext1 IN ('L','T','K','S','N','M','B','Z')

                AND @strprev1 NOT IN ('S','K','L')

               BEGIN

                SET @primary = @primary + 'J'

                SET @secondary = @secondary + 'J'

                SET @current = @current + 1

               END

               ELSE

               BEGIN

                IF (@strnext1 = 'J') -- it could happen

                 SET @current = @current + 2

                ELSE

                 SET @current = @current + 1

               END

              END

             END 

            END

           END

          END

          ELSE

         

          IF @strcur1 = 'K'

          BEGIN

           SET @primary = @primary + 'K'

           SET @secondary = @secondary + 'K'

         

           IF (@strnext1 = 'K')

            SET @current = @current + 2

           ELSE

            SET @current = @current + 1

          END

          ELSE

         

          IF @strcur1 = 'L'

          BEGIN

           IF (@strnext1 = 'L')

           BEGIN

            IF (@current = (@length - 3) -- spanish e.g. 'cabrillo', 'gallegos'

             AND substring(@original, @current - 1, 4) in ('ILLO','ILLA','ALLE')

            &nbsp

             OR ((substring(@original, @Last - 1, 2) in ('AS','OS')

               OR substring(@original, @Last, 1) in ('A','O')

             &nbsp

              AND substring(@original, @current - 1, 4) = 'ALLE'

            &nbsp

             SET @primary = @primary + 'L' --Alternate IS silent

            ELSE

            BEGIN

             SET @primary = @primary + 'L'

             SET @secondary = @secondary + 'L'

            END

             SET @current = @current + 2

           END

           ELSE

           BEGIN

            SET @current = @current + 1

            SET @primary = @primary + 'L'

            SET @secondary = @secondary + 'L'

           END

          END

          ELSE

         

          IF @strcur1 = 'M'

          BEGIN

           SET @primary = @primary + 'M'

           SET @secondary = @secondary + 'M'

         

           IF substring(@original, @current - 1, 3) = 'UMB'

             AND (@current + 1 = @Last

              OR substring(@original, @current + 2, 2) = 'ER'

            &nbsp -- 'dumb', 'thumb'

            OR @strnext1 = 'M'

            SET @current = @current + 2

           ELSE

            SET @current = @current + 1

          END

          ELSE

         

          IF @strcur1 IN ('N','Ñ')

          BEGIN

           SET @primary = @primary + 'N'

           SET @secondary = @secondary + 'N'

         

           IF @strnext1 IN ('N','Ñ')

            SET @current = @current + 2

           ELSE

            SET @current = @current + 1

          END

          ELSE

         

          IF @strcur1 = 'P'

          BEGIN

           IF (@strnext1 = 'H')

           BEGIN

            SET @current = @current + 2

            SET @primary = @primary + 'F'

            SET @secondary = @secondary + 'F'

           END

           ELSE

           BEGIN

            -- also account FOR 'campbell' AND 'raspberry'

            IF @strnext1 IN ('P','B')

             SET @current = @current + 2

            ELSE

            BEGIN

             SET @current = @current + 1

             SET @primary = @primary + 'P'

             SET @secondary = @secondary + 'P'

            END

           END

          END

          ELSE

         

          IF @strcur1 = 'Q'

          BEGIN

           SET @primary = @primary + 'K'

           SET @secondary = @secondary + 'K'

           

           IF (@strnext1 = 'Q')

            SET @current = @current + 2

           ELSE

            SET @current = @current + 1

          END

          ELSE

         

          IF @strcur1 = 'R'

          BEGIN

           IF @current = @Last -- french e.g. 'rogier', but exclude 'hochmeier'

            AND @SlavoGermanic = 0

            AND substring(@original, @current - 2, 2) = 'IE'

            AND substring(@original, @current - 4, 2) NOT in ('ME','MA')

            SET @secondary = @secondary + 'R' --set @primary = @primary + ''

           ELSE

           BEGIN

            SET @primary = @primary + 'R'

            SET @secondary = @secondary + 'R'

           END

         

           IF (@strnext1 = 'R')

           BEGIN

            IF substring(@original, @current, 3) = 'RRI' --alternate Kerrigan, Corrigan

             SET @secondary = @secondary + 'R'

            

            SET @current = @current + 2

           END

           ELSE

            SET @current = @current + 1

          END

          ELSE

         

          IF @strcur1 = 'S'

          BEGIN

           IF substring(@original, @current - 1, 3) in ('ISL','YSL') -- special cases 'island', 'isle', 'carlisle', 'carlysle'

            SET @current = @current + 1 --silent s

           ELSE

           BEGIN

            IF substring(@original, @current, 2) = 'SH'

            BEGIN

             -- germanic

             IF substring(@original, @current + 1, 4) in ('HEIM','HOEK','HOLM','HOLZ')

             BEGIN

              SET @primary = @primary + 'S'

              SET @secondary = @secondary + 'S'

             END

             ELSE

             BEGIN

              SET @primary = @primary + 'X'

              SET @secondary = @secondary + 'X'

             END

           

             SET @current = @current + 2

            END

            ELSE

            BEGIN

          

            

           

            

             -- italian & armenian

             IF substring(@original, @current, 3) in ('SIO','SIA')

              OR substring(@original, @current, 4) in ('SIAN')

             BEGIN

              IF @SlavoGermanic = 0

              BEGIN

               SET @primary = @primary + 'S'

               SET @secondary = @secondary + 'X'

              END

              ELSE

              BEGIN

               SET @primary = @primary + 'S'

               SET @secondary = @secondary + 'S'

              END

            

              SET @current = @current + 3

             END

             ELSE

             BEGIN

              IF (@current = 1     -- german & anglicisations, e.g. 'smith' match 'schmidt', 'snider' match 'schneider'

                AND @strnext1 IN ('M','N','L','W') -- also, -sz- in slavic language altho in hungarian it IS pronounced 's'

              &nbsp

               OR @strnext1 = 'Z'

              BEGIN

               SET @primary = @primary + 'S'

               SET @secondary = @secondary + 'X'

           

               IF @strnext1 = 'Z'

                SET @current = @current + 2

               ELSE

                SET @current = @current + 1

              END

              ELSE

              BEGIN

               IF substring(@original, @current, 2) = 'SC'

               BEGIN

                IF substring(@original, @current + 2, 1) = 'H' -- Schlesinger's RULE

                BEGIN

                 IF substring(@original, @current + 3, 2) in ('OO','ER','EN','UY','ED','EM') -- dutch origin, e.g. 'school', 'schooner'

                 BEGIN

                  IF substring(@original, @current + 3, 2) in ('ER','EN') -- 'schermerhorn', 'schenker'

                  BEGIN

                   SET @primary = @primary + 'X'

                   SET @secondary = @secondary + 'SK'

                  END

                  ELSE

                  BEGIN

                   SET @primary = @primary + 'SK'

                   SET @secondary = @secondary + 'SK'

                  END

              

                  SET @current = @current + 3

                 END

                 ELSE

                 BEGIN

                  IF @current = 1

                   AND substring(@original, 3,1) NOT in ('A', 'E', 'I', 'O', 'U', 'Y')

                   AND substring(@original, @current + 3, 1) <> 'W'

                  BEGIN

                   SET @primary = @primary + 'X'

                   SET @secondary = @secondary + 'S'

                  END

                  ELSE

                  BEGIN

                   SET @primary = @primary + 'X'

                   SET @secondary = @secondary + 'X'

                  END

                  

                  SET @current = @current + 3

                 END

                END

                ELSE

                BEGIN

                 IF substring(@original, @current + 2, 1) in ('I','E','Y')

                 BEGIN

                  SET @primary = @primary + 'S'

                  SET @secondary = @secondary + 'S'

                 END

                 ELSE

                 BEGIN

                  SET @primary = @primary + 'SK'

                  SET @secondary = @secondary + 'SK'

                 END

                 SET @current = @current + 3

                END

               END

               ELSE

               BEGIN

                IF @current = 1  -- special CASE 'sugar-'

                 AND substring(@original, @current, 5) = 'SUGAR'

                BEGIN

                 SET @primary = @primary + 'X'

                 SET @secondary = @secondary + 'S'

                 SET @current = @current + 1

                END

                ELSE

                BEGIN

                 IF @current = @Last -- french e.g. 'resnais', 'artois'

                  AND substring(@original, @current - 2, 2) in ('AI','OI')

                  SET @secondary = @secondary + 'S' --set @primary = @primary + ''

                 ELSE

                 BEGIN

                  SET @primary = @primary + 'S'

                  SET @secondary = @secondary + 'S'

                 END

                 

                 IF @strnext1 IN ('S','Z')

                  SET @current = @current + 2

                 ELSE

                  SET @current = @current + 1

                END

               END

              END

             END

            END

           END

          END

          ELSE

         

          IF @strcur1 = 'T'

          BEGIN

           IF substring(@original, @current, 4) = 'TION'

           BEGIN

            SET @primary = @primary + 'X'

            SET @secondary = @secondary + 'X'

            SET @current = @current + 3

           END

           ELSE

            IF substring(@original, @current, 3) in ('TIA','TCH')

            BEGIN

             SET @primary = @primary + 'X'

             SET @secondary = @secondary + 'X'

             SET @current = @current + 3

            END

            ELSE

             IF substring(@original, @current, 2) = 'TH'

              OR substring(@original, @current, 3) = 'TTH'

             BEGIN

              IF substring(@original, @current + 2, 2) in ('OM','AM') -- special CASE 'thomas', 'thames' or germanic

               OR substring(@original, 0, 4) in ('VAN ','VON ')

               OR substring(@original, 0, 3) = 'SCH'

              BEGIN

               SET @primary = @primary + 'T'

               SET @secondary = @secondary + 'T'

              END

              ELSE

              BEGIN

               SET @primary = @primary + '0'

               SET @secondary = @secondary + 'T'

              END

              SET @current = @current + 2

             END

             ELSE

             BEGIN

              IF @strnext1 IN ('T','D')

              BEGIN

               SET @current = @current + 2

               SET @primary = @primary + 'T'

               SET @secondary = @secondary + 'T'

              END

              ELSE

              BEGIN

               SET @current = @current + 1

               SET @primary = @primary + 'T'

               SET @secondary = @secondary + 'T'

              END

             END

          END

          ELSE

         

          IF @strcur1 = 'V'

           IF (@strnext1 = 'V')

            SET @current = @current + 2

           ELSE

           BEGIN

            SET @current = @current + 1

            SET @primary = @primary + 'F'

            SET @secondary = @secondary + 'F'

           END

          ELSE

         

          IF @strcur1 = 'W'

          BEGIN

           -- can also be IN middle OF word

           IF substring(@original, @current, 2) = 'WR'

           BEGIN

            SET @primary = @primary + 'R'

            SET @secondary = @secondary + 'R'

            SET @current = @current + 2

           END

           ELSE

            IF @current = 1

             AND (@strnext1 IN ('A', 'E', 'I', 'O', 'U', 'Y')

              OR substring(@original, @current, 2) = 'WH'

            &nbsp

            BEGIN

             IF @strnext1 IN ('A', 'E', 'I', 'O', 'U', 'Y') -- Wasserman should match Vasserman

             BEGIN

              SET @primary = @primary + 'A'

              SET @secondary = @secondary + 'F'

              SET @current = @current + 1

             END

             ELSE

             BEGIN

              SET @primary = @primary + 'A' -- need Uomo TO match Womo

              SET @secondary = @secondary + 'A'

              SET @current = @current + 1

             END

            END

            ELSE

             IF (@current = @Last -- Arnow should match Arnoff

               AND @strprev1 IN ('A', 'E', 'I', 'O', 'U', 'Y')

             &nbsp

               OR substring(@original, @current - 1, 5) in ('EWSKI','EWSKY','OWSKI','OWSKY')

               OR substring(@original, 0, 3) = 'SCH'

             BEGIN

              SET @secondary = @secondary + 'F' --set @primary = @primary + ''

              SET @current = @current + 1

             END

             ELSE

              IF substring(@original, @current, 4) in ('WICZ','WITZ') -- polish e.g. 'filipowicz'

              BEGIN

               SET @primary = @primary + 'TS'

               SET @secondary = @secondary + 'FX'

               SET @current = @current + 4

              END

              ELSE  

               SET @current = @current + 1 -- ELSE skip it

          END

          ELSE

         

          IF @strcur1 = 'X'

          BEGIN

           IF NOT (@current = @Last -- french e.g. breaux

            AND (substring(@original, @current - 3, 3) in ('IAU', 'EAU')

              OR substring(@original, @current - 2, 2) in ('AU', 'OU')

           &nbsp

          &nbsp

           BEGIN

            SET @primary = @primary + 'KS'

            SET @secondary = @secondary + 'KS'

           END --else skip it

           

           IF @strnext1 IN ('C','X')

            SET @current = @current + 2

           ELSE

            SET @current = @current + 1

          END

          ELSE

         

          IF @strcur1 = 'Z'

          BEGIN

           IF (@strnext1 = 'Z')

           BEGIN

            SET @primary = @primary + 'S'

            SET @secondary = @secondary + 'S'

            SET @current = @current + 2

           END

           ELSE

           BEGIN

            IF (@strnext1 = 'H') -- chinese pinyin e.g. 'zhao'

            BEGIN

             SET @primary = @primary + 'J'

             SET @secondary = @secondary + 'J'

             SET @current = @current + 2

            END

            ELSE

            BEGIN

             IF (substring(@original, @current + 1, 2) in ('ZO', 'ZI', 'ZA'))

               OR (@SlavoGermanic = 1

                AND (@current > 1

                 AND @strprev1 <> 'T'

               &nbsp

              &nbsp

             BEGIN

              SET @primary = @primary + 'S'

              SET @secondary = @secondary + 'TS'

             END

             ELSE

             BEGIN

              SET @primary = @primary + 'S'

              SET @secondary = @secondary + 'S'

             END

            END

            SET @current = @current + 1

           END

          END

          ELSE

           SET @current = @current + 1

         END

         RETURN cast(@primary as char(10)) + cast(@secondary as char(10))

    end

    So long, and thanks for all the fish,

    Russell Shilling, MCDBA, MCSA 2K3, MCSE 2K3

  • Thank you for the info!  I did find this code in multiple places, but no instructions or direction with what to do with it.  I'll keep digging, but I appreciate the feedback!

  • I don't think you'll find a tutorial that will meet your exact needs.  Are you hoping to find something you can cut-n-paste that will de-dupe your DB?

    Perhaps you should contract with someone to do this for you?

    So long, and thanks for all the fish,

    Russell Shilling, MCDBA, MCSA 2K3, MCSE 2K3

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply