Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

MetaPhone - Implementing Expand / Collapse
Author
Message
Posted Monday, March 21, 2005 1:48 PM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Saturday, December 8, 2012 7:21 PM
Points: 234, Visits: 269

I can find many articles on MetaPhone, but nothing that is understandable on how to implement it and use in Query Analyzer

I need to use this type of function to de-dupe my database.  SoundEx is very bad.  Any suggestions or resources you can recommend would be much appreciated!

 

Thanks

Post #169027
Posted Tuesday, March 22, 2005 9:55 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Wednesday, September 3, 2014 1:06 PM
Points: 24, Visits: 352
first google hit...

http://www.windowsitpro.com/Article/ArticleID/26094/26094.html?Ad=1

alternatively, just wrap Lawrence Phillips C++ version
Post #169238
Posted Tuesday, March 22, 2005 11:44 AM


Right there with Babe

Right there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with Babe

Group: General Forum Members
Last Login: Today @ 9:10 AM
Points: 761, Visits: 136

I'm not sure how "understandable" this will be, as the algorithm is very complex, but here's an implementation I found on the net some time ago.  (The ) are close parenthesis -- How do you turn off the smilie substitution, or format code?)


declare @ret char(20)
exec @ret = doublemetaphone 'rachael'
print @ret
exec @ret = doublemetaphone 'rachel'
print @ret
exec @ret = doublemetaphone 'richelle'
print @ret

/* Metaphone */

--**************************************
--    
-- Name: dbo.DoubleMetaPhone
-- Description:Creates a better phonetic
--      matching key than Soundex.
-- By: Keith Henry
--
-- Inputs: proper name to generate key
--
-- Returns: char(10) string containing two
--     char(5) keys
-- The first 5 chars are the most common pronouciation, AND the last 5 are the next most common.
--
-- Assumes: This is based off a c++ article that described a better phonetic algorithm than soundex.
-- I re-wrote the algorithm IN t-SQL (the languages are too different to simply translate) so that
-- I could USE it WITH our UK Market File.  Soundex IS OK, but it is OVER 100 years old now AND it
-- misses an awful lot OF names that should match.
--
-- Side Effects: More load than soundex, but still much better.
--
-- This code is copyrighted and has-- limited warranties.Please see http://
--     www.Planet-Source-Code.com/vb/scripts/Sh
--     owCode.asp?txtCodeId=519&lngWId=5--for details.--**************************************
--    

ALTER FUNCTION dbo.DoubleMetaPhone (@str varchar(70))
RETURNS char(20)
AS


    BEGIN
     
     /*#########################################################################
     
     DOUBLE Metaphone Phonetic Matching Function
     
     This reduces word TO approximate phonetic string. This is deliberately
     NOT a direct phonetic
     
     Based OFF original C++ code AND algorithm BY
          Lawrence Philips (lphilips_AT_verity.com)
     
     Published IN the C/C++ Users Journal:
      http://www.cuj.com/articles/2000/0006/0006d/0006d.htm?topic=articles
     
     Original Metaphone presented in article in "Computer Language" in 1990.
     
     Reduces alphabet TO
     
       The 14 constonant sounds:
       "sh""p"or"b" "th"
       | | |
       X S K J T F H L M N P R 0 W
     
       DROP vowels EXCEPT at the beginning
     
     Produces a char(10) string. The left(@result,5) gives the most common
     pronouciation, right(@result,5) gives the commonest alternate.
     
     
     Translated INTO t-SQL BY Keith Henry (keithh@lbm-solutions.com)
     
     #########################################################################*/

     DECLARE @original  varchar(70),
      @primary varchar(70),
      @secondary  varchar(70),
      @length  int,
      @last   int,
      @current int,
      @strcur1 char(1) ,
      @strnext1  char(1) ,
      @strprev1 char(1),
      @SlavoGermanic  bit

     SET @SlavoGermanic = 0
     SET @primary   = ''
     SET @secondary  = ''
     SET @current   = 1
     SET @length  = len(@str)
     SET @last   = @length
     SET @original   = lTrim(isnull(@str,'')) + ' '
     
     SET @original   = upper(@original)

     IF patindex('%[WK]%',@str) + charindex('CZ',@str) + charindex('WITZ',@str) <> 0
      SET @SlavoGermanic = 1

     -- skip this at beginning OF word
     IF substring(@original, 1, 2) in ('GN', 'KN', 'PN', 'WR', 'PS')
       SET @current = @current + 1

     -- Initial 'X' IS pronounced 'Z' e.g. 'Xavier'
     IF substring(@original, 1, 1) = 'X'
     BEGIN
      SET @primary = @primary + 'S'-- 'Z' maps TO 'S'
      SET @secondary = @secondary + 'S'
      SET @current = @current + 1
     END

     IF substring(@original, 1, 1) in ('A', 'E', 'I', 'O', 'U', 'Y')
     BEGIN
      SET @primary = @primary + 'A' -- ALL init vowels now map TO 'A'
      SET @secondary = @secondary + 'A'
      SET @current = @current + 1
     END

     WHILE @current <= @length
     BEGIN
      IF len(@primary) >= 10 BREAK
     
      SET @strcur1 = substring(@original, @current, 1)
      SET @strnext1 = substring(@original, (@current + 1), 1)
      SET @strprev1 = substring(@original, (@current - 1), 1)
      
      IF @strcur1 IN ('A', 'E', 'I', 'O', 'U', 'Y', ' ', '''', '-')
       SET @current = @current + 1
      ELSE
      
      IF @strcur1 = 'B'   -- '-mb', e.g. 'dumb', already skipped OVER...
      BEGIN
       SET @primary = @primary + 'P'
       SET @secondary = @secondary + 'P'
       
       IF @strnext1 = 'B'
        SET @current = @current + 2
       ELSE
        SET @current = @current + 1
      END
      ELSE
     
      IF @strcur1 = 'Ç'
      BEGIN
       SET @primary = @primary + 'S'
       SET @secondary = @secondary + 'S'
       SET @current = @current + 1
      END
      ELSE
     
      IF @strcur1 = 'C'
      BEGIN 
       IF @strnext1 = 'H'
       BEGIN  
      
        IF substring(@original, @current, 4) = 'CHIA' -- italian 'chianti'
        BEGIN
         SET @primary = @primary + 'K'
         SET @secondary = @secondary + 'K'
        END
        ELSE
        BEGIN
         IF @current > 1 -- find 'michael'
          AND substring(@original, @current, 4) = 'CHAE'
         BEGIN
           SET @primary = @primary + 'K'
           SET @secondary = @secondary + 'X'
         END
         ELSE
         BEGIN
          IF @current = 1  -- greek roots e.g. 'chemistry', 'chorus'
           AND (substring(@original, @current + 1, 5) in ('HARAC', 'HARIS')
            OR substring(@original, @current + 1, 3) in ('HOR', 'HYM', 'HIA', 'HEM')
           
           AND substring(@original, 1, 5) <> 'CHORE'
          BEGIN
           SET @primary = @primary + 'K'
           SET @secondary = @secondary + 'K'
          END
          ELSE
          BEGIN
           IF  ( substring(@original, 0, 4) in ('VAN ', 'VON ') -- germanic, greek, or otherwise 'ch' FOR 'kh' sound
             OR substring(@original, 0, 3) = 'SCH'
            
            OR substring(@original, @current - 2, 6) in ('ORCHES', 'ARCHIT', 'ORCHID') -- 'architect' but NOT 'arch', orchestra', 'orchid'
            OR substring(@original, @current + 2, 1) in ('T', 'S')
            OR  ( ( @strprev1 IN ('A','O','U','E')
               OR @current = 1
              
             AND substring(@original, @current + 2, 1) in ('L','R','N','M','B','H','F','V','W',' ') -- e.g. 'wachtler', 'weschsler', but NOT 'tichner'
            
           BEGIN
            SET @primary = @primary + 'K'
            SET @secondary = @secondary + 'K'
           END
           ELSE
           BEGIN
            IF (@current > 1)
            BEGIN
             IF substring(@original, 1, 2) = 'MC' -- e.g. 'McHugh'
             BEGIN
              SET @primary = @primary + 'K'
              SET @secondary = @secondary + 'K'
             END
             ELSE
             BEGIN
              SET @primary = @primary + 'X'
              SET @secondary = @secondary + 'K'
             END
            END
            ELSE
            BEGIN
             SET @primary = @primary + 'X'
             SET @secondary = @secondary + 'X'
            END
           END
          END
         END
        END
        SET @current = @current + 2
       END --ch logic
       ELSE
       BEGIN
        IF @strnext1 = 'C' -- DOUBLE 'C', but NOT McClellan'
         AND not(@current = 1
           AND substring(@original, 1, 1) = 'M'
          
        BEGIN
         IF substring(@original, @current + 2, 1) in ('I','E','H') -- 'bellocchio' but NOT 'bacchus'
          AND substring(@original, @current + 2, 2) <> 'HU'
         BEGIN
          IF ( @current = 2 -- 'accident', 'accede', 'succeed'
            AND @strprev1 = 'A'
           
           OR substring(@original, @current - 1, 5) in ('UCCEE', 'UCCES')
          BEGIN
           SET @primary = @primary + 'KS'
           SET @secondary = @secondary + 'KS'
          END
          ELSE
          BEGIN -- 'bacci', 'bertucci', other italian
           SET @primary = @primary + 'X'
           SET @secondary = @secondary + 'X'
           -- e.g. 'focaccia' IF substring(@original, @current, 4) = 'CCIA' 
          END
          SET @current = @current + 3
         END
         ELSE
         BEGIN
          SET @primary = @primary + 'K' -- Pierce's RULE
          SET @secondary = @secondary + 'K'
          SET @current = @current + 2
         END
        END
        ELSE
        BEGIN
         IF @strnext1 IN ('K','G','Q')
         BEGIN
          SET @primary = @primary + 'K'
          SET @secondary = @secondary + 'K'
          SET @current = @current + 2
         END
         ELSE
         BEGIN
          IF @strnext1 IN ('I','E','Y')
          BEGIN
           IF substring(@original, @current, 3) in ('CIO','CIE','CIA') -- italian vs. english
           BEGIN
            SET @primary = @primary + 'S'
            SET @secondary = @secondary + 'X'
           END
           ELSE
           BEGIN
            SET @primary = @primary + 'S'
            SET @secondary = @secondary + 'S'
           END
           SET @current = @current + 2
          END
          ELSE
          BEGIN
           IF @strnext1 = 'Z' -- e.g. 'czerny'
            AND substring(@original, @current -2, 4) <> 'WICZ'
           BEGIN
            SET @primary = @primary + 'S'
            SET @secondary = @secondary + 'X'
            SET @current = @current + 2
           END
           ELSE
           BEGIN
            IF @current > 2 -- various gremanic
             AND substring(@original, @current - 2,1) NOT in ('A', 'E', 'I', 'O', 'U', 'Y')
             AND substring(@original, @current - 1, 3) = 'ACH'
             AND ((substring(@original, @current + 2, 1) <> 'I')
              AND ((substring(@original, @current + 2, 1) <> 'E')
               OR substring(@original, @current - 2, 6) in ('BACHER', 'MACHER')
              
             
            BEGIN
             SET @primary = @primary + 'K'
             SET @secondary = @secondary + 'K'
             SET @current = @current + 2
            END
            ELSE
            BEGIN
             IF @current = 1 -- special CASE 'caesar'
              AND substring(@original, @current, 6) = 'CAESAR'
             
             BEGIN
              SET @primary = @primary + 'S'
              SET @secondary = @secondary + 'S'
              SET @current = @current + 2
             END
             ELSE
             BEGIN -- final ELSE
              SET @primary = @primary + 'K'
              SET @secondary = @secondary + 'K'
             
              IF substring(@original, @current + 1, 2) in (' C',' Q',' G') -- name sent in 'mac caffrey', 'mac gregor'
               SET @current = @current + 3
              ELSE
                SET @current = @current + 1
             END
            END
           END
          END
         END
        END
       END
      END
      ELSE
     
      IF @strcur1 = 'D'
      BEGIN
       IF @strnext1 = 'G'
       BEGIN
        IF substring(@original, @current + 2, 1) in ('I','E','Y')
        BEGIN
         SET @primary = @primary + 'J' -- e.g. 'edge'
         SET @secondary = @secondary + 'J'
         SET @current = @current + 3
        END
        ELSE
        BEGIN
         SET @primary = @primary + 'TK' -- e.g. 'edgar'
         SET @secondary = @secondary + 'TK'
         SET @current = @current + 2
        END
       END
       ELSE
       BEGIN
        IF substring(@original, @current, 2) in ('DT','DD')
        BEGIN
         SET @primary = @primary + 'T'
         SET @secondary = @secondary + 'T'
         SET @current = @current + 2
        END
        ELSE
        BEGIN
         SET @primary = @primary + 'T'
         SET @secondary = @secondary + 'T'
         SET @current = @current + 1
        END
       END
      END
      ELSE
     
      IF @strcur1 = 'F'
      BEGIN
       SET @primary = @primary + 'F'
       SET @secondary = @secondary + 'F'
       IF (@strnext1 = 'F')
        SET @current = @current + 2
       ELSE
        SET @current = @current + 1
      END
      ELSE
     
      IF @strcur1 = 'G'
      BEGIN
       IF (@strnext1 = 'H')
       BEGIN
        IF @current > 1
         AND @strprev1 NOT IN ('A', 'E', 'I', 'O', 'U', 'Y')
        BEGIN
         SET @primary = @primary + 'K'
         SET @secondary = @secondary + 'K'
        END
        ELSE
        BEGIN
       
         IF  not( (@current > 2 -- Parker's RULE (with SOME further refinements) - e.g. 'hugh'
            AND substring(@original, @current - 2, 1) in ('B','H','D')
            -- e.g. 'bough'
           OR (@current > 3
            AND substring(@original, @current - 3, 1) in ('B','H','D')
            -- e.g. 'broughton'
           OR (@current > 4
            AND substring(@original, @current - 4, 1) in ('B','H')
           
         BEGIN
          IF @current > 3  -- e.g. 'laugh', 'McLaughlin', 'cough', 'gough', 'rough', 'tough'
           AND @strprev1 = 'U'
           AND substring(@original, @current - 3, 1) in ('C','G','L','R','T')
          BEGIN
           SET @primary = @primary + 'F'
           SET @secondary = @secondary + 'F'
          END
          ELSE
          BEGIN
           IF @current > 1
            AND @strprev1 <> 'I'
           BEGIN
            SET @primary = @primary + 'K'
            SET @secondary = @secondary + 'K'
           END
           ELSE
           BEGIN
            IF (@current < 4)
            BEGIN
             IF (@current = 1) -- 'ghislane', 'ghiradelli'
             BEGIN
              IF (substring(@original, @current + 2, 1) = 'I')
              BEGIN
               SET @primary = @primary + 'J'
               SET @secondary = @secondary + 'J'
              END
              ELSE
              BEGIN
               SET @primary = @primary + 'K'
               SET @secondary = @secondary + 'K'
              END
             END
            END
           END
          END
         END
        END
        SET @current = @current + 2
       END
       ELSE
       BEGIN
        IF (@strnext1 = 'N')
        BEGIN
         IF @current = 1
          AND substring(@original, 0,1) in ('A', 'E', 'I', 'O', 'U', 'Y')
          AND @SlavoGermanic = 0
         BEGIN
          SET @primary = @primary + 'KN'
          SET @secondary = @secondary + 'N'
         END
         ELSE
         BEGIN
          -- NOT e.g. 'cagney'
          IF substring(@original, @current + 2, 2) = 'EY'
           AND (@strnext1 <> 'Y')
           AND @SlavoGermanic = 0
          BEGIN
           SET @primary = @primary + 'N'
           SET @secondary = @secondary + 'KN'
          END
          ELSE
          BEGIN
           SET @primary = @primary + 'KN'
           SET @secondary = @secondary + 'KN'
          END
         END
         SET @current = @current + 2
        END
        ELSE
        BEGIN
         IF substring(@original, @current + 1, 2) = 'LI' -- 'tagliaro'
          AND @SlavoGermanic = 0
         BEGIN
          SET @primary = @primary + 'KL'
          SET @secondary = @secondary + 'L'
          SET @current = @current + 2
         END
         ELSE
         BEGIN
          IF @current = 1  -- -ges-, -gep-, -gel- at beginning
           AND (@strnext1 = 'Y'
            OR substring(@original, @current + 1, 2) in ('ES','EP','EB','EL','EY','IB','IL','IN','IE', 'EI','ER')
           
          BEGIN
           SET @primary = @primary + 'K'
           SET @secondary = @secondary + 'J'
           SET @current = @current + 2
          END
          ELSE
          BEGIN
           IF (substring(@original, @current + 1, 2) = 'ER' -- -ger-, -gy-
            OR @strnext1 = 'Y'
            
             AND substring(@original, 1, 6) NOT in ('DANGER','RANGER','MANGER')
             AND @strprev1 NOT IN ('E', 'I')
             AND substring(@original, @current - 1, 3) NOT in ('RGY','OGY')
           BEGIN
            SET @primary = @primary + 'K'
            SET @secondary = @secondary + 'J'
            SET @current = @current + 2
           END
           ELSE
           BEGIN
            IF @strnext1 IN ('E','I','Y') -- italian e.g. 'biaggi'
             OR substring(@original, @current -1, 4) in ('AGGI','OGGI')
            BEGIN
             IF (substring(@original, 1, 4) in ('VAN ', 'VON ') -- obvious germanic
              OR substring(@original, 1, 3) = 'SCH'
              
              OR substring(@original, @current + 1, 2) = 'ET'
             BEGIN
              SET @primary = @primary + 'K'
              SET @secondary = @secondary + 'K'
             END
             ELSE
             BEGIN
              -- always soft IF french ending
              IF substring(@original, @current + 1, 4) = 'IER '
              BEGIN
               SET @primary = @primary + 'J'
               SET @secondary = @secondary + 'J'
              END
              ELSE
              BEGIN
               SET @primary = @primary + 'J'
               SET @secondary = @secondary + 'K'
              END
             END
             SET @current = @current + 2
            END
            ELSE
            BEGIN -- other options exausted call it k sound
             SET @primary = @primary + 'K'
             SET @secondary = @secondary + 'K'
             IF (@strnext1 = 'G')
              SET @current = @current + 2
             ELSE
              SET @current = @current + 1
            END
           END
          END
         END
        END
       END
      END
      ELSE
     
      IF @strcur1 = 'H'
      BEGIN
       IF (@current = 1  -- ONLY keep if first & before vowel or btw. 2 vowels
         OR @strprev1 IN ('A', 'E', 'I', 'O', 'U', 'Y')
        
        AND @strnext1 IN ('A', 'E', 'I', 'O', 'U', 'Y')
       BEGIN
        SET @primary = @primary + 'H'
        SET @secondary = @secondary + 'H'
        SET @current = @current + 2
       END
       ELSE
        SET @current = @current + 1
      END
      ELSE
     
      IF @strcur1 = 'J'
      BEGIN
       IF substring(@original, @current, 4) = 'JOSE' -- obvious spanish, 'jose', 'san jacinto'
        OR substring(@original, 1, 4) = 'SAN '
       BEGIN
        IF (@current = 1
         AND substring(@original, @current + 4, 1) = ' '
         
         OR substring(@original, 1, 4) = 'SAN '
        BEGIN
         SET @primary = @primary + 'H'
         SET @secondary = @secondary + 'H'
        END
        ELSE
        BEGIN
         SET @primary = @primary + 'J'
         SET @secondary = @secondary + 'H'
        END
      
        SET @current = @current + 1
       END
       ELSE
       BEGIN
        IF @current = 1
        BEGIN
         SET @primary = @primary + 'J' -- Yankelovich/Jankelowicz
         SET @secondary = @secondary + 'A'
         SET @current = @current + 1
        END
        ELSE
        BEGIN
         IF @strprev1 IN ('A', 'E', 'I', 'O', 'U', 'Y') -- spanish pron. OF .e.g. 'bajador'
          AND @SlavoGermanic = 0
          AND @strnext1 IN ('A','O')
         BEGIN
          SET @primary = @primary + 'J'
          SET @secondary = @secondary + 'H'
          SET @current = @current + 1
         END
         ELSE
         BEGIN
          IF (@current = @last)
          BEGIN
           SET @primary = @primary + 'J'
           SET @secondary = @secondary + ''
           SET @current = @current + 1
          END
          ELSE
          BEGIN
           IF @strnext1 IN ('L','T','K','S','N','M','B','Z')
            AND @strprev1 NOT IN ('S','K','L')
           BEGIN
            SET @primary = @primary + 'J'
            SET @secondary = @secondary + 'J'
            SET @current = @current + 1
           END
           ELSE
           BEGIN
            IF (@strnext1 = 'J') -- it could happen
             SET @current = @current + 2
            ELSE
             SET @current = @current + 1
           END
          END
         END 
        END
       END
      END
      ELSE
     
      IF @strcur1 = 'K'
      BEGIN
       SET @primary = @primary + 'K'
       SET @secondary = @secondary + 'K'
     
       IF (@strnext1 = 'K')
        SET @current = @current + 2
       ELSE
        SET @current = @current + 1
      END
      ELSE
     
      IF @strcur1 = 'L'
      BEGIN
       IF (@strnext1 = 'L')
       BEGIN
        IF (@current = (@length - 3) -- spanish e.g. 'cabrillo', 'gallegos'
         AND substring(@original, @current - 1, 4) in ('ILLO','ILLA','ALLE')
         
         OR ((substring(@original, @last - 1, 2) in ('AS','OS')
           OR substring(@original, @last, 1) in ('A','O')
          
          AND substring(@original, @current - 1, 4) = 'ALLE'
         
         SET @primary = @primary + 'L' --Alternate IS silent
        ELSE
        BEGIN
         SET @primary = @primary + 'L'
         SET @secondary = @secondary + 'L'
        END
         SET @current = @current + 2
       END
       ELSE
       BEGIN
        SET @current = @current + 1
        SET @primary = @primary + 'L'
        SET @secondary = @secondary + 'L'
       END
      END
      ELSE
     
      IF @strcur1 = 'M'
      BEGIN
       SET @primary = @primary + 'M'
       SET @secondary = @secondary + 'M'
     
       IF substring(@original, @current - 1, 3) = 'UMB'
         AND (@current + 1 = @last
          OR substring(@original, @current + 2, 2) = 'ER'
          -- 'dumb', 'thumb'
        OR @strnext1 = 'M'
        SET @current = @current + 2
       ELSE
        SET @current = @current + 1
      END
      ELSE
     
      IF @strcur1 IN ('N','Ñ')
      BEGIN
       SET @primary = @primary + 'N'
       SET @secondary = @secondary + 'N'
     
       IF @strnext1 IN ('N','Ñ')
        SET @current = @current + 2
       ELSE
        SET @current = @current + 1
      END
      ELSE
     
      IF @strcur1 = 'P'
      BEGIN
       IF (@strnext1 = 'H')
       BEGIN
        SET @current = @current + 2
        SET @primary = @primary + 'F'
        SET @secondary = @secondary + 'F'
       END
       ELSE
       BEGIN
        -- also account FOR 'campbell' AND 'raspberry'
        IF @strnext1 IN ('P','B')
         SET @current = @current + 2
        ELSE
        BEGIN
         SET @current = @current + 1
         SET @primary = @primary + 'P'
         SET @secondary = @secondary + 'P'
        END
       END
      END
      ELSE
     
      IF @strcur1 = 'Q'
      BEGIN
       SET @primary = @primary + 'K'
       SET @secondary = @secondary + 'K'
       
       IF (@strnext1 = 'Q')
        SET @current = @current + 2
       ELSE
        SET @current = @current + 1
      END
      ELSE
     
      IF @strcur1 = 'R'
      BEGIN
       IF @current = @last -- french e.g. 'rogier', but exclude 'hochmeier'
        AND @SlavoGermanic = 0
        AND substring(@original, @current - 2, 2) = 'IE'
        AND substring(@original, @current - 4, 2) NOT in ('ME','MA')
        SET @secondary = @secondary + 'R' --set @primary = @primary + ''
       ELSE
       BEGIN
        SET @primary = @primary + 'R'
        SET @secondary = @secondary + 'R'
       END
     
       IF (@strnext1 = 'R')
       BEGIN
        IF substring(@original, @current, 3) = 'RRI' --alternate Kerrigan, Corrigan
         SET @secondary = @secondary + 'R'
        
        SET @current = @current + 2
       END
       ELSE
        SET @current = @current + 1
      END
      ELSE
     
      IF @strcur1 = 'S'
      BEGIN
       IF substring(@original, @current - 1, 3) in ('ISL','YSL') -- special cases 'island', 'isle', 'carlisle', 'carlysle'
        SET @current = @current + 1 --silent s
       ELSE
       BEGIN
        IF substring(@original, @current, 2) = 'SH'
        BEGIN
         -- germanic
         IF substring(@original, @current + 1, 4) in ('HEIM','HOEK','HOLM','HOLZ')
         BEGIN
          SET @primary = @primary + 'S'
          SET @secondary = @secondary + 'S'
         END
         ELSE
         BEGIN
          SET @primary = @primary + 'X'
          SET @secondary = @secondary + 'X'
         END
       
         SET @current = @current + 2
        END
        ELSE
        BEGIN
      
        
       
        
         -- italian & armenian
         IF substring(@original, @current, 3) in ('SIO','SIA')
          OR substring(@original, @current, 4) in ('SIAN')
         BEGIN
          IF @SlavoGermanic = 0
          BEGIN
           SET @primary = @primary + 'S'
           SET @secondary = @secondary + 'X'
          END
          ELSE
          BEGIN
           SET @primary = @primary + 'S'
           SET @secondary = @secondary + 'S'
          END
        
          SET @current = @current + 3
         END
         ELSE
         BEGIN
          IF (@current = 1     -- german & anglicisations, e.g. 'smith' match 'schmidt', 'snider' match 'schneider'
            AND @strnext1 IN ('M','N','L','W') -- also, -sz- in slavic language altho in hungarian it IS pronounced 's'
           
           OR @strnext1 = 'Z'
          BEGIN
           SET @primary = @primary + 'S'
           SET @secondary = @secondary + 'X'
       
           IF @strnext1 = 'Z'
            SET @current = @current + 2
           ELSE
            SET @current = @current + 1
          END
          ELSE
          BEGIN
           IF substring(@original, @current, 2) = 'SC'
           BEGIN
            IF substring(@original, @current + 2, 1) = 'H' -- Schlesinger's RULE
            BEGIN
             IF substring(@original, @current + 3, 2) in ('OO','ER','EN','UY','ED','EM') -- dutch origin, e.g. 'school', 'schooner'
             BEGIN
              IF substring(@original, @current + 3, 2) in ('ER','EN') -- 'schermerhorn', 'schenker'
              BEGIN
               SET @primary = @primary + 'X'
               SET @secondary = @secondary + 'SK'
              END
              ELSE
              BEGIN
               SET @primary = @primary + 'SK'
               SET @secondary = @secondary + 'SK'
              END
          
              SET @current = @current + 3
             END
             ELSE
             BEGIN
              IF @current = 1
               AND substring(@original, 3,1) NOT in ('A', 'E', 'I', 'O', 'U', 'Y')
               AND substring(@original, @current + 3, 1) <> 'W'
              BEGIN
               SET @primary = @primary + 'X'
               SET @secondary = @secondary + 'S'
              END
              ELSE
              BEGIN
               SET @primary = @primary + 'X'
               SET @secondary = @secondary + 'X'
              END
              
              SET @current = @current + 3
             END
            END
            ELSE
            BEGIN
             IF substring(@original, @current + 2, 1) in ('I','E','Y')
             BEGIN
              SET @primary = @primary + 'S'
              SET @secondary = @secondary + 'S'
             END
             ELSE
             BEGIN
              SET @primary = @primary + 'SK'
              SET @secondary = @secondary + 'SK'
             END
             SET @current = @current + 3
            END
           END
           ELSE
           BEGIN
            IF @current = 1  -- special CASE 'sugar-'
             AND substring(@original, @current, 5) = 'SUGAR'
            BEGIN
             SET @primary = @primary + 'X'
             SET @secondary = @secondary + 'S'
             SET @current = @current + 1
            END
            ELSE
            BEGIN
             IF @current = @last -- french e.g. 'resnais', 'artois'
              AND substring(@original, @current - 2, 2) in ('AI','OI')
              SET @secondary = @secondary + 'S' --set @primary = @primary + ''
             ELSE
             BEGIN
              SET @primary = @primary + 'S'
              SET @secondary = @secondary + 'S'
             END
             
             IF @strnext1 IN ('S','Z')
              SET @current = @current + 2
             ELSE
              SET @current = @current + 1
            END
           END
          END
         END
        END
       END
      END
      ELSE
     
      IF @strcur1 = 'T'
      BEGIN
       IF substring(@original, @current, 4) = 'TION'
       BEGIN
        SET @primary = @primary + 'X'
        SET @secondary = @secondary + 'X'
        SET @current = @current + 3
       END
       ELSE
        IF substring(@original, @current, 3) in ('TIA','TCH')
        BEGIN
         SET @primary = @primary + 'X'
         SET @secondary = @secondary + 'X'
         SET @current = @current + 3
        END
        ELSE
         IF substring(@original, @current, 2) = 'TH'
          OR substring(@original, @current, 3) = 'TTH'
         BEGIN
          IF substring(@original, @current + 2, 2) in ('OM','AM') -- special CASE 'thomas', 'thames' or germanic
           OR substring(@original, 0, 4) in ('VAN ','VON ')
           OR substring(@original, 0, 3) = 'SCH'
          BEGIN
           SET @primary = @primary + 'T'
           SET @secondary = @secondary + 'T'
          END
          ELSE
          BEGIN
           SET @primary = @primary + '0'
           SET @secondary = @secondary + 'T'
          END
          SET @current = @current + 2
         END
         ELSE
         BEGIN
          IF @strnext1 IN ('T','D')
          BEGIN
           SET @current = @current + 2
           SET @primary = @primary + 'T'
           SET @secondary = @secondary + 'T'
          END
          ELSE
          BEGIN
           SET @current = @current + 1
           SET @primary = @primary + 'T'
           SET @secondary = @secondary + 'T'
          END
         END
      END
      ELSE
     
      IF @strcur1 = 'V'
       IF (@strnext1 = 'V')
        SET @current = @current + 2
       ELSE
       BEGIN
        SET @current = @current + 1
        SET @primary = @primary + 'F'
        SET @secondary = @secondary + 'F'
       END
      ELSE
     
      IF @strcur1 = 'W'
      BEGIN
       -- can also be IN middle OF word
       IF substring(@original, @current, 2) = 'WR'
       BEGIN
        SET @primary = @primary + 'R'
        SET @secondary = @secondary + 'R'
        SET @current = @current + 2
       END
       ELSE
        IF @current = 1
         AND (@strnext1 IN ('A', 'E', 'I', 'O', 'U', 'Y')
          OR substring(@original, @current, 2) = 'WH'
         
        BEGIN
         IF @strnext1 IN ('A', 'E', 'I', 'O', 'U', 'Y') -- Wasserman should match Vasserman
         BEGIN
          SET @primary = @primary + 'A'
          SET @secondary = @secondary + 'F'
          SET @current = @current + 1
         END
         ELSE
         BEGIN
          SET @primary = @primary + 'A' -- need Uomo TO match Womo
          SET @secondary = @secondary + 'A'
          SET @current = @current + 1
         END
        END
        ELSE
         IF (@current = @last -- Arnow should match Arnoff
           AND @strprev1 IN ('A', 'E', 'I', 'O', 'U', 'Y')
          
           OR substring(@original, @current - 1, 5) in ('EWSKI','EWSKY','OWSKI','OWSKY')
           OR substring(@original, 0, 3) = 'SCH'
         BEGIN
          SET @secondary = @secondary + 'F' --set @primary = @primary + ''
          SET @current = @current + 1
         END
         ELSE
          IF substring(@original, @current, 4) in ('WICZ','WITZ') -- polish e.g. 'filipowicz'
          BEGIN
           SET @primary = @primary + 'TS'
           SET @secondary = @secondary + 'FX'
           SET @current = @current + 4
          END
          ELSE  
           SET @current = @current + 1 -- ELSE skip it
      END
      ELSE
     
      IF @strcur1 = 'X'
      BEGIN
       IF NOT (@current = @last -- french e.g. breaux
        AND (substring(@original, @current - 3, 3) in ('IAU', 'EAU')
          OR substring(@original, @current - 2, 2) in ('AU', 'OU')
        
       
       BEGIN
        SET @primary = @primary + 'KS'
        SET @secondary = @secondary + 'KS'
       END --else skip it
       
       IF @strnext1 IN ('C','X')
        SET @current = @current + 2
       ELSE
        SET @current = @current + 1
      END
      ELSE
     
      IF @strcur1 = 'Z'
      BEGIN
       IF (@strnext1 = 'Z')
       BEGIN
        SET @primary = @primary + 'S'
        SET @secondary = @secondary + 'S'
        SET @current = @current + 2
       END
       ELSE
       BEGIN
        IF (@strnext1 = 'H') -- chinese pinyin e.g. 'zhao'
        BEGIN
         SET @primary = @primary + 'J'
         SET @secondary = @secondary + 'J'
         SET @current = @current + 2
        END
        ELSE
        BEGIN
         IF (substring(@original, @current + 1, 2) in ('ZO', 'ZI', 'ZA'))
           OR (@SlavoGermanic = 1
            AND (@current > 1
             AND @strprev1 <> 'T'
            
           
         BEGIN
          SET @primary = @primary + 'S'
          SET @secondary = @secondary + 'TS'
         END
         ELSE
         BEGIN
          SET @primary = @primary + 'S'
          SET @secondary = @secondary + 'S'
         END
        END
        SET @current = @current + 1
       END
      END
      ELSE
       SET @current = @current + 1
     END
     RETURN cast(@primary as char(10)) + cast(@secondary as char(10))
end



So long, and thanks for all the fish,

Russell Shilling, MCDBA, MCSA 2K3, MCSE 2K3

Post #169281
Posted Tuesday, March 22, 2005 1:26 PM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Saturday, December 8, 2012 7:21 PM
Points: 234, Visits: 269
Thank you for the info!  I did find this code in multiple places, but no instructions or direction with what to do with it.  I'll keep digging, but I appreciate the feedback!
Post #169331
Posted Tuesday, March 22, 2005 5:06 PM


Right there with Babe

Right there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with Babe

Group: General Forum Members
Last Login: Today @ 9:10 AM
Points: 761, Visits: 136

I don't think you'll find a tutorial that will meet your exact needs.  Are you hoping to find something you can cut-n-paste that will de-dupe your DB?

Perhaps you should contract with someone to do this for you?



So long, and thanks for all the fish,

Russell Shilling, MCDBA, MCSA 2K3, MCSE 2K3

Post #169382
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse