Hi Robert, Duray and Shewmaker,

    I've resolved the issues and sent the updated files to the good guys here at SQLServerCentral. Here are the fixes:

    -NYSIIS: I modified the multi-character encoding rules (e.g., "vowel + H", "vowel + W", "SCH", "EV", "PH") to fix the reported issues. 'JOSEPH' now encodes as 'JASAF', and 'SCHUMAKER', 'SHOEMAKER', 'SHEWMAKER', and 'SCHUWMAKER' now all correctly encode as 'SANAKAR'.
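For anyone who wants to experiment with these rules outside SQL Server, here's a minimal Python sketch of the textbook NYSIIS rules, without the usual six-character truncation. It is not a port of the article's UDF: the function name `nysiis` and the rule ordering are my own assumptions. In particular, it applies the standard K -> C substitution, so the four SHOEMAKER spellings collide on 'SANACAR' here rather than on the 'SANAKAR' the updated UDF returns. What the sketch does show is that the spellings collide.

```python
VOWELS = set("AEIOU")


def nysiis(name: str) -> str:
    """Textbook NYSIIS sketch (hypothetical helper); no six-character truncation."""
    word = "".join(c for c in name.upper() if c.isalpha())
    if not word:
        return ""

    # Step 1: first-letter substitutions (KN must be tested before K).
    for old, new in (("MAC", "MCC"), ("KN", "NN"), ("K", "C"),
                     ("PH", "FF"), ("PF", "FF"), ("SCH", "SSS")):
        if word.startswith(old):
            word = new + word[len(old):]
            break

    # Step 2: last-letter substitutions.
    for old, new in (("EE", "Y"), ("IE", "Y"), ("DT", "D"), ("RT", "D"),
                     ("RD", "D"), ("NT", "D"), ("ND", "D")):
        if word.endswith(old):
            word = word[:-len(old)] + new
            break

    chars = list(word)
    key = [chars[0]]  # the first letter is kept as-is
    i = 1
    while i < len(chars):
        c, prev = chars[i], chars[i - 1]
        nxt = chars[i + 1] if i + 1 < len(chars) else ""
        if c == "E" and nxt == "V":
            chars[i:i + 2] = ["A", "F"]            # EV -> AF
        elif c in VOWELS:
            chars[i] = "A"                          # any other vowel -> A
        elif c == "Q":
            chars[i] = "G"
        elif c == "Z":
            chars[i] = "S"
        elif c == "M":
            chars[i] = "N"
        elif c == "K":
            chars[i] = "N" if nxt == "N" else "C"   # KN -> N, else K -> C
        elif c == "S" and chars[i + 1:i + 3] == ["C", "H"]:
            chars[i:i + 3] = ["S", "S", "S"]        # SCH -> SSS
        elif c == "P" and nxt == "H":
            chars[i:i + 2] = ["F", "F"]            # PH -> FF
        elif c == "H" and (prev not in VOWELS or nxt not in VOWELS):
            chars[i] = prev                         # H not between vowels -> previous letter
        elif c == "W" and prev in VOWELS:
            chars[i] = "A"                          # vowel + W -> A
        if chars[i] != key[-1]:                     # collapse repeated letters
            key.append(chars[i])
        i += 1

    code = "".join(key)
    if len(code) > 1 and code.endswith("S"):        # drop a trailing S
        code = code[:-1]
    if code.endswith("AY"):                         # trailing AY -> Y
        code = code[:-2] + "Y"
    if len(code) > 1 and code.endswith("A"):        # drop a trailing A
        code = code[:-1]
    return code


print(nysiis("JOSEPH"))  # JASAF
print({nysiis(n) for n in ("SCHUMAKER", "SHOEMAKER", "SHEWMAKER", "SCHUWMAKER")})
```

The same sketch also gives the 'JANSAN' / 'JAHANSAN' codes from the correction below: an H between two vowels survives, while any other H is overwritten by the preceding letter and then collapsed as a duplicate.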

    -Levenshtein: I revised the edit distance algorithm so its results are symmetric, meaning that swapping the two arguments no longer changes the distance. I also tested it against a couple of implementations in other languages to make sure the results are consistent. dbo.udf_levenshtein('Yacht Sales', 'Yacht Charters International') and dbo.udf_levenshtein('Yacht Charters International', 'Yacht Sales') now both return the correct edit distance of 19.
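As a language-neutral reference point for that kind of cross-check, here's a minimal Python sketch of the standard two-row dynamic-programming Levenshtein distance, with a unit cost for each insert, delete, and substitute. It is not the UDF's T-SQL; the function name `levenshtein` is just for illustration, and it compares characters case-sensitively.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance (hypothetical helper, not the article's UDF)."""
    # prev holds the distances from the previous prefix of a to every prefix of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr[j] = min(prev[j] + 1,         # delete ca
                          curr[j - 1] + 1,     # insert cb
                          prev[j - 1] + cost)  # substitute (or match)
        prev = curr
    return prev[-1]


print(levenshtein("Yacht Sales", "Yacht Charters International"))  # 19
print(levenshtein("Yacht Charters International", "Yacht Sales"))  # 19
```

Because insertion and deletion cost the same, the distance is symmetric by construction: every edit path from a to b can be reversed into a path from b to a of equal cost.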

    Also, a small correction to the article. In NYSIIS, 'johnson' and 'johnsen' encode as 'JANSAN', while 'johansen' and 'johannsen' encode as 'JAHANSAN'. That's how NYSIIS works: an 'H' surrounded by vowels is kept as an 'H'; otherwise, it is eliminated from the encoded string.

    The good guys here uploaded the new version, and it can now be downloaded directly from the article.

    Thanks for the feedback, guys!