• That's it, just copy the new DLLs into the MSSQL\BINN directory (may require you to stop and restart MSSQLServer service).

    That's a very good point.  The handling of punctuation is implementation-specific.  There's really no 'standards body' or organization that sets specific rules for implementation of the various phonetic algorithms.  (This is really apparent with newer algorithms, like Double Metaphone!)  It appears that the MS implementation of soundex stops encoding when it hits a punctuation mark.  You'll get 'O000' if you try to encode SOUNDEX('O.BRIEN'), SOUNDEX('O''BRIEN') or SOUNDEX ('O BRIEN').

    The NYSIIS implementation in the toolkit, on the other hand, ignores punctuation altogether.  One way to deal with it on the SOUNDEX side is to eliminate all non-alphabetic characters from your string before attempting to encode it with SOUNDEX.  This could be done in a UDF, but keep in mind that string manipulations and looping in T-SQL are not extremely efficient (that's one of the main reasons I implemented NYSIIS and Levenshtein Edit Distance as XP's in C++).

    I would recommend that you create a separate table with the proper SOUNDEX encoding and the NYSIIS encoding, and relate it to your main table.  That way you would only have to perform these string manipulations once; also, you'll be able to take advantage of proper indexes on the Encodings table in your queries.  All of the encoding overhead would be handled at load time, and your queries would run a lot more efficiently.  The down-side is that you'll be using up more space to store these encodings, but it shouldn't be significantly more.  For 1 million rows, you're looking at about ~10 MB to store the Soundex encodings and six-byte NYSIIS encodings.  Proper non-clustered indexes will take up additional space, of course.

    You could cut down the amount of space used in the Encodings table by using the actual name as a foreign key, to avoid duplicate entries for "JOHNSON, JAMES" and other common names (example).

    Alternatively, since NYSIIS encodes the first letter as itself (as does SOUNDEX), you could narrow down your searches by just comparing the first letter.

    SELECT * FROM People

    WHERE Name LIKE 'J%'

    AND dbo.udf_NYSIIS('JOHNSON') = dbo.udf_NYSIIS(Name)

    Thanks!