Comparing Strings

  • We need to compare client names in one database to names in another database, identifying those that may be the same. The purpose is not to find dupes - one db is the US Treasury OFAC SDN list, the other is our clients. A simple string compare isn't sufficient.

    I have read numerous articles on this site and others and experimented with SOUNDEX/DIFFERENCE as well as double-metaphone, NYsiis and Levenshtein.

    I am currently using a method that is combination of SOUNDEX and double-metaphone and while it is much better than a straight compare, it cannot handle common nicknames (Jim vs James, Bill vs William, etc).  I was recently sent information on Netrics and it is a interesting technology that can apparently do what I would like, but the licensing cost is prohibitively expensive.

    Does anyone know of a solution that can take a first name, last name and perform a compare similar to the above algorithims, but accomodates common nicknames and misspellings?

     

  • This was removed by the editor as SPAM

  • I am not really sure but have you tried the CONTAINS and FREETEXT functions. They might be of help.

     

  • Both of those provide simple text comparisons which is more restrictive than SOUNDEX/DIFFERENCE or the other algorithms, but do nothing for the nickname issue.

    While we could build a nickname table trying to maintain that with any degree of accuracy isn't feasible.

    Just thought someone else may have faced a similar issue.

     

  • Hi,

    I am also working on the OFAC compliance queries. I am trying to use "CONTAINS" that is available in full text search. you can change the thesaurus file and in the thesaurus file you can have something like this

    <expansion>

    <sub>Jim</sub>

    <sub>James</sub>

    </expansion>

    <expansion>

    <sub>Bill</sub>

    <sub>Williams</sub>

    </expansion>

    This says that Jim & James are interchangeable. so if we search for "Jim" and we have a record for "James" it returns that record.

    Can you please tell me how you used the soundex for OFAC compliance? probably a query example would be better.

    Thanks,

    sridhar.

  • I know this might not be the answer you are looking for but if you are using SQL Server 2005, you should try fuzzy lookup transformation in SSIS. The fuzzy lookup transformation is used to compare one set of values to a reference/original set.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply