SOUNDEX, of course, is far from exact: Joe Smith will match Jose Smithers, and so on.
The first time the comparison was run, we had a few dozen false positives.
That is why we maintain a table of the 'potential matches', so they can be excluded by our daily job that does the compare. This table also records who checked each entry and when, a column indicating whether it was a match or a false positive, and any comments.
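As a rough illustration of what such a review table could look like, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names and helper functions are all hypothetical; the only things taken from the description above are the kinds of information stored (who checked, when, match or false positive, comments).

```python
import sqlite3

# Hypothetical schema: every table and column name here is an assumption.
SCHEMA = """
CREATE TABLE potential_match (
    client_id   INTEGER NOT NULL,
    sdn_id      INTEGER NOT NULL,
    is_match    INTEGER,   -- NULL = not reviewed yet, 1 = match, 0 = false positive
    checked_by  TEXT,
    checked_at  TEXT,
    comments    TEXT,
    PRIMARY KEY (client_id, sdn_id)
)
"""

def open_review_db():
    """Create an in-memory database holding the review table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn

def record_review(conn, client_id, sdn_id, is_match, checked_by, checked_at, comments=""):
    """Store (or overwrite) the reviewer's decision for one client/SDN pair."""
    conn.execute(
        "INSERT OR REPLACE INTO potential_match VALUES (?, ?, ?, ?, ?, ?)",
        (client_id, sdn_id, int(is_match), checked_by, checked_at, comments),
    )
    conn.commit()

def reviewed_pairs(conn):
    """Return the set of (client_id, sdn_id) pairs that already have a decision."""
    rows = conn.execute(
        "SELECT client_id, sdn_id FROM potential_match WHERE is_match IS NOT NULL"
    )
    return {(client_id, sdn_id) for client_id, sdn_id in rows}
```

Keying the table on the client/SDN pair, rather than on the client alone, is what lets a false positive be remembered per SDN entry.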
Once the initial compare is run and the false positives identified, subsequent runs (we run the job daily) only return results when:
1) The SDN is updated
2) We add a new client
For us, the numbers in both of these cases are very small.
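The reason only those two events produce results can be sketched in a few lines of Python. This is an illustration under assumptions, not our actual job: the function name and the (client_id, sdn_id) pair representation are hypothetical, and the candidate pairs are assumed to come from whatever fuzzy compare you run (SOUNDEX in our case).

```python
def new_hits(candidate_pairs, already_reviewed):
    """Return candidate (client_id, sdn_id) pairs that nobody has reviewed yet."""
    return sorted(set(candidate_pairs) - set(already_reviewed))

# Initial run: every fuzzy hit is new and has to be looked at by a person.
day1_candidates = [(1, 100), (2, 205)]
reviewed = set()
to_review = new_hits(day1_candidates, reviewed)
reviewed.update(to_review)  # assume a reviewer has now classified each one

# Next day, nothing changed: the same candidates come back, nothing is reported.
assert new_hits(day1_candidates, reviewed) == []

# A new client (id 3) is added and happens to resemble SDN entry 100.
day3_candidates = day1_candidates + [(3, 100)]
assert new_hits(day3_candidates, reviewed) == [(3, 100)]
```

An SDN update behaves the same way as a new client: it introduces pairs with a new SDN entry, and only those pairs fall outside the reviewed set.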
This downside also has an upside: if the names are even close, you will get a notification, and that provides a great audit trail for the SEC.
This is also why we could not use any of the existing services. They have no way of knowing that you have already checked Joe Smith against the specific SDN entry for Jose Smithers and determined that it is NOT a match. All existing products/web services perform a new search/compare each time. By maintaining our own results/compares, we can run the comparison against our entire client list each time the SDN changes and for each new client. Our auditors love it.