• Adam Sottosanti (2/3/2011)


    Sweet, thanks Mike! If I ever get some free time, maybe I'll run some side-by-side comparisons. I use T-SQL to run new records through multiple "rounds" of matching against a lookup table that is at 250K records and growing, and it keeps getting slower... and slower... and slower...

    I've found (heavily) indexed #temp tables to help significantly with speed on larger datasets. Calculate your double metaphone and cleaned-up variants once, store them, and then run Jaro-Winkler only on the pairs that pass a cheap WHERE clause (first two characters match, first two characters of double metaphone match, etc.). Without that filter it's naturally a Cartesian [O(n^2)] operation, which does get very slow, very fast, so to speak.
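
For anyone who wants to see the idea outside of T-SQL, here is a rough Python sketch of the same "precompute once, filter cheaply, then score" pattern. It is only an illustration under some assumptions: the helper names (`clean`, `build_index`, `match`) and the 0.9 threshold are made up for the example, the Jaro-Winkler function is a plain textbook version rather than anyone's production code, and the block key is just the first two characters of the cleaned name. A double metaphone prefix could serve as a second key in the same way, but it is left out because it needs a third-party library.

```python
from collections import defaultdict


def jaro(s1, s2):
    """Textbook Jaro similarity in the range [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching if they are within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, p=0.1):
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def clean(name):
    """Hypothetical cleanup step: uppercase, keep letters and digits only."""
    return "".join(c for c in name.upper() if c.isalnum())


def build_index(lookup_names):
    """Compute the cleaned variant once and bucket it by its first two characters."""
    index = defaultdict(list)
    for name in lookup_names:
        cleaned = clean(name)
        index[cleaned[:2]].append((name, cleaned))
    return index


def match(new_names, index, threshold=0.9):
    """Score each new name only against lookup names in the same bucket."""
    results = []
    for name in new_names:
        cleaned = clean(name)
        for candidate, candidate_cleaned in index.get(cleaned[:2], []):
            score = jaro_winkler(cleaned, candidate_cleaned)
            if score >= threshold:
                results.append((name, candidate, score))
    return results


index = build_index(["Martha", "Jones", "Dwayne"])
hits = match(["Marhta", "Duane"], index)
```

Here "Marhta" is scored only against "Martha", and "Duane" is never compared with "Dwayne" because their first two characters differ. That is the trade-off with a prefix filter: it cuts the number of comparisons a lot, but it misses pairs that differ early in the string, which is presumably why a phonetic key is worth adding as a second bucket.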