Say Hey Kid
      
Adam Sottosanti (2/3/2011)
Agreed. I try to avoid Cartesian joins during matching as much as I can. I generally require certain demographics to match (inner join on DOB + Gender instead of a cross join) to limit the result set, and I also limit how many records I run through at a time. I may miss a handful of records with this approach, but they are easy to identify, and since the matching is fuzzy to begin with, you are going to need a cleanup process anyway.
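For anyone following along, here is a rough sketch of that blocking idea in Python rather than T-SQL. The record layout (`id`, `name`, `dob`, `gender`) and both function names are made up for illustration, and `difflib.SequenceMatcher` is only a stand-in similarity score, not Jaro-Winkler.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def candidate_pairs(left, right):
    """Yield only pairs that agree on (dob, gender), not every left x right pair."""
    # Hypothetical record layout: dicts with "id", "name", "dob", "gender".
    blocks = defaultdict(list)
    for rec in right:
        blocks[(rec["dob"], rec["gender"])].append(rec)
    for rec in left:
        # Equivalent in spirit to an inner join on DOB + Gender.
        for other in blocks.get((rec["dob"], rec["gender"]), []):
            yield rec, other


def fuzzy_matches(left, right, threshold=0.85):
    """Score names only within each block; the threshold is an arbitrary example."""
    for a, b in candidate_pairs(left, right):
        # Stand-in similarity from the standard library, not Jaro-Winkler.
        score = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
        if score >= threshold:
            yield a["id"], b["id"], score
```

Records whose DOB or Gender is wrong in one source never reach the scoring step, which is the "handful of records" that the cleanup pass has to pick up.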
Have you considered building a (large) precomputed Jaro-Winkler lookup table, and indexing it? Perhaps run your precise matching first, and once you're at the Jaro-Winkler stage, run against the lookup table first, and then work on those values that weren't in the table (and add them to the table)?
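To make the suggestion concrete, here is a minimal sketch in Python. The `JaroWinklerCache` class is hypothetical, and its dict stands in for the indexed lookup table you would keep in the database. The scoring function is a plain textbook Jaro-Winkler (prefix scale 0.1, prefix capped at 4 characters), which may differ in details from whatever implementation you are running.

```python
def jaro(s1, s2):
    """Plain Jaro similarity in the range 0.0 to 1.0."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        # Look for an unused matching character within the sliding window.
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, p=0.1):
    """Jaro score boosted for a shared prefix of up to 4 characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


class JaroWinklerCache:
    """Hypothetical lookup table: check the table first, compute only on a miss."""

    def __init__(self):
        self.table = {}   # stands in for an indexed table keyed on the two values
        self.misses = 0

    def score(self, a, b):
        # Store each unordered pair once, whichever way round it arrives.
        key = (a, b) if a <= b else (b, a)
        if key not in self.table:
            self.misses += 1
            self.table[key] = jaro_winkler(*key)   # add the new value to the table
        return self.table[key]
```

Whether this pays off depends on how often the same pairs of values recur. Names tend to repeat a lot, so the hit rate should climb as the table fills.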
SSC-Enthusiastic
      
Yep, for the most part that is exactly what I'm doing for the fuzzy matching.
Adam Sottosanti