SQLServerCentral is supported by Red Gate Software Ltd.
 

Roll Your Own Fuzzy Match / Grouping (Jaro Winkler) - T-SQL
Posted Thursday, February 03, 2011 5:10 PM
Say Hey Kid

Adam Sottosanti (2/3/2011) wrote:


Agreed. I try to avoid Cartesian joins during matching as much as I can. I generally require certain demographics to match exactly (an inner join on DOB + Gender instead of a cross join) to limit the result set, and I also limit the number of records I run through at a time. I may miss a handful of records with this approach, but they are easy to identify, and since the matching is fuzzy to begin with, you are going to need a clean-up process anyway.
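A minimal sketch of the blocking idea in the quoted post, written in Python rather than T-SQL so the logic is easy to test. Records are grouped on (DOB, gender), and only pairs inside the same group become candidates, which is what an INNER JOIN on those columns achieves compared with a CROSS JOIN over every row. The `Person` record, its field names, and the `candidate_pairs` and `batches` helpers are hypothetical. As the post notes, a record whose DOB or gender was keyed incorrectly lands in a different group and is never compared, so a clean-up pass is still needed.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import combinations, islice
from typing import Iterable, Iterator, NamedTuple


class Person(NamedTuple):
    # Hypothetical record layout, for illustration only
    id: int
    name: str
    dob: str     # ISO date string, e.g. "1980-05-17"
    gender: str


def candidate_pairs(people: Iterable[Person]) -> Iterator[tuple[Person, Person]]:
    """Yield only pairs that share DOB and gender.

    Similar in spirit to a self INNER JOIN on (dob, gender) with a
    "left.id < right.id" condition, instead of a CROSS JOIN over every row.
    """
    blocks: dict[tuple[str, str], list[Person]] = defaultdict(list)
    for p in people:
        blocks[(p.dob, p.gender)].append(p)
    for members in blocks.values():
        # combinations() yields each unordered pair once and never pairs a row with itself
        yield from combinations(members, 2)


def batches(items: Iterable, size: int) -> Iterator[list]:
    """Split a stream into lists of at most `size` items to cap the work per run."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


people = [
    Person(1, "Jon Smith", "1980-05-17", "M"),
    Person(2, "John Smith", "1980-05-17", "M"),
    Person(3, "Jane Smith", "1980-05-17", "F"),
    Person(4, "Jon Smyth", "1975-01-02", "M"),
]
pairs = list(candidate_pairs(people))
print([(a.id, b.id) for a, b in pairs])  # [(1, 2)]
```

With four records a full cross join yields six unique pairs; the blocked version yields one. Each list from `batches(candidate_pairs(people), 10000)` could then be scored in its own run, which mirrors limiting how many records go through at a time.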



Have you considered building a large, precomputed Jaro-Winkler lookup table (one score per pair of values) and indexing it? You could run your precise matching first; then, once you're at the Jaro-Winkler stage, check each pair against the lookup table first, and compute scores only for the pairs that weren't in the table (adding them to the table as you go).
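A rough sketch of the lookup-table idea above, again in Python: a dictionary stands in for the indexed table, keyed on the pair of values. Exact matches short-circuit before any lookup, cached scores are reused, and only missing pairs are computed and added. The Jaro-Winkler function is the common textbook form (prefix scale 0.1, prefix capped at four characters, no boost threshold) and is not necessarily identical to the T-SQL version discussed in this thread; `JaroWinklerCache` is a hypothetical name. In SQL Server the dictionary would correspond to a table with an index on the two value columns.

```python
from __future__ import annotations


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)  # how far apart matching characters may be
    used1 = [False] * len1
    used2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order; t is half of that count
    out_of_order = 0
    j = 0
    for i in range(len1):
        if used1[i]:
            while not used2[j]:
                j += 1
            if s1[i] != s2[j]:
                out_of_order += 1
            j += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro score boosted by the length of the common prefix (up to max_prefix)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


class JaroWinklerCache:
    """Hypothetical in-memory stand-in for an indexed lookup table of scores."""

    def __init__(self) -> None:
        self.table: dict[tuple[str, str], float] = {}
        self.computed = 0  # number of scores that had to be calculated

    def score(self, a: str, b: str) -> float:
        if a == b:  # precise matching first: no lookup or calculation needed
            return 1.0
        key = (a, b) if a <= b else (b, a)  # store (a, b) and (b, a) as one row
        if key not in self.table:  # not in the table yet: compute it and add it
            self.table[key] = jaro_winkler(*key)
            self.computed += 1
        return self.table[key]


cache = JaroWinklerCache()
print(round(cache.score("MARTHA", "MARHTA"), 4))  # 0.9611
cache.score("MARHTA", "MARTHA")  # served from the table
print(cache.computed)  # 1
```

Ordering the key means each pair of values occupies a single row no matter which side of the comparison it came from, which keeps the precomputed table about half the size it would otherwise be.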
Posted Thursday, February 03, 2011 5:19 PM


Yep, for the most part that is exactly what I'm doing for the fuzzy matching.

Adam Sottosanti