SQLServerCentral is supported by Red Gate Software Ltd.
 
Fuzzy Lookup???
Posted Thursday, February 26, 2009 11:15 PM
Grasshopper


Group: General Forum Members
Last Login: Thursday, February 26, 2009 11:06 PM
Points: 14, Visits: 73
Hi all,

I have 2 tables: a Persons table (a lot of records, where the surnames could have been entered incorrectly due to human error) and a Surname table (10 records with the correct surnames that I am looking for in the Persons table).

How do I use Fuzzy Lookup to find surnames that look alike?

E.g., Petersen could appear as peteiren, peterson or pterson.

NB: The same surname could appear more than once in the Persons table.
Post #665519
Posted Friday, February 26, 2010 10:23 PM
Grasshopper


Group: General Forum Members
Last Login: Tuesday, August 14, 2012 1:10 PM
Points: 15, Visits: 113
For something like names, I would start with the T-SQL SOUNDEX function.

http://www.devx.com/enterprise/Article/43757
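To show what Soundex actually does to a name, here is a small Python sketch of the standard American Soundex rules, the same idea behind T-SQL's SOUNDEX(). It is written from the published algorithm, not from SQL Server's implementation, so treat it as an approximation: edge cases (H/W handling, non-ASCII letters) may not match SQL Server exactly.

```python
# Standard American Soundex, sketched from the published rules.
# Assumption: only ASCII letters carry codes; SQL Server's SOUNDEX()
# may differ on edge cases.
SOUNDEX_CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        SOUNDEX_CODES[letter] = digit


def soundex(name: str) -> str:
    """Return the 4-character Soundex code for name ('' if it has no letters)."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    code = letters[0]                        # the first letter is kept as-is
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = SOUNDEX_CODES.get(ch, "")
        if digit:
            if digit != prev:                # adjacent letters with the same code count once
                code += digit
                if len(code) == 4:
                    break
            prev = digit
        elif ch not in "HW":
            prev = ""                        # a vowel, Y or other uncoded letter lets a code repeat
        # H and W are skipped and do NOT reset prev
    return code.ljust(4, "0")                # pad short codes with zeros


for surname in ("Petersen", "peterson", "pterson", "peteiren"):
    print(surname, soundex(surname))
```

With these rules Petersen, peterson and pterson all come out as P362, but peteiren comes out as P365, so a plain Soundex comparison would miss it. That is the kind of case where fuzzy matching earns its keep.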

Fuzzy Matching Process:

If that doesn't help, let's try fuzzy matching. Start by getting familiar with the data.
Developing a successful fuzzy matching process is a very custom development effort. It's not an exact science, but an art.

Instead of fuzzy matching blindly on surnames, start by looking for fields that could be used for an exact match, like age/DOB or zip code.

Keep those fields as exact matches and fuzzy match on the surname. Adding more fields to the fuzzy match will improve your chances of arriving at a better match. Set the maximum number of matches per lookup to 100, which seems to be the upper limit in the advanced properties, and pick the record with the highest similarity score as your matching reference data.
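A rough Python sketch of that exact-match-plus-fuzzy approach: a reference row is only considered if it agrees exactly on DOB and zip code, then the surnames are scored and the highest score wins. The similarity score here is difflib.SequenceMatcher.ratio(), used purely as a stand-in; Fuzzy Lookup computes its own score differently, so the numbers will not line up. The Record fields and the max_matches=100 default (mirroring the setting above) are illustrative assumptions.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Record:
    # Hypothetical columns; your Persons / reference tables will differ.
    surname: str
    dob: str        # exact-match field
    zip_code: str   # exact-match field


def similarity(a: str, b: str) -> float:
    """Score in [0, 1]. Stand-in for the Fuzzy Lookup similarity, not the same formula."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def top_matches(person: Record, reference: List[Record],
                max_matches: int = 100) -> List[Tuple[float, Record]]:
    # Exact match on DOB and zip first, so the fuzzy step only sees plausible rows.
    candidates = [r for r in reference
                  if r.dob == person.dob and r.zip_code == person.zip_code]
    scored = [(similarity(person.surname, r.surname), r) for r in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:max_matches]


def best_match(person: Record, reference: List[Record]) -> Optional[Tuple[float, Record]]:
    # Keep the candidate with the highest similarity score, if any survived.
    matches = top_matches(person, reference)
    return matches[0] if matches else None


reference = [
    Record("Petersen", "1970-01-01", "12345"),
    Record("Peterson", "1982-05-17", "12345"),
    Record("Johnson", "1970-01-01", "12345"),
]
print(best_match(Record("pterson", "1970-01-01", "12345"), reference))
```

In this made-up data, "Peterson" would score higher than "Petersen" on the surname alone, but its DOB rules it out before scoring, so "Petersen" is picked. That is the point of keeping some fields as exact matches.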

If you can identify patterns in the data, like repeated mistakes, you can start cleansing your source data to arrive at a better match.
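As an illustration of massaging the source data, here is a small Python cleansing step run before matching. The substitution rules are hypothetical examples (digits keyed instead of letters); the real list should come from the mistakes you actually find repeated in your own Persons data. It also assumes plain ASCII surnames.

```python
import re

# Hypothetical fixes: replace these with the repeated mistakes you actually
# find when profiling your own Persons data.
KNOWN_FIXES = {
    "0": "o",  # digit zero keyed instead of the letter o
    "1": "l",  # digit one keyed instead of the letter l
}


def cleanse_surname(raw: str) -> str:
    """Normalize a surname before fuzzy matching (assumes ASCII names)."""
    text = raw.strip().lower()
    for wrong, right in KNOWN_FIXES.items():
        text = text.replace(wrong, right)
    text = re.sub(r"[^a-z' -]", "", text)  # keep only a-z, apostrophe, space, hyphen
    text = re.sub(r"\s+", " ", text)       # collapse runs of spaces
    return text.strip()


print(cleanse_surname("  Peters0n "))  # peterson
```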

Improving the fuzzy matching process can't be done in one cycle. It will take numerous iterations before you are comfortable with the result. Once you reach saturation, meet with the users and tell them to expect an error rate of around 2% (or whatever your analysis of a sample set shows). If they can't accept that and you have reached the saturation point of the fuzzy logic, it's time to develop an exception UI and design more complex fuzzy logic that pushes records with an unacceptable similarity score to a human for review.
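The last step can be sketched in Python as two small helpers: one estimates the error rate from a hand-checked sample, the other splits scored matches into auto-accepted rows and rows for the exception UI. The 0.80 threshold and the (record_id, score) shape are assumptions for illustration; you would tune the threshold until the sample error rate is one the users accept.

```python
from typing import List, Sequence, Tuple


def route_matches(matches: Sequence[Tuple[int, float]],
                  threshold: float) -> Tuple[List[int], List[int]]:
    """Split (record_id, similarity) pairs into auto-accepted ids and ids for human review."""
    accepted: List[int] = []
    exceptions: List[int] = []
    for record_id, score in matches:
        if score >= threshold:
            accepted.append(record_id)
        else:
            exceptions.append(record_id)  # goes to the exception UI
    return accepted, exceptions


def sample_error_rate(reviewed: Sequence[bool]) -> float:
    """Share of wrong matches in a hand-checked sample (True = match was correct)."""
    if not reviewed:
        raise ValueError("need at least one reviewed match")
    return sum(1 for ok in reviewed if not ok) / len(reviewed)


accepted, exceptions = route_matches([(1, 0.95), (2, 0.55), (3, 0.80)], threshold=0.80)
print(accepted, exceptions)                      # [1, 3] [2]
print(sample_error_rate([True] * 49 + [False]))  # 0.02
```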

If
Post #873865