Fuzzy Lookup???


JarrodR
Hi all,

I have 2 tables: a Persons table (a lot of records, where the surnames could have been entered incorrectly due to human error) and a Surname table (10 records with the correct surnames that I am looking for in the Persons table).

How do I use the Fuzzy Lookup to look up surnames that look similar?

E.g. Petersen could look like peteiren, peterson, pterson.

NB. The same surname could appear more than once in the Persons table.
albertarun
For something like names, I would start with the T-SQL SOUNDEX function.

http://www.devx.com/enterprise/Article/43757
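
To see what a Soundex comparison would do with the surnames from the question, here is a minimal Python sketch of the classic Soundex rules. It is an illustration only: the function name `soundex` is my own, and SQL Server's SOUNDEX may differ from it in some edge cases.

```python
# Classic Soundex letter groups; letters not listed (vowels, H, W, Y) have no code.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return a 4-character Soundex code (illustrative, not SQL Server's exact logic)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]                 # the first letter is kept as-is
    prev = CODES.get(letters[0], "")    # its code still suppresses an adjacent duplicate
    for ch in letters[1:]:
        if ch in "HW":
            continue                    # H and W do not separate equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code                     # a vowel resets prev, so a repeat is coded again
    return (result + "000")[:4]         # pad or truncate to letter + 3 digits


for surname in ("Petersen", "peteiren", "peterson", "pterson"):
    print(surname, soundex(surname))
```

With these rules Petersen, peterson and pterson all encode to P362, while peteiren encodes to P365, so a Soundex comparison alone would miss that last variant.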

Fuzzy Matching Process:

If that doesn't help, try fuzzy matching. Start by getting familiar with the data.
Developing a successful fuzzy matching process is very much custom development. It's not an exact science, but an art.

Instead of fuzzy matching blindly on surnames, start by looking for fields that should be used for an exact match,
such as age/DOB or zip code.

Keep those fields as exact matches and fuzzy match on surname. Adding more fields to the fuzzy match will improve your chances of arriving at a better match. Set the maximum number of matches to output per lookup to 100, which seems to be the upper limit on the Advanced tab. Then pick the record with the highest similarity score as your matching reference record.
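
The idea of combining an exact match on one field with a fuzzy match on surname, then keeping the highest score, can be sketched in Python as below. This is only an illustration under assumptions: the `zip` field, the sample rows and the function names are made up, and `difflib.SequenceMatcher` stands in for the similarity score, which is not the algorithm the SSIS Fuzzy Lookup uses.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0..1 similarity score between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(person, references):
    """Fuzzy match on surname, restricted to references with the same zip.

    Returns (reference_row, score), or None when no reference shares the zip.
    """
    candidates = [r for r in references if r["zip"] == person["zip"]]  # exact-match field
    if not candidates:
        return None
    scored = [(similarity(person["surname"], r["surname"]), r) for r in candidates]
    score, ref = max(scored, key=lambda pair: pair[0])  # keep the highest score
    return ref, score


# Hypothetical sample data
references = [
    {"surname": "Petersen", "zip": "7441"},
    {"surname": "Jacobs", "zip": "7441"},
    {"surname": "Pietersen", "zip": "8001"},
]
person = {"surname": "peterson", "zip": "7441"}

print(best_match(person, references))
```

Restricting the candidates first keeps the fuzzy comparison from pairing look-alike surnames that belong to clearly different people.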

If you can identify patterns in the data, such as repeated mistakes, you can start massaging your source data to arrive at a better match.
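
As a rough sketch of what massaging the source data could look like, the Python below normalises case, spacing and stray characters, then applies a table of known repeated mistakes. The `KNOWN_FIXES` entries and the function name are hypothetical; in practice the corrections would come from reviewing your own mismatches.

```python
import re

# Hypothetical table of repeated mistakes found while reviewing the data
KNOWN_FIXES = {"pterson": "peterson"}


def clean_surname(raw):
    """Normalise a surname before matching (illustrative rules only)."""
    s = raw.strip().lower()
    s = re.sub(r"[0-9.,]", "", s)        # drop stray digits and punctuation
    s = re.sub(r"\s+", " ", s).strip()   # collapse repeated whitespace
    return KNOWN_FIXES.get(s, s)         # apply a known correction if there is one


print(clean_surname("  PTERSON. "))
print(clean_surname("Van  der Merwe"))
```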

Improving the fuzzy matching process cannot be done in one cycle. It will take numerous iterations until you feel comfortable with the result. Once you reach saturation, meet with the users and tell them what error rate to expect in the data, for example 2% (or whatever your analysis of a sample set shows). If they can't accept that and you have reached the saturation point of the fuzzy logic, it's time to develop an exception UI and design more complex fuzzy logic that pushes records with an unacceptable similarity score to a human for interpretation.
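
The last step, pushing low-scoring records to a person instead of accepting them, could be sketched like this in Python. The 0.8 threshold, the tuple layout and the sample scores are assumptions for illustration; the real cut-off would come out of the sample analysis agreed with the users.

```python
def triage(scored_matches, threshold=0.8):
    """Split (input_surname, matched_surname, score) rows by similarity score.

    Rows at or above the threshold are accepted; the rest go to a review
    queue for a person to look at. The default threshold is a placeholder.
    """
    accepted, review = [], []
    for row in scored_matches:
        score = row[2]
        (accepted if score >= threshold else review).append(row)
    return accepted, review


# Hypothetical scores
rows = [
    ("peterson", "Petersen", 0.88),
    ("pterson", "Petersen", 0.80),
    ("petrus", "Petersen", 0.57),
]
accepted, review = triage(rows)
print(len(accepted), "accepted,", len(review), "sent for review")
```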
