Newbie needs help: Am I using Fuzzy lookup correctly??

  • Hi,

    I'm working on a data cleansing task and the first stage is to try to identify duplicates. I'm dealing with people records eg name, address,phone numbers. I have list A which is a "clean" list people (around 10,000) and need to find any other related people in list B (around 500,000). I can match them by exact value but obviously there are variance's in data eg spelling mistakes, formats etc etc.

    I thought I would try to use a fuzzy lookup transformation, using the clean list (A) as the source and the and list (b) as the fuzzy lookup references table and set the "max number of lookups to output per lookup" to a high number.

    Is this the correct use? Will this work?

    Also, if i have a number of scenarios where two records could potentially be a match is its best to handle these separately?

    1 - If the email address is the same for two records and the names are similar then its a match

    2 - If the phone number is the same for two records and the names are similar then its a match

    3 - If the email address is similar for two records and the names are similar then its a match

    I thought I would be better off doing this in one lookup and query the results based on similarity/confidence.

    Thanks in advance, I hope I made some sort of sense!

  • Duplicate post.

    Please reply in the following topic:

    http://www.sqlservercentral.com/Forums/Topic1075326-148-1.aspx

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

Viewing 2 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply