Newbie needs help: Am I using Fuzzy lookup correctly??

  • Hi,

    I'm working on a data cleansing task and the first stage is to try to identify duplicates. I'm dealing with people records eg name, address,phone numbers. I have list A which is a "clean" list people (around 10,000) and need to find any other related people in list B (around 500,000). I can match them by exact value but obviously there are variance's in data eg spelling mistakes, formats etc etc.

    I thought I would try to use a fuzzy lookup transformation, using the clean list (A) as the source and the and list (b) as the fuzzy lookup references table and set the "max number of lookups to output per lookup" to a high number.

    Is this the correct use? Will this work?

    Also, if i have a number of scenarios where two records could potentially be a match is its best to handle these separately?

    1 - If the email address is the same for two records and the names are similar then its a match

    2 - If the phone number is the same for two records and the names are similar then its a match

    3 - If the email address is similar for two records and the names are similar then its a match

    I thought I would be better off doing this in one lookup and query the results based on similarity/confidence.

    Thanks in advance, I hope I made some sort of sense!

  • The fuzzy lookup has not so good performance, so you should do a normal lookup with list B as input and list A as the reference table (since it is much smaller and cleaner). All the exact matches can be handled with this component. Route all the non-matching rows to a fuzzy lookup component with again list A as reference table.

    Regarding the configuration of the fuzzy lookup component, the following resources can be helpful:

    http://msdn.microsoft.com/en-us/library/ms137786.aspx

    http://www.sqlteam.com/article/using-fuzzy-lookup-transformations-in-sql-server-integration-services

    http://msdn.microsoft.com/en-us/library/ms345128(v=sql.90).aspx

    http://www.bimonkey.com/2009/06/the-fuzzy-lookup-transformation/

    http://msdn.microsoft.com/en-us/magazine/cc163731.aspx

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

Viewing 2 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply