• craig.bobchin (8/28/2015)


    Jayanth_Kurup (8/27/2015)


    I think your looking at the problem the wrong way. using t-sql to do fuzzy lookups is kind of tricky since the db is not really suited to doing pattern matches. Sure you can write really complicated code but it still wont be bullet proof. Have to explored using the Fuzzy grouping task in SSIS.

    Check out this video [/url]

    All you might need is to schedule the package to run frequently.

    Jayanth,

    I watched the video, and it does give me some more information. However how would I handle it in my situation where I have multiple columns that I need to match on and not all of them have valid nor consistent data between records?

    Do I need multiple fuzzy grouping transforms or do I do it in one?

    How is the canonical field value determined?

    How do I assign a new key unique to the to the grouping?

    If you are looking to build a dictionary then you could probably do it in one fuzzy grouping task. I havent tried multiple columns but I'll try it out and post something on the blog.

    The canonical field value is the output of the Fuzzy Grouping task and is determined on 2 criteria. Similarity and Confidence. Similarity dictates how similar two words are for example

    James Smith and James Smiht will have a high similarity ( you can set this values as shown in the video) . Confidence is where SQL tells you how confident it feels that the two words are related e.g Road and Rd or Saint and St. this is not controlled by the user.

    Once you do fuzzy grouping , you can use Fuzzy lookup to perform the mathc on the columns you need.

    Jayanth Kurup[/url]