Accepting fuzzy lookup changes into actual dataset

  • Hi,

    I have 802K records in a dataset and a reference table. I have two fields: schoolcode and schoolname. I have a reference table with schoolcode and schoolname in it with some other data I need in my original table.

    The trouble is, they just don't match up exactly in all cases. Some of the differences are predictable while others aren't.

    I ran fuzzy lookup and sent the results to another table to look at how they match up.

    There doesn't appear to be a similarity threshold that would automatically just get me the dataset with the automatic matches.

    I created a view with all of the distinct code and name matches. I'd like to add a column such as 'isAccpeted' or something like that. This way, I can look over each distinct match and put a 1 for the ones I want and a 0 for the one's I don't.

    What I'm not sure about, however, is how to send those changes BACK to the original table. Because my intermediate view only shows distinct values, I would only be marking each match, not each row.

    What kind of thing would I need to do to send those changes back to the original table? I've thought of perhaps using the Merge task or maybe the Join task. Any ideas?

    Thanks,

    Ron

  • I worked on some of the large scale fuzzy matching projects. Based on my experience you are in the right track.

    You already have a table to which you load your fuzzy results. Add a derived column named IsMatchFlg and default the value to 0. Once you finish loading the data into the table run a SQL script to flag you aceptable record as IsMatchFlg=1.

    This flag should be applied to 1 record per batch which is the record having maximum similarity score.

    Then a select from this table where IsMatchFlg=1 will give your matching records.

    I guess this helped. Let me know

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply