Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Accepting fuzzy lookup changes into actual dataset Expand / Collapse
Author
Message
Posted Monday, December 21, 2009 7:40 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Thursday, March 24, 2011 6:24 PM
Points: 20, Visits: 62
Hi,

I have 802K records in a dataset and a reference table. I have two fields: schoolcode and schoolname. I have a reference table with schoolcode and schoolname in it with some other data I need in my original table.

The trouble is, they just don't match up exactly in all cases. Some of the differences are predictable while others aren't.

I ran fuzzy lookup and sent the results to another table to look at how they match up.

There doesn't appear to be a similarity threshold that would automatically just get me the dataset with the automatic matches.

I created a view with all of the distinct code and name matches. I'd like to add a column such as 'isAccpeted' or something like that. This way, I can look over each distinct match and put a 1 for the ones I want and a 0 for the one's I don't.

What I'm not sure about, however, is how to send those changes BACK to the original table. Because my intermediate view only shows distinct values, I would only be marking each match, not each row.

What kind of thing would I need to do to send those changes back to the original table? I've thought of perhaps using the Merge task or maybe the Join task. Any ideas?

Thanks,

Ron
Post #837696
Posted Monday, March 1, 2010 11:06 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Tuesday, August 14, 2012 1:10 PM
Points: 15, Visits: 113
I worked on some of the large scale fuzzy matching projects. Based on my experience you are in the right track.

You already have a table to which you load your fuzzy results. Add a derived column named IsMatchFlg and default the value to 0. Once you finish loading the data into the table run a SQL script to flag you aceptable record as IsMatchFlg=1.
This flag should be applied to 1 record per batch which is the record having maximum similarity score.

Then a select from this table where IsMatchFlg=1 will give your matching records.

I guess this helped. Let me know
Post #874865
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse