|
|
A very interesting article. I'm looking forward to the follow-ups, as we're heading into a fairly big matching job (merging 1/4 million education records with 100k social care records for our county). A couple of questions (which you might be covering in the future): Does fuzzy matching work in the same way as fuzzy grouping? I had assumed that there was some blocking going on in the background on any exact matches defined, but the tests in the article imply not. Also, is there a way of getting the blocking to work in SSIS with a much larger number of blocks, e.g. one for every date of birth or post code? We could probably run the fuzzy matching in a loop, but that would lose the advantage of the blocks running in parallel.
Thanks, Barney
|
|
Excellent article....
Regards,
Basit A. Farooq (MSC Computing, MCITP SQL Server 2005 & 2008, MCDBA SQL Server 2000)
http://basitaalishan.com
|
|
|
|
|
|
|
Thank you for the article. It is indeed interesting but, in practice, a bit simplistic. Matching on zip codes and phone numbers is fine as long as they were entered accurately and belong to the person's current address (assuming that a person is being matched). If, for example, I moved to another location (new address, new zip code), then the grouping would be counterproductive. Sometimes it is hard to match a person by address: you don't know whether it is an address change or a totally different person.
|
|
|
|
|
|
|
Nice article Ira, kinda familiar too. 
JOEL-145858, don't take the content of the article out of context. This is one of MANY methods that you can use in matching, but it doesn't represent a complete matching solution. You would obviously find as many exact matches as possible first, since any kind of fuzzy matching is far more expensive by comparison. Then, using the model Ira outlined, you can perform fuzzy matching against what remains unmatched. Using geographic elements for blocking and fuzzy matching is only one example; the same model would fit other elements. For example:
Demographic - Block on first 2 letters of last name, year of birth / Fuzzy match on FirstName, LastName, DOB
Demo / Geo - Block on FirstName, DOB, State / Fuzzy match on FirstName, LastName, Address, City, Zip
The whole point of the parallel blocks is to minimize your comparison set, and thereby the number of potential combinations. In matching solutions I have done using this exact method, multiple iterations of this model with different criteria served our matching needs very well.
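To make the point about shrinking the comparison set concrete, here is a minimal Python sketch of the "Demographic" example above. It is not SSIS and not the article's implementation: the sample records, the field names, the `block_key` and `similarity` helpers, and the 0.8 threshold are all assumptions for illustration, and `difflib.SequenceMatcher` merely stands in for a real fuzzy matcher.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical sample records; the field names are illustrative only.
LEFT = [
    {"id": "L1", "first": "Jon", "last": "Smith", "yob": 1980},
    {"id": "L2", "first": "Mary", "last": "Jones", "yob": 1975},
    {"id": "L3", "first": "Alan", "last": "Smyth", "yob": 1980},
]
RIGHT = [
    {"id": "R1", "first": "John", "last": "Smith", "yob": 1980},
    {"id": "R2", "first": "Marie", "last": "Jones", "yob": 1975},
    {"id": "R3", "first": "Peter", "last": "Brown", "yob": 1990},
]

def block_key(rec):
    """Block on the first 2 letters of the last name plus year of birth."""
    return (rec["last"][:2].upper(), rec["yob"])

def similarity(a, b):
    """Crude name similarity in [0, 1]; a stand-in for a real fuzzy matcher."""
    name_a = f'{a["first"]} {a["last"]}'.lower()
    name_b = f'{b["first"]} {b["last"]}'.lower()
    return SequenceMatcher(None, name_a, name_b).ratio()

def blocked_matches(left, right, threshold=0.8):
    """Fuzzy-compare only those record pairs that share a block key."""
    blocks = defaultdict(list)
    for rec in right:
        blocks[block_key(rec)].append(rec)

    comparisons = 0
    matches = []
    for a in left:
        for b in blocks.get(block_key(a), []):
            comparisons += 1
            score = similarity(a, b)
            if score >= threshold:
                matches.append((a["id"], b["id"], round(score, 2)))
    return matches, comparisons

if __name__ == "__main__":
    matches, comparisons = blocked_matches(LEFT, RIGHT)
    print("matches:", matches)
    print("comparisons with blocking:", comparisons)
    print("comparisons without blocking:", len(LEFT) * len(RIGHT))
```

On this toy data the blocked run makes 3 comparisons instead of the 9 a full cross join would need. Each block is independent of the others, which is what allows them to be processed in parallel.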
P.S. The method is also highly scalable if you have the processing power and memory on your SSIS box.
Joshua T. Lewis
|
|
|
|
|
|
|
Mr DeusExDatum is exactly "spot on", and I highly recommend that you follow him. He has very deep and extensive experience, and I have had the pleasure of working with him.
The reason I focused on methodology first in my article, and then on a specific SSIS technique, was to emphasize the entire process of record linkage. Obviously you would get all the exact matches out of the mix first, then cycle through the remainder with fuzzy matching, utilizing various blocking criteria and even developing multiple scoring schemes.
Mr DeusExDatum has implemented this method in real-world solutions.
If I am able to do a Part Deux of this topic, I will seek Mr DeusExDatum's contribution.
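As a rough illustration of that overall flow (exact matches first, then repeated fuzzy passes over the remainder with different blocking criteria), here is a small Python sketch. It is not the SSIS package from the article: the record layout, `exact_key`, `fuzzy_score`, the two entries in `BLOCKING_PASSES`, and the 0.8 threshold are all hypothetical choices made for the example.

```python
from collections import defaultdict
from difflib import SequenceMatcher

def exact_key(rec):
    """Key used for the cheap exact-match step."""
    return (rec["first"].lower(), rec["last"].lower(), rec["dob"])

def fuzzy_score(a, b):
    """Name similarity in [0, 1]; a stand-in for a real fuzzy matcher."""
    left = f'{a["first"]} {a["last"]}'.lower()
    right = f'{b["first"]} {b["last"]}'.lower()
    return SequenceMatcher(None, left, right).ratio()

# Hypothetical blocking criteria, tried in order on whatever is still unmatched.
BLOCKING_PASSES = [
    lambda r: (r["last"][:2].lower(), r["dob"][:4]),  # last-name prefix + birth year
    lambda r: (r["first"].lower(), r["dob"]),         # first name + full date of birth
]

def link(left, right, threshold=0.8):
    links = []          # (left_id, right_id, method)
    used_right = set()  # right-hand ids that are already linked

    # Step 1: take the exact matches out of the mix first.
    exact_index = {exact_key(r): r["id"] for r in right}
    remaining = []
    for a in left:
        rid = exact_index.get(exact_key(a))
        if rid is not None and rid not in used_right:
            links.append((a["id"], rid, "exact"))
            used_right.add(rid)
        else:
            remaining.append(a)

    # Step 2: cycle through the remainder, one blocking criterion per pass.
    for pass_no, key_fn in enumerate(BLOCKING_PASSES, start=1):
        blocks = defaultdict(list)
        for b in right:
            if b["id"] not in used_right:
                blocks[key_fn(b)].append(b)
        still_unmatched = []
        for a in remaining:
            candidates = [b for b in blocks.get(key_fn(a), [])
                          if b["id"] not in used_right]
            best = max(candidates, key=lambda b: fuzzy_score(a, b), default=None)
            if best is not None and fuzzy_score(a, best) >= threshold:
                links.append((a["id"], best["id"], f"fuzzy pass {pass_no}"))
                used_right.add(best["id"])
            else:
                still_unmatched.append(a)
        remaining = still_unmatched

    return links, [a["id"] for a in remaining]

if __name__ == "__main__":
    left = [
        {"id": "A1", "first": "Ira", "last": "White", "dob": "1970-01-01"},
        {"id": "A2", "first": "Jon", "last": "Smith", "dob": "1980-05-02"},
        {"id": "A3", "first": "Mary", "last": "Jones", "dob": "1975-03-09"},
        {"id": "A4", "first": "Peter", "last": "Brown", "dob": "1990-07-07"},
    ]
    right = [
        {"id": "B1", "first": "Ira", "last": "White", "dob": "1970-01-01"},
        {"id": "B2", "first": "John", "last": "Smith", "dob": "1980-05-02"},
        {"id": "B3", "first": "Mary", "last": "Hones", "dob": "1975-03-09"},
        {"id": "B4", "first": "Zoe", "last": "Adams", "dob": "1988-12-12"},
    ]
    links, unmatched = link(left, right)
    print(links)
    print("unmatched:", unmatched)
```

In this toy run A1 links exactly, A2 is picked up by the first blocking pass, A3 (whose last name has a typo in the first letter, so the first block key misses it) is only caught by the second pass, and A4 stays unmatched. That is the reason for cycling through more than one blocking criterion.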
Ira Warren Whiteside
|
|
|
|