Match firstname lastname

  • Hi,

    Could anyone help me with T-SQL to match names. I have to match names from one database to bank account details entered by clients in another. The problem is that clients enter full name, initials, fistname last and lastname first, spouses names etc. for account details. Huge dilemma!

    Thanks in advance,

    diesel

  • Instead of trying to compare single column values to multivalue columns, look at using the like operator to search through the multi-value columns looking for instances of the name you are searching. If you post your schema, we can show you the T-SQL.

  • I agree that seeing the schema will up. The logic will might be a bit complex, may have to have more than 1 temp table, but again without the schema (from both databases), it's a hard call.

  • Maybe more of the business rules aswell, how do you determine if Joe Doe is really Joe Doe in the other database, both maybe married to a Sue.

  • How are you going to cater for Rob, Bob, Robert, Robin, Bert. They could all be the same person or they could be all different.

    Speaking as someone who has done more name and address matching than I've had hot dinners this is not something you should do on financially sensitive data such as bank accounts.

    The best you could hope for would be and exact name and address match, but that is likely to be about 40% of cases if you are lucky.

  • All valid points. I have done a link to the QAS package we use, (ACE in the USA) which narrows it down. I guess that considering it's not the brightest of requests I have had from Customer Services I will send the extract to them to wade through. Revenge is sweet.

    Thanks for the help!

  • I would use Fuzzy Matching in SSIS. Throw everything to temp table. Remove your direct matches and then fuzzy matching the rest. Sorry can't give more details, but I did this in a project once when we were trying to correlate various school lists across government agencies. A joy that was.

  • I don't think the OP will notice because he hasn't logged in since 2009....

  • True, but others reading through a forum might be looking for ideas on how to tackle problems like this.

  • emoore 99634 (9/24/2012)


    True, but others reading through a forum might be looking for ideas on how to tackle problems like this.

    +1

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (9/24/2012)


    emoore 99634 (9/24/2012)


    True, but others reading through a forum might be looking for ideas on how to tackle problems like this.

    +1

    +2

    So I will chime in that I too have been handed this same problem many times over the years and it's a nightmare! The permutations of names that can be entered into a single string value is almost impossible to deal with. (The same issue can occur with addresses...oh my!) Any specific solution would be dependent on the business requirements.

    The only permanent solution I've found is to parse the name-string into the proper individual columns for prefix, first name, middle name, last name, suffix etc. But this is hard to do for compound names such as 'Van Dyke' or a suffix like 'DDS MD'. For me its always been a laborious iterative process to get the names right. If new data is constantly coming in then this may be difficult to keep up with.

    Otherwise, one way to deal with searching or filtering on such bastard data (besides the simple LIKE statement) is to set up a Full-Text index on the column and develop the search criteria to use that. This will likely create a lot of false positives and for any large amount of data the best one can usually do is to offer up a list of the closest matches and then let a user pick from the list. When dealing with fuzzy data humans can still do some things much more reliably than our computers.

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply