Joining names where names match or don't match

  • Ok, so we have two environments which store employee data. Majority of the records match, but quite a few do not. Whether they do not reside in one system or the other (they'll always reside in tbl2, but sometimes not tbl1, and this is ok), or they are simply named differently (ie. Ben vs. Benjamin, Tom vs. Tomas, etc. etc.). This is the problem we're trying to resolve. tbl2 is the master set of data and we want tbl1 to reflect tbl2.

    So I'm having a hard time figuring out how to do my join ... If there is a one for one match on first name + last name, return it. If either side does not have one or the other, join either on first or last name.

    Make sense?

    I think my query is almost right, but even when a record matches, I return every possibility for each name on both sides.

    For example:

    Table 1 Table 2

    Adam B Adam B

    Adam B Adam C

    Adam B Adam D

    Adam C Adam B

    Adam C Adam C

    Adam C Adam D

    Adam D Adam B

    Adam D Adam C

    Adam D Adam D

    If you're still with me, take a look at my query and see if you can tell me what I'm doing wrong on my join. The CTE is nothing more than parsing out the name as it's stored in tbl1 only as a full name, so I had to split it on a space. Just made it easier for joining.

    Thanks!

    (replaced all db's/tables/etc. with generic names)

    ;WITH tbl1cte ([tbl1_FullName], [tbl1_FirstName], [tbl1_LastName])

    AS

    (

    SELECT

    [FULLNAME]AS [tbl1_FullName]

    ,SUBSTRING([FULLNAME], 1, NULLIF(CHARINDEX(' ', [FULLNAME]) - 1, -1))AS [tbl1_FirstName]

    ,SUBSTRING([FULLNAME], CHARINDEX(' ', [FULLNAME]) + 1, LEN([FULLNAME]))AS [tbl1_LastName]

    FROM db1..tbl1

    )

    SELECT

    a.[tbl1_FullName]

    ,a.[tbl1_FirstName]

    ,a.[tbl1_LastName]

    ,b.[tbl2_FullName]

    ,b.[tbl2_FirstName]

    ,b.[tbl2_LastName]

    FROM tbl1cte a

    INNER JOIN (

    SELECT

    [Firstname]AS [tbl2_FirstName]

    ,[LastName]AS [tbl2_LastName]

    ,[Firstname] + ' ' + [LastName] AS [tbl2_FullName]

    FROM OPENDATASOURCE

    ('SQLNCLI','Data Source=server;Integrated Security=SSPI')

    .db2..table2

    WHERE [CurrentRecord] = 'Yes'

    AND [EmployeeStatus] = 'A'

    ) b

    ON

    (

    (

    a.[tbl1_FirstName] = b.[tbl2_FirstName] AND a.[tbl1_LastName] = b.[tbl2_LastName]

    )

    OR

    (

    a.[tbl1_FirstName] = b.[tbl2_FirstName]

    )

    OR

    (

    a.[tbl1_LastName] = b.[tbl2_LastName]

    )

    )

    ORDER BY 1

  • To get the results that you want, you can't have the "full match" join argument with the "half match" arguments, because every "full match" is also 2 "half matches". Make sense?

    The quickest way to write this (albeit, possibly not the best, but the alternatives are far more complex and I am suffering from endofthedayitis), is to have two selects with different search arguments.

    To get the "full match":

    a.[tbl1_FirstName] = b.[tbl2_FirstName] AND a.[tbl1_LastName] = b.[tbl2_LastName]

    To get the "half matches":

    (a.[tbl1_FirstName] = b.[tbl2_FirstName] AND a.[tbl1_LastName] != b.[tbl2_LastName])

    or

    (a.[tbl1_FirstName] != b.[tbl2_FirstName] AND a.[tbl1_LastName] = b.[tbl2_LastName])

    Hope this helps...

  • Thanks John, that definitely makes sense and seems to be about the best bet.

    Thanks!

  • Adam, As I said in my email, I think you need two passes at the data but one of which is an Exception process that may need manual intervention. John Smith vs John Simpson are both John S but if you try a partial match, you get duplicates.

    Using SSIS Fuzzy Lookup you can take the partial matches and throw them into an exceptions table (file, etc.) to be examined. Exact matches will fall out as a confidence and score of 1 and you can let them flow through to your destination. You destination should have unique id's (not GUIDs but integers). You can use a conditional split to fork your data into the proper place.

  • Trying it now ... ugh, I hate SSIS ... yet this could be really cool and get some much more accurate results.

  • You hate change my friend. How long did it take you before you used SSMS? LOL

  • haha, point taken ...

    So I played with SSIS and got a Fuzzy Lookup working ... wow, just wow! I am simply amazed at how accurate this is and well it ranks everything. This is perfect!

    Thanks man

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply