Combining data from 2 tables with mostly identical data

Question

Combining data from 2 tables with mostly identical data

SSC Journeyman

Points: 75

November 1, 2017 at 8:02 am

*** Updated with code in Replies***
Super Noob here. So sorry if there is already a thread here for this. I have searched and come up empty so I am hoping my question is unique enough to be new, but not so unique that I cant find a solution.
Problem:
Two(2) tables in the same dB have content which is almost identical less 2 specific fields. Those to fields contain the main identifies for these tables, but they are not the connecting piece to each other.
Example:

Table One Identifier	Table Two Identifier	Last Name	First Name	Middle Name	Mothers Maiden Name	Date of Birth	Gender
'LOL000001234'	'D000000987'	'FILLET'	'PHISH'	'R'	''	'11/12/1901'	'M'
'LOL00003743'	'T00000338'	'CASTELIANO	'TERRY'	'R'	''	'11/12/1934'	'M'
'LOL000001345'	'A00002342'	'CASTIGLIONE'	'FRANCIS'	'R'	''	'02/15/1950'	'M'
'LOL000001987'	'T33323'	'FISH'	'WANDA'	'S'	''	'06/12/1912'	'F'
'LOL0000039457'	'K2324'	'MORE'	'MARY'	'K'	''	'05/24/1952'	'F'
'LOL0000023095'	'A3327'	'JONES'	'JAMES'	'J'	''	'02/02/1979'	'F'
	'AAB2234'	'FILLET'	'PHISH'	'R'	''	'11/12/1901'	'M'
	'AAB1122'	'CASTELIANO	'TERRY'	'R'	''	'11/12/1934'	'M'
	'AAB782785'	'CASTIGLIONE'	'FRANCIS'	'R'	''	'02/15/1950'	'M'
	'AAB4886836'	'FISH'	'WANDA'	'S'	''	'06/12/1912'	'F'
	'AAB123123123'	'MORE'	'MARY'	'K'	''	'05/24/1952'	'F'
	'AAB53786'	'JONES'	'JAMES'	'J'	''	'02/02/1979'	'F'

I am looking to combine these tables matching on items like Last Name,First Name, DOB, etc... AS I stated in the beginning, I am super green withSQL. I have several queries to pull data but nothing as far as combiningmatching data when the Keys are not related.

Intended result:

Table One Identifier	Table Two Identifier	Last Name	First Name	Middle Name	Mothers Maiden Name	Date of Birth	Gender
'LOL000001234'	'D000000987~AAB2234'	'FILLET'	'PHISH'	'R'	''	'11/12/1901'	'M'
'LOL00003743'	'T00000338~AAB1122'	'CASTELIANO	'TERRY'	'R'	''	'11/12/1934'	'M'
'LOL000001345'	'A00002342~AAB782785'	'CASTIGLIONE'	'FRANCIS'	'R'	''	'02/15/1950'	'M'
'LOL000001987'	'T33323~AAB4886836'	'FISH'	'WANDA'	'S'	''	'06/12/1912'	'F'
'LOL0000039457'	'K2324~AAB123123123'	'MORE'	'MARY'	'K'	''	'05/24/1952'	'F'
'LOL0000023095'	'A3327~AAB53786'	'JONES'	'JAMES'	'J'	''	'02/02/1979'	'F'

I believe this should be a simple process, but don't yet know the logicto manipulate the data.

Working in MS SQL Server Management Studio on 2012 SQL Server. I cannot make alterations to the dB tables, I can only manipulate the results through query. Please keep that in mind when responding.

Any assistance or direction to the right material is greatlyappreciated. I am eager to learn.

S. Shane Roth

Viewing 6 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

Thom A SSC Guru Points: 99143 More actions · Answer 1

Without DDL and consumable Sample Data, this is untested, and I have assumed that the First, Middle, and Last names will be the same in both copies, but this might get your started:

SELECT YT1.TableOneID,
   YT1.TableTwoID + '~' + YT2.TableTwoID AS TableTwoID,
    YT1.LastName, YT1.FirstName, YT1.MiddleName,
    YT1.MothersName,
    YT1.DoB,
    YT1.Gender
FROM [YourTable] YT1
  JOIN [YourTable] YT2 ON YT1.LastName = YT2.LastName
          AND YT1.FirstName = YT2.FirstName
          AND YT1.MiddleName = YT2.MiddleName
WHERE YT1.TableOneID IS NOT NULL;

Thom~

Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk

sroth 71504 SSC Journeyman Points: 75 More actions · Answer 2

Thom A - Wednesday, November 1, 2017 8:20 AM
Without DDL and consumable DDL, this is untested, and I have assumed that the First, Middle, and Last names will be the same in both copies, but this might get your started:
SELECT YT1.TableOneID,
   YT1.TableTwoID + '~' + YT2.TableTwoID AS TableTwoID,
   YT1.LastName, YT1.FirstName, YT1.MiddleName,
   YT1.MothersName,
   YT1.DoB,
   YT1.Gender
FROM [YourTable] YT1
JOIN [YourTable] YT2 ON YT1.LastName = YT2.LastName
          AND YT1.FirstName = YT2.FirstName
         AND YT1.MiddleName = YT2.MiddleName
WHERE YT1.TableOneID IS NOT NULL;

Thanks. I had this down to the Where clause. I left that off and really did make the difference. After some extra manipulation I had to tone the JOIN down to just the First and Last to get results. Appears that both dB's are not as accurate as I was told. GIGO is suppose.

What I am running intonow is that there are multiple ID's in Table2 that are creating additionalrows.
CODE:
SELECT YT1.TableOneID AS TableOneID , YT1.TableTwoID + '~' + YT2.TableTwoID AS TableTwoID , YT1.LastName , YT1.FirstName , YT1.MiddleName , YT1.MothersName , YT1.DOB , YT1.Gender

FROM [MyTable]YT1 (NOLOCK)
JOIN [MyTable2] TY2 ON (NOLOCK) YT1.LastName = YT2.LastName
AND YT1.FirstName = YT2.FirstName

WHERE YT1.TableOneIDIS NOT NULLORDER BY YT1.TableOneID

Resulting in:

Table One Id

Table Two Id

Last Name

First Name

Middle Name

Mothers Maiden Name

Date of Birth

Gender

'LOL000001234'

'D000000987~AAB2234'

'FILLET'

'PHISH'

'R'

''

'11/12/1901'

'M'

'LOL000001234'

'D000000987~AAB56655'

'FILLET'

'PHISH'

'R'

''

'11/12/1901'

'M'

'LOL000001234'

'D000000987~AAB456456'

'FILLET'

'PHISH'

'R'

''

'11/12/1901'

'M'

'LOL000001234'

'D000000987~AAB4564'

'FILLET'

'PHISH'

'R'

''

'11/12/1901'

'M'

'LOL000001234'

'D000000987~AAB86856'

'FILLET'

'PHISH'

'R'

''

'11/12/1901'

'M'

'LOL000001234'

'D000000987~AAB78678'

'FILLET'

'PHISH'

'R'

''

'11/12/1901'

'M'

'LOL000001234'

'D000000987~AAB456458'

'FILLET'

'PHISH'

'R'

''

'11/12/1901'

'M'

'LOL00003743'

'T00000338~AAB1122'

'CASTELIANO

'TERRY'

'R'

''

'11/12/1934'

'M'

'LOL000001345'

'A00002342~AAB782785'

'CASTIGLIONE'

'FRANCIS'

'R'

''

'02/15/1950'

'M'

'LOL000001987'

'T33323~AAB4886836'

'FISH'

'WANDA'

'S'

''

'06/12/1912'

'F'

'LOL0000039457'

'K2324~AAB123123123'

'MORE'

'MARY'

'K'

''

'05/24/1952'

'F'

'LOL0000023095'

'A3327~AAB53786'

'JONES'

'JAMES'

'J'

''

'02/02/1979'

'F'

The question nowbecomes, how to present all the separate YT2.TableTwoID data in a single rowwhen there are difference in them as well.
Desired Result would not be:

Top of Form

Table One Id

Bottom of Form

Table Two Id

Last Name

First Name

Middle Name

Mothers Maiden Name

Date of Birth

Gender

'LOL000001234'

D000000987~AAB2234~AAB56655~AAB456456~AAB4564~AAB86856~AAB78678~AAB456458'

'FILLET'

'PHISH'

'R'

''

'11/12/1901'

'M'

'LOL00003743'

'T00000338~AAB1122'

'CASTELIANO

'TERRY'

'R'

''

'11/12/1934'

'M'

'LOL000001345'

'A00002342~AAB782785'

'CASTIGLIONE'

'FRANCIS'

'R'

''

'02/15/1950'

'M'

'LOL000001987'

'T33323~AAB4886836'

'FISH'

'WANDA'

'S'

''

'06/12/1912'

'F'

'LOL0000039457'

'K2324~AAB123123123'

'MORE'

'MARY'

'K'

''

'05/24/1952'

'F'

'LOL0000023095'

'A3327~AAB53786'

'JONES'

'JAMES'

'J'

''

'02/02/1979'

'F'

Ultimately making a singlerow for each person's Name entry.

Thanks again.

S. Shane Roth

drew.allen SSC Guru Points: 77014 More actions · Answer 3

You don't want to do this - at least not this way. You're trying to convert a denormalized table into a different kind of denormalized table. What you need to do is create two tables: The first table will contain just the name information, and the second table will contain the links from the name table to the table2. The following should give you normalized results (or at least closer to normalized).

/* Distinct Name Information */ SELECT DISTINCT YT1.TableOneID AS TableOneID -- , YT1.TableTwoID + '~' + YT2.TableTwoID AS TableTwoID , YT1.LastName , YT1.FirstName , YT1.MiddleName , YT1.MothersName , YT1.DOB , YT1.Gender FROM [MyTable]YT1 (NOLOCK) JOIN [MyTable2] TY2 ON (NOLOCK) YT1.LastName = YT2.LastName AND YT1.FirstName = YT2.FirstName

WHERE YT1.TableOneIDIS NOT NULL

ORDER BY YT1.TableOneID

/* Table with the joins */ SELECT YT1.TableOneID AS TableOneID , YT1.TableTwoID + '~' + YT2.TableTwoID AS TableTwoID FROM [MyTable]YT1 (NOLOCK) JOIN [MyTable2] TY2 ON (NOLOCK) YT1.LastName = YT2.LastName AND YT1.FirstName = YT2.FirstName

WHERE YT1.TableOneIDIS NOT NULL

ORDER BY YT1.TableOneID

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

sroth 71504 SSC Journeyman Points: 75 More actions · Answer 4

drew.allen - Thursday, November 2, 2017 1:51 PM
You don't want to do this - at least not this way. You're trying to convert a denormalized table into a different kind of denormalized table. What you need to do is create two tables: The first table will contain just the name information, and the second table will contain the links from the name table to the table2. The following should give you normalized results (or at least closer to normalized).
/* Distinct Name Information */ SELECT DISTINCT YT1.TableOneID AS TableOneID -- , YT1.TableTwoID + '~' + YT2.TableTwoID AS TableTwoID , YT1.LastName , YT1.FirstName , YT1.MiddleName , YT1.MothersName , YT1.DOB , YT1.Gender FROM [MyTable]YT1 (NOLOCK) JOIN [MyTable2] TY2 ON (NOLOCK) YT1.LastName = YT2.LastName AND YT1.FirstName = YT2.FirstName
WHERE YT1.TableOneIDIS NOT NULL
ORDER BY YT1.TableOneID
/* Table with the joins */ SELECT YT1.TableOneID AS TableOneID , YT1.TableTwoID + '~' + YT2.TableTwoID AS TableTwoID FROM [MyTable]YT1 (NOLOCK) JOIN [MyTable2] TY2 ON (NOLOCK) YT1.LastName = YT2.LastName AND YT1.FirstName = YT2.FirstName
WHERE YT1.TableOneIDIS NOT NULL
ORDER BY YT1.TableOneID

While I understand what it is you are suggesting here, this will not work for the work I am doing. Having separate Selects will give the results yes, but not in the needed format as required by my vendor. Ultimately I need these files to become "merged" into a single return row for each named person. The returned results are to be exported via .csv files to a new system with a vendor I am working with to populate a new dB with the duplicates and excess garbage removed and all possible identifiers for a single person merged into a single field. This will provide them a single reference point in their system which can be linked to my historical data. I am looking now into a few sub queries nested to help eliminate the noise I am getting in my returns, and have had some basic success moving forward. The biggest limiting factor I have now is that the normal fields that should be the same have empty locations, or the information is not accurate. e.g. 9999999999 as a SOC in one row on one table, with the actual SOC in another row for the same person on the other table. The only common factors I am seeing consistently are the last and first names. This dB is by no way clean or standardized, and I was hoping someone with years more experience than I would have seen something similar and been able to guide me the right way. Perhaps this all should have been mentioned upfront, hindsight is always 20/20. I am now wondering if perhaps I end this thread and begin a new one with better code as I have it, and a more detailed description of what is needed to be done and why.......
I do want to thank you though for the response, I am very grateful that you took the time to try and help.

S. Shane Roth

drew.allen SSC Guru Points: 77014 More actions · Answer 5

sroth 71504 - Thursday, November 2, 2017 2:38 PM
drew.allen - Thursday, November 2, 2017 1:51 PM
You don't want to do this - at least not this way. You're trying to convert a denormalized table into a different kind of denormalized table. What you need to do is create two tables: The first table will contain just the name information, and the second table will contain the links from the name table to the table2. The following should give you normalized results (or at least closer to normalized).
/* Distinct Name Information */ SELECT DISTINCT YT1.TableOneID AS TableOneID -- , YT1.TableTwoID + '~' + YT2.TableTwoID AS TableTwoID , YT1.LastName , YT1.FirstName , YT1.MiddleName , YT1.MothersName , YT1.DOB , YT1.Gender FROM [MyTable]YT1 (NOLOCK) JOIN [MyTable2] TY2 ON (NOLOCK) YT1.LastName = YT2.LastName AND YT1.FirstName = YT2.FirstName
WHERE YT1.TableOneIDIS NOT NULL
ORDER BY YT1.TableOneID
/* Table with the joins */ SELECT YT1.TableOneID AS TableOneID , YT1.TableTwoID + '~' + YT2.TableTwoID AS TableTwoID FROM [MyTable]YT1 (NOLOCK) JOIN [MyTable2] TY2 ON (NOLOCK) YT1.LastName = YT2.LastName AND YT1.FirstName = YT2.FirstName
WHERE YT1.TableOneIDIS NOT NULL
ORDER BY YT1.TableOneID
While I understand what it is you are suggesting here, this will not work for the work I am doing. Having separate Selects will give the results yes, but not in the needed format as required by my vendor. Ultimately I need these files to become "merged" into a single return row for each named person. The returned results are to be exported via .csv files to a new system with a vendor I am working with to populate a new dB with the duplicates and excess garbage removed and all possible identifiers for a single person merged into a single field. This will provide them a single reference point in their system which can be linked to my historical data. I am looking now into a few sub queries nested to help eliminate the noise I am getting in my returns, and have had some basic success moving forward. The biggest limiting factor I have now is that the normal fields that should be the same have empty locations, or the information is not accurate. e.g. 9999999999 as a SOC in one row on one table, with the actual SOC in another row for the same person on the other table. The only common factors I am seeing consistently are the last and first names. This dB is by no way clean or standardized, and I was hoping someone with years more experience than I would have seen something similar and been able to guide me the right way. Perhaps this all should have been mentioned upfront, hindsight is always 20/20. I am now wondering if perhaps I end this thread and begin a new one with better code as I have it, and a more detailed description of what is needed to be done and why.......
I do want to thank you though for the response, I am very grateful that you took the time to try and help.

Merging all possible identifiers into a single field is not going to make it easy to link to your historical data. The method I suggested will both clean up the data and allow you to link to your historical data in a clean manner.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA