November 1, 2017 at 8:02 am
*** Updated with code in Replies***
Super Noob here. So sorry if there is already a thread here for this. I have searched and come up empty so I am hoping my question is unique enough to be new, but not so unique that I cant find a solution.
Problem:
Two(2) tables in the same dB have content which is almost identical less 2 specific fields. Those to fields contain the main identifies for these tables, but they are not the connecting piece to each other.
Example:
| Table One Identifier | Table Two Identifier | Last Name | First Name | Middle Name | Mothers Maiden Name | Date of Birth | Gender |
| 'LOL000001234' | 'D000000987' | 'FILLET' | 'PHISH' | 'R' | '' | '11/12/1901' | 'M' |
| 'LOL00003743' | 'T00000338' | 'CASTELIANO | 'TERRY' | 'R' | '' | '11/12/1934' | 'M' |
| 'LOL000001345' | 'A00002342' | 'CASTIGLIONE' | 'FRANCIS' | 'R' | '' | '02/15/1950' | 'M' |
| 'LOL000001987' | 'T33323' | 'FISH' | 'WANDA' | 'S' | '' | '06/12/1912' | 'F' |
| 'LOL0000039457' | 'K2324' | 'MORE' | 'MARY' | 'K' | '' | '05/24/1952' | 'F' |
| 'LOL0000023095' | 'A3327' | 'JONES' | 'JAMES' | 'J' | '' | '02/02/1979' | 'F' |
| 'AAB2234' | 'FILLET' | 'PHISH' | 'R' | '' | '11/12/1901' | 'M' | |
| 'AAB1122' | 'CASTELIANO | 'TERRY' | 'R' | '' | '11/12/1934' | 'M' | |
| 'AAB782785' | 'CASTIGLIONE' | 'FRANCIS' | 'R' | '' | '02/15/1950' | 'M' | |
| 'AAB4886836' | 'FISH' | 'WANDA' | 'S' | '' | '06/12/1912' | 'F' | |
| 'AAB123123123' | 'MORE' | 'MARY' | 'K' | '' | '05/24/1952' | 'F' | |
| 'AAB53786' | 'JONES' | 'JAMES' | 'J' | '' | '02/02/1979' | 'F' |
I am looking to combine these tables matching on items like Last Name,First Name, DOB, etc... AS I stated in the beginning, I am super green withSQL. I have several queries to pull data but nothing as far as combiningmatching data when the Keys are not related.
Intended result:
| Table One Identifier | Table Two Identifier | Last Name | First Name | Middle Name | Mothers Maiden Name | Date of Birth | Gender |
| 'LOL000001234' | 'D000000987~AAB2234' | 'FILLET' | 'PHISH' | 'R' | '' | '11/12/1901' | 'M' |
| 'LOL00003743' | 'T00000338~AAB1122' | 'CASTELIANO | 'TERRY' | 'R' | '' | '11/12/1934' | 'M' |
| 'LOL000001345' | 'A00002342~AAB782785' | 'CASTIGLIONE' | 'FRANCIS' | 'R' | '' | '02/15/1950' | 'M' |
| 'LOL000001987' | 'T33323~AAB4886836' | 'FISH' | 'WANDA' | 'S' | '' | '06/12/1912' | 'F' |
| 'LOL0000039457' | 'K2324~AAB123123123' | 'MORE' | 'MARY' | 'K' | '' | '05/24/1952' | 'F' |
| 'LOL0000023095' | 'A3327~AAB53786' | 'JONES' | 'JAMES' | 'J' | '' | '02/02/1979' | 'F' |
I believe this should be a simple process, but don't yet know the logicto manipulate the data.
Working in MS SQL Server Management Studio on 2012 SQL Server. I cannot make alterations to the dB tables, I can only manipulate the results through query. Please keep that in mind when responding.
Any assistance or direction to the right material is greatlyappreciated. I am eager to learn.
S. Shane Roth
November 1, 2017 at 8:20 am
Without DDL and consumable Sample Data, this is untested, and I have assumed that the First, Middle, and Last names will be the same in both copies, but this might get your started:
Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk
November 1, 2017 at 8:33 am
Thom A - Wednesday, November 1, 2017 8:20 AMWithout DDL and consumable DDL, this is untested, and I have assumed that the First, Middle, and Last names will be the same in both copies, but this might get your started:SELECT YT1.TableOneID,YT1.TableTwoID + '~' + YT2.TableTwoID AS TableTwoID,YT1.LastName, YT1.FirstName, YT1.MiddleName,YT1.MothersName,YT1.DoB,YT1.GenderFROM [YourTable] YT1JOIN [YourTable] YT2 ON YT1.LastName = YT2.LastNameAND YT1.FirstName = YT2.FirstNameAND YT1.MiddleName = YT2.MiddleNameWHERE YT1.TableOneID IS NOT NULL;
Thanks. I had this down to the Where clause. I left that off and really did make the difference. After some extra manipulation I had to tone the JOIN down to just the First and Last to get results. Appears that both dB's are not as accurate as I was told. GIGO is suppose.
What I am running intonow is that there are multiple ID's in Table2 that are creating additionalrows.
CODE:
SELECT YT1.TableOneID AS TableOneID
, YT1.TableTwoID + '~' + YT2.TableTwoID AS TableTwoID
, YT1.LastName
, YT1.FirstName
, YT1.MiddleName
, YT1.MothersName
, YT1.DOB
, YT1.Gender
FROM [MyTable]YT1 (NOLOCK)
JOIN [MyTable2] TY2 ON (NOLOCK) YT1.LastName = YT2.LastName
AND YT1.FirstName = YT2.FirstName
WHERE YT1.TableOneIDIS NOT NULLORDER BY YT1.TableOneID
Resulting in:
| Table One Id
| Table Two Id
| Last Name
| First Name
| Middle Name
| Mothers Maiden Name
| Date of Birth
| Gender
|
| 'LOL000001234'
| 'D000000987~AAB2234'
| 'FILLET'
| 'PHISH'
| 'R'
| ''
| '11/12/1901'
| 'M'
|
| 'LOL000001234'
| 'D000000987~AAB56655'
| 'FILLET'
| 'PHISH'
| 'R'
| ''
| '11/12/1901'
| 'M'
|
| 'LOL000001234'
| 'D000000987~AAB456456'
| 'FILLET'
| 'PHISH'
| 'R'
| ''
| '11/12/1901'
| 'M'
|
| 'LOL000001234'
| 'D000000987~AAB4564'
| 'FILLET'
| 'PHISH'
| 'R'
| ''
| '11/12/1901'
| 'M'
|
| 'LOL000001234'
| 'D000000987~AAB86856'
| 'FILLET'
| 'PHISH'
| 'R'
| ''
| '11/12/1901'
| 'M'
|
| 'LOL000001234'
| 'D000000987~AAB78678'
| 'FILLET'
| 'PHISH'
| 'R'
| ''
| '11/12/1901'
| 'M'
|
| 'LOL000001234'
| 'D000000987~AAB456458'
| 'FILLET'
| 'PHISH'
| 'R'
| ''
| '11/12/1901'
| 'M'
|
| 'LOL00003743'
| 'T00000338~AAB1122'
| 'CASTELIANO
| 'TERRY'
| 'R'
| ''
| '11/12/1934'
| 'M'
|
| 'LOL000001345'
| 'A00002342~AAB782785'
| 'CASTIGLIONE'
| 'FRANCIS'
| 'R'
| ''
| '02/15/1950'
| 'M'
|
| 'LOL000001987'
| 'T33323~AAB4886836'
| 'FISH'
| 'WANDA'
| 'S'
| ''
| '06/12/1912'
| 'F'
|
| 'LOL0000039457'
| 'K2324~AAB123123123'
| 'MORE'
| 'MARY'
| 'K'
| ''
| '05/24/1952'
| 'F'
|
| 'LOL0000023095'
| 'A3327~AAB53786'
| 'JONES'
| 'JAMES'
| 'J'
| ''
| '02/02/1979'
| 'F'
|
The question nowbecomes, how to present all the separate YT2.TableTwoID data in a single rowwhen there are difference in them as well.
Desired Result would not be:
| Top of Form
Table One Id
Bottom of Form
| Table Two Id
| Last Name
| First Name
| Middle Name
| Mothers Maiden Name
| Date of Birth
| Gender
|
| 'LOL000001234'
| D000000987~AAB2234~AAB56655~AAB456456~AAB4564~AAB86856~AAB78678~AAB456458'
| 'FILLET'
| 'PHISH'
| 'R'
| ''
| '11/12/1901'
| 'M'
|
| 'LOL00003743'
| 'T00000338~AAB1122'
| 'CASTELIANO
| 'TERRY'
| 'R'
| ''
| '11/12/1934'
| 'M'
|
| 'LOL000001345'
| 'A00002342~AAB782785'
| 'CASTIGLIONE'
| 'FRANCIS'
| 'R'
| ''
| '02/15/1950'
| 'M'
|
| 'LOL000001987'
| 'T33323~AAB4886836'
| 'FISH'
| 'WANDA'
| 'S'
| ''
| '06/12/1912'
| 'F'
|
| 'LOL0000039457'
| 'K2324~AAB123123123'
| 'MORE'
| 'MARY'
| 'K'
| ''
| '05/24/1952'
| 'F'
|
| 'LOL0000023095'
| 'A3327~AAB53786'
| 'JONES'
| 'JAMES'
| 'J'
| ''
| '02/02/1979'
| 'F'
|
Ultimately making a singlerow for each person's Name entry.
Thanks again.
S. Shane Roth
November 2, 2017 at 1:51 pm
You don't want to do this - at least not this way. You're trying to convert a denormalized table into a different kind of denormalized table. What you need to do is create two tables: The first table will contain just the name information, and the second table will contain the links from the name table to the table2. The following should give you normalized results (or at least closer to normalized).
/* Distinct Name Information */
SELECT DISTINCT YT1.TableOneID AS TableOneID
-- , YT1.TableTwoID + '~' + YT2.TableTwoID AS TableTwoID
, YT1.LastName
, YT1.FirstName
, YT1.MiddleName
, YT1.MothersName
, YT1.DOB
, YT1.Gender
FROM [MyTable]YT1 (NOLOCK)
JOIN [MyTable2] TY2 ON (NOLOCK) YT1.LastName = YT2.LastName
AND YT1.FirstName = YT2.FirstName
WHERE YT1.TableOneIDIS NOT NULL
ORDER BY YT1.TableOneID
/* Table with the joins */
SELECT YT1.TableOneID AS TableOneID
, YT1.TableTwoID + '~' + YT2.TableTwoID AS TableTwoID
FROM [MyTable]YT1 (NOLOCK)
JOIN [MyTable2] TY2 ON (NOLOCK) YT1.LastName = YT2.LastName
AND YT1.FirstName = YT2.FirstName
WHERE YT1.TableOneIDIS NOT NULL
ORDER BY YT1.TableOneID
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
November 2, 2017 at 2:38 pm
drew.allen - Thursday, November 2, 2017 1:51 PMYou don't want to do this - at least not this way. You're trying to convert a denormalized table into a different kind of denormalized table. What you need to do is create two tables: The first table will contain just the name information, and the second table will contain the links from the name table to the table2. The following should give you normalized results (or at least closer to normalized).
/* Distinct Name Information */
SELECT DISTINCT YT1.TableOneID AS TableOneID
-- , YT1.TableTwoID + '~' + YT2.TableTwoID AS TableTwoID
, YT1.LastName
, YT1.FirstName
, YT1.MiddleName
, YT1.MothersName
, YT1.DOB
, YT1.Gender
FROM [MyTable]YT1 (NOLOCK)
JOIN [MyTable2] TY2 ON (NOLOCK) YT1.LastName = YT2.LastName
AND YT1.FirstName = YT2.FirstNameWHERE YT1.TableOneIDIS NOT NULL
ORDER BY YT1.TableOneID
/* Table with the joins */
SELECT YT1.TableOneID AS TableOneID
, YT1.TableTwoID + '~' + YT2.TableTwoID AS TableTwoID
FROM [MyTable]YT1 (NOLOCK)
JOIN [MyTable2] TY2 ON (NOLOCK) YT1.LastName = YT2.LastName
AND YT1.FirstName = YT2.FirstNameWHERE YT1.TableOneIDIS NOT NULL
ORDER BY YT1.TableOneID
While I understand what it is you are suggesting here, this will not work for the work I am doing. Having separate Selects will give the results yes, but not in the needed format as required by my vendor. Ultimately I need these files to become "merged" into a single return row for each named person. The returned results are to be exported via .csv files to a new system with a vendor I am working with to populate a new dB with the duplicates and excess garbage removed and all possible identifiers for a single person merged into a single field. This will provide them a single reference point in their system which can be linked to my historical data. I am looking now into a few sub queries nested to help eliminate the noise I am getting in my returns, and have had some basic success moving forward. The biggest limiting factor I have now is that the normal fields that should be the same have empty locations, or the information is not accurate. e.g. 9999999999 as a SOC in one row on one table, with the actual SOC in another row for the same person on the other table. The only common factors I am seeing consistently are the last and first names. This dB is by no way clean or standardized, and I was hoping someone with years more experience than I would have seen something similar and been able to guide me the right way. Perhaps this all should have been mentioned upfront, hindsight is always 20/20. I am now wondering if perhaps I end this thread and begin a new one with better code as I have it, and a more detailed description of what is needed to be done and why.......
I do want to thank you though for the response, I am very grateful that you took the time to try and help.
S. Shane Roth
November 2, 2017 at 2:57 pm
sroth 71504 - Thursday, November 2, 2017 2:38 PMdrew.allen - Thursday, November 2, 2017 1:51 PMYou don't want to do this - at least not this way. You're trying to convert a denormalized table into a different kind of denormalized table. What you need to do is create two tables: The first table will contain just the name information, and the second table will contain the links from the name table to the table2. The following should give you normalized results (or at least closer to normalized).
/* Distinct Name Information */
SELECT DISTINCT YT1.TableOneID AS TableOneID
-- , YT1.TableTwoID + '~' + YT2.TableTwoID AS TableTwoID
, YT1.LastName
, YT1.FirstName
, YT1.MiddleName
, YT1.MothersName
, YT1.DOB
, YT1.Gender
FROM [MyTable]YT1 (NOLOCK)
JOIN [MyTable2] TY2 ON (NOLOCK) YT1.LastName = YT2.LastName
AND YT1.FirstName = YT2.FirstNameWHERE YT1.TableOneIDIS NOT NULL
ORDER BY YT1.TableOneID
/* Table with the joins */
SELECT YT1.TableOneID AS TableOneID
, YT1.TableTwoID + '~' + YT2.TableTwoID AS TableTwoID
FROM [MyTable]YT1 (NOLOCK)
JOIN [MyTable2] TY2 ON (NOLOCK) YT1.LastName = YT2.LastName
AND YT1.FirstName = YT2.FirstNameWHERE YT1.TableOneIDIS NOT NULL
ORDER BY YT1.TableOneID
While I understand what it is you are suggesting here, this will not work for the work I am doing. Having separate Selects will give the results yes, but not in the needed format as required by my vendor. Ultimately I need these files to become "merged" into a single return row for each named person. The returned results are to be exported via .csv files to a new system with a vendor I am working with to populate a new dB with the duplicates and excess garbage removed and all possible identifiers for a single person merged into a single field. This will provide them a single reference point in their system which can be linked to my historical data. I am looking now into a few sub queries nested to help eliminate the noise I am getting in my returns, and have had some basic success moving forward. The biggest limiting factor I have now is that the normal fields that should be the same have empty locations, or the information is not accurate. e.g. 9999999999 as a SOC in one row on one table, with the actual SOC in another row for the same person on the other table. The only common factors I am seeing consistently are the last and first names. This dB is by no way clean or standardized, and I was hoping someone with years more experience than I would have seen something similar and been able to guide me the right way. Perhaps this all should have been mentioned upfront, hindsight is always 20/20. I am now wondering if perhaps I end this thread and begin a new one with better code as I have it, and a more detailed description of what is needed to be done and why.......
I do want to thank you though for the response, I am very grateful that you took the time to try and help.
Merging all possible identifiers into a single field is not going to make it easy to link to your historical data. The method I suggested will both clean up the data and allow you to link to your historical data in a clean manner.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
Viewing 6 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply