Combining data from 2 tables with mostly identical data

  • *** Updated with code in Replies***
    Super Noob here. So sorry if there is already a thread here for this. I have searched and come up empty so I am hoping my question is unique enough to be new, but not so unique that I cant find a solution.
    Problem:
    Two(2) tables in the same dB have content which is almost identical less 2 specific fields. Those to fields contain the main identifies for these tables, but they are not the connecting piece to each other.
    Example:

    Table One IdentifierTable Two IdentifierLast Name First Name Middle NameMothers Maiden NameDate of BirthGender
    'LOL000001234''D000000987''FILLET''PHISH''R''''11/12/1901''M'                 
    'LOL00003743''T00000338''CASTELIANO'TERRY''R''''11/12/1934''M'                 
    'LOL000001345''A00002342''CASTIGLIONE''FRANCIS''R''''02/15/1950''M'                 
    'LOL000001987''T33323''FISH''WANDA''S''''06/12/1912''F'                 
    'LOL0000039457''K2324''MORE''MARY''K''''05/24/1952''F'                 
    'LOL0000023095''A3327''JONES''JAMES''J''''02/02/1979''F'                 
     'AAB2234''FILLET''PHISH''R''''11/12/1901''M'                 
     'AAB1122''CASTELIANO'TERRY''R''''11/12/1934''M'                 
     'AAB782785''CASTIGLIONE''FRANCIS''R''''02/15/1950''M'                 
     'AAB4886836''FISH''WANDA''S''''06/12/1912''F'                 
     'AAB123123123''MORE''MARY''K''''05/24/1952''F'                 
     'AAB53786''JONES''JAMES''J''''02/02/1979''F'                 

    I am looking to combine these tables matching on items like Last Name,First Name, DOB, etc... AS I stated in the beginning, I am super green withSQL. I have several queries to pull data but nothing as far as combiningmatching data when the Keys are not related.

    Intended result:

    Table One IdentifierTable Two IdentifierLast Name First Name Middle NameMothers Maiden NameDate of BirthGender
    'LOL000001234''D000000987~AAB2234''FILLET''PHISH''R''''11/12/1901''M'                 
    'LOL00003743''T00000338~AAB1122''CASTELIANO'TERRY''R''''11/12/1934''M'                 
    'LOL000001345''A00002342~AAB782785''CASTIGLIONE''FRANCIS''R''''02/15/1950''M'                 
    'LOL000001987''T33323~AAB4886836''FISH''WANDA''S''''06/12/1912''F'                 
    'LOL0000039457''K2324~AAB123123123''MORE''MARY''K''''05/24/1952''F'                 
    'LOL0000023095''A3327~AAB53786''JONES''JAMES''J''''02/02/1979''F'                 

    I believe this should be a simple process, but don't yet know the logicto manipulate the data.

    Working in MS SQL Server Management Studio on 2012 SQL Server. I cannot make alterations to the dB tables, I can only manipulate the results through query. Please keep that in mind when responding.

    Any assistance or direction to the right material is greatlyappreciated. I am eager to learn.

    S. Shane Roth

  • Without DDL and consumable Sample Data, this is untested, and I have assumed that the First, Middle, and Last names will be the same in both copies, but this might get your started:

    SELECT YT1.TableOneID,
       YT1.TableTwoID + '~' + YT2.TableTwoID AS TableTwoID,
       YT1.LastName, YT1.FirstName, YT1.MiddleName,
       YT1.MothersName,
       YT1.DoB,
       YT1.Gender
    FROM [YourTable] YT1
      JOIN [YourTable] YT2 ON YT1.LastName = YT2.LastName
              AND YT1.FirstName = YT2.FirstName
             AND YT1.MiddleName = YT2.MiddleName

    WHERE YT1.TableOneID IS NOT NULL;

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Thom A - Wednesday, November 1, 2017 8:20 AM

    Without DDL and consumable DDL, this is untested, and I have assumed that the First, Middle, and Last names will be the same in both copies, but this might get your started:

    SELECT YT1.TableOneID,
       YT1.TableTwoID + '~' + YT2.TableTwoID AS TableTwoID,
       YT1.LastName, YT1.FirstName, YT1.MiddleName,
       YT1.MothersName,
       YT1.DoB,
       YT1.Gender
    FROM [YourTable] YT1
      JOIN [YourTable] YT2 ON YT1.LastName = YT2.LastName
              AND YT1.FirstName = YT2.FirstName
             AND YT1.MiddleName = YT2.MiddleName

    WHERE YT1.TableOneID IS NOT NULL;

    Thanks. I had this down to the Where clause. I left that off and really did make the difference. After some extra manipulation I had to tone the JOIN down to just the First and Last to get results. Appears that both dB's are not as accurate as I was told. GIGO is suppose.

    What I am running intonow is that there are multiple ID's in Table2 that are creating additionalrows.
    CODE:

    SELECT YT1.TableOneID AS TableOneID
     , YT1.TableTwoID + '~' + YT2.TableTwoID AS TableTwoID
     , YT1.LastName
     , YT1.FirstName
     , YT1.MiddleName
     , YT1.MothersName
     , YT1.DOB
     , YT1.Gender

    FROM [MyTable]YT1 (NOLOCK)
      JOIN [MyTable2] TY2 ON (NOLOCK) YT1.LastName = YT2.LastName 
          AND YT1.FirstName = YT2.FirstName

    WHERE YT1.TableOneIDIS NOT NULLORDER BY YT1.TableOneID

    Resulting in:

    Table One Id

    Table Two Id

    Last Name

    First Name

    Middle Name

    Mothers Maiden Name

    Date of Birth

    Gender

    'LOL000001234'

    'D000000987~AAB2234'

    'FILLET'

    'PHISH'

    'R'

    ''

    '11/12/1901'

    'M'

    'LOL000001234'

    'D000000987~AAB56655'

    'FILLET'

    'PHISH'

    'R'

    ''

    '11/12/1901'

    'M'

    'LOL000001234'

    'D000000987~AAB456456'

    'FILLET'

    'PHISH'

    'R'

    ''

    '11/12/1901'

    'M'

    'LOL000001234'

    'D000000987~AAB4564'

    'FILLET'

    'PHISH'

    'R'

    ''

    '11/12/1901'

    'M'

    'LOL000001234'

    'D000000987~AAB86856'

    'FILLET'

    'PHISH'

    'R'

    ''

    '11/12/1901'

    'M'

    'LOL000001234'

    'D000000987~AAB78678'

    'FILLET'

    'PHISH'

    'R'

    ''

    '11/12/1901'

    'M'

    'LOL000001234'

    'D000000987~AAB456458'

    'FILLET'

    'PHISH'

    'R'

    ''

    '11/12/1901'

    'M'

    'LOL00003743'

    'T00000338~AAB1122'

    'CASTELIANO

    'TERRY'

    'R'

    ''

    '11/12/1934'

    'M'

    'LOL000001345'

    'A00002342~AAB782785'

    'CASTIGLIONE'

    'FRANCIS'

    'R'

    ''

    '02/15/1950'

    'M'

    'LOL000001987'

    'T33323~AAB4886836'

    'FISH'

    'WANDA'

    'S'

    ''

    '06/12/1912'

    'F'

    'LOL0000039457'

    'K2324~AAB123123123'

    'MORE'

    'MARY'

    'K'

    ''

    '05/24/1952'

    'F'

    'LOL0000023095'

    'A3327~AAB53786'

    'JONES'

    'JAMES'

    'J'

    ''

    '02/02/1979'

    'F' 

    The question nowbecomes, how to present all the separate YT2.TableTwoID data in a single rowwhen there are difference in them as well.
    Desired Result would not be:

    Top of Form

    Table One Id

    Bottom of Form

    Table Two Id

    Last Name

    First Name

    Middle Name

    Mothers Maiden Name

    Date of Birth

    Gender

    'LOL000001234'

    D000000987~AAB2234~AAB56655~AAB456456~AAB4564~AAB86856~AAB78678~AAB456458'

    'FILLET'

    'PHISH'

    'R'

    ''

    '11/12/1901'

    'M'

    'LOL00003743'

    'T00000338~AAB1122'

    'CASTELIANO

    'TERRY'

    'R'

    ''

    '11/12/1934'

    'M'

    'LOL000001345'

    'A00002342~AAB782785'

    'CASTIGLIONE'

    'FRANCIS'

    'R'

    ''

    '02/15/1950'

    'M'

    'LOL000001987'

    'T33323~AAB4886836'

    'FISH'

    'WANDA'

    'S'

    ''

    '06/12/1912'

    'F'

    'LOL0000039457'

    'K2324~AAB123123123'

    'MORE'

    'MARY'

    'K'

    ''

    '05/24/1952'

    'F'

    'LOL0000023095'

    'A3327~AAB53786'

    'JONES'

    'JAMES'

    'J'

    ''

    '02/02/1979'

    'F' 


    Ultimately making a singlerow for each person's Name entry.


    Thanks again.

    S. Shane Roth

  • You don't want to do this - at least not this way.  You're trying to convert a denormalized table into a different kind of denormalized table.  What you need to do is create two tables: The first table will contain just the name information, and the second table will contain the links from the name table to the table2.  The following should give you normalized results (or at least closer to normalized).

    /* Distinct Name Information */
    SELECT DISTINCT YT1.TableOneID AS TableOneID
    -- , YT1.TableTwoID + '~' + YT2.TableTwoID AS TableTwoID
    , YT1.LastName
    , YT1.FirstName
    , YT1.MiddleName
    , YT1.MothersName
    , YT1.DOB
    , YT1.Gender
    FROM [MyTable]YT1 (NOLOCK)
    JOIN [MyTable2] TY2 ON (NOLOCK) YT1.LastName = YT2.LastName
      AND YT1.FirstName = YT2.FirstName

    WHERE YT1.TableOneIDIS NOT NULL

    ORDER BY YT1.TableOneID

    /* Table with the joins */
    SELECT YT1.TableOneID AS TableOneID
    , YT1.TableTwoID + '~' + YT2.TableTwoID AS TableTwoID
    FROM [MyTable]YT1 (NOLOCK)
    JOIN [MyTable2] TY2 ON (NOLOCK) YT1.LastName = YT2.LastName
      AND YT1.FirstName = YT2.FirstName

    WHERE YT1.TableOneIDIS NOT NULL

    ORDER BY YT1.TableOneID

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen - Thursday, November 2, 2017 1:51 PM

    You don't want to do this - at least not this way.  You're trying to convert a denormalized table into a different kind of denormalized table.  What you need to do is create two tables: The first table will contain just the name information, and the second table will contain the links from the name table to the table2.  The following should give you normalized results (or at least closer to normalized).

    /* Distinct Name Information */
    SELECT DISTINCT YT1.TableOneID AS TableOneID
    -- , YT1.TableTwoID + '~' + YT2.TableTwoID AS TableTwoID
    , YT1.LastName
    , YT1.FirstName
    , YT1.MiddleName
    , YT1.MothersName
    , YT1.DOB
    , YT1.Gender
    FROM [MyTable]YT1 (NOLOCK)
    JOIN [MyTable2] TY2 ON (NOLOCK) YT1.LastName = YT2.LastName
      AND YT1.FirstName = YT2.FirstName

    WHERE YT1.TableOneIDIS NOT NULL

    ORDER BY YT1.TableOneID

    /* Table with the joins */
    SELECT YT1.TableOneID AS TableOneID
    , YT1.TableTwoID + '~' + YT2.TableTwoID AS TableTwoID
    FROM [MyTable]YT1 (NOLOCK)
    JOIN [MyTable2] TY2 ON (NOLOCK) YT1.LastName = YT2.LastName
      AND YT1.FirstName = YT2.FirstName

    WHERE YT1.TableOneIDIS NOT NULL

    ORDER BY YT1.TableOneID

    While I understand what it is you are suggesting here, this will not work for the work I am doing. Having separate Selects will give the results yes, but not in the needed format as required by my vendor. Ultimately I need these files to become "merged" into a single return row for each named person. The returned results are to be exported via .csv files to a new system with a vendor I am working with to populate a new dB with the duplicates and excess garbage removed and all possible identifiers for a single person merged into a single field. This will provide them a single reference point in their system which can be linked to my historical data. I am looking now into a few sub queries nested to help eliminate the noise I am getting in my returns, and have had some basic success moving forward. The biggest limiting factor I have now is that the normal fields that should be the same have empty locations, or the information is not accurate. e.g. 9999999999 as a SOC in one row on one table, with the actual SOC in another row for the same person on the other table. The only common factors I am seeing consistently are the last and first names. This dB is by no way clean or standardized, and I was hoping someone with years more experience than I would have seen something similar and been able to guide me the right way. Perhaps this all should have been mentioned upfront, hindsight is always 20/20. I am now wondering if perhaps I end this thread and begin a new one with better code as I have it, and a more detailed description of what is needed to be done and why.......
    I do want to thank you though for the response, I am very grateful that you took the time to try and help.

    S. Shane Roth

  • sroth 71504 - Thursday, November 2, 2017 2:38 PM

    drew.allen - Thursday, November 2, 2017 1:51 PM

    You don't want to do this - at least not this way.  You're trying to convert a denormalized table into a different kind of denormalized table.  What you need to do is create two tables: The first table will contain just the name information, and the second table will contain the links from the name table to the table2.  The following should give you normalized results (or at least closer to normalized).

    /* Distinct Name Information */
    SELECT DISTINCT YT1.TableOneID AS TableOneID
    -- , YT1.TableTwoID + '~' + YT2.TableTwoID AS TableTwoID
    , YT1.LastName
    , YT1.FirstName
    , YT1.MiddleName
    , YT1.MothersName
    , YT1.DOB
    , YT1.Gender
    FROM [MyTable]YT1 (NOLOCK)
    JOIN [MyTable2] TY2 ON (NOLOCK) YT1.LastName = YT2.LastName
      AND YT1.FirstName = YT2.FirstName

    WHERE YT1.TableOneIDIS NOT NULL

    ORDER BY YT1.TableOneID

    /* Table with the joins */
    SELECT YT1.TableOneID AS TableOneID
    , YT1.TableTwoID + '~' + YT2.TableTwoID AS TableTwoID
    FROM [MyTable]YT1 (NOLOCK)
    JOIN [MyTable2] TY2 ON (NOLOCK) YT1.LastName = YT2.LastName
      AND YT1.FirstName = YT2.FirstName

    WHERE YT1.TableOneIDIS NOT NULL

    ORDER BY YT1.TableOneID

    While I understand what it is you are suggesting here, this will not work for the work I am doing. Having separate Selects will give the results yes, but not in the needed format as required by my vendor. Ultimately I need these files to become "merged" into a single return row for each named person. The returned results are to be exported via .csv files to a new system with a vendor I am working with to populate a new dB with the duplicates and excess garbage removed and all possible identifiers for a single person merged into a single field. This will provide them a single reference point in their system which can be linked to my historical data. I am looking now into a few sub queries nested to help eliminate the noise I am getting in my returns, and have had some basic success moving forward. The biggest limiting factor I have now is that the normal fields that should be the same have empty locations, or the information is not accurate. e.g. 9999999999 as a SOC in one row on one table, with the actual SOC in another row for the same person on the other table. The only common factors I am seeing consistently are the last and first names. This dB is by no way clean or standardized, and I was hoping someone with years more experience than I would have seen something similar and been able to guide me the right way. Perhaps this all should have been mentioned upfront, hindsight is always 20/20. I am now wondering if perhaps I end this thread and begin a new one with better code as I have it, and a more detailed description of what is needed to be done and why.......
    I do want to thank you though for the response, I am very grateful that you took the time to try and help.

    Merging all possible identifiers into a single field is not going to make it easy to link to your historical data.  The method I suggested will both clean up the data and allow you to link to your historical data in a clean manner.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

Viewing 6 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply