How to join using unique rows in both tables

  • The problem I have is matching redundant data between two tables. Both tables MyTable_A and MyTable_B are completely un-normalized. They contain a lot of redundant data and some unique data. I believe the files were originally large spread sheets that were imported into SQL and passed on to me.

    In order to bring order to this ciaos I created a primary key in each table. Then created an AddressID to group and there for uniquely identify duplicate addresses with in each table. I did this with Phone and name also. These are not keys relating the two files. These ID’s only let me identify the unique addresses, names and phone numbers within the respective tables.

    The first thing I wanted to do is determine the addresses that match between the two files. In order to do this I need to consider only unique AddressID’s in the respective files and match the actual address between the two files.

    CREATE TABLE #MyTable_A (MyTable_AID INT PRIMARY KEY,

    AddressID INT, Address Varchar(50),

    Zip Varchar(10),

    PhoneID INT, Phone Varchar(10),

    NameID INT, Namex Varchar(50))

    INSERT INTO #MyTable_A VALUES (1, 1, '123 Main', '12345', 1, '1231111111', 1, 'John Doe')

    INSERT INTO #MyTable_A VALUES (2, 2, '222 North', '22222', 1, '2222222222', 2, 'Sue Kent')

    INSERT INTO #MyTable_A VALUES (3, 1, '123 Main', '12345', 1, '1232222222', 3, 'Mary Doe')

    INSERT INTO #MyTable_A VALUES (4, 3, '333 South', '33333', 1, '3333333333', 4, 'Frank Black')

    INSERT INTO #MyTable_A VALUES (5, 1, '123 Main', '12345', 1, '1234444444', 5, 'John Smith')

    CREATE TABLE #MyTable_B (MyTable_BID INT PRIMARY KEY,

    AddressID INT, Address Varchar(50),

    Zip Varchar(10),

    PhoneID INT, Phone Varchar(10),

    NameID INT, Namex Varchar(50))

    INSERT INTO #MyTable_B VALUES (1, 5, '123 Main', '12345', 1, '1231111111', 1, 'Larry Franks')

    INSERT INTO #MyTable_B VALUES (2, 2, '444 East', '44444', 2, '4444444444', 2, 'Frank Black')

    INSERT INTO #MyTable_B VALUES (3, 5, '123 Main', '12345', 3, '1232222222', 3, 'Joe Little')

    INSERT INTO #MyTable_B VALUES (4, 7, '333 South', '33333', 4, '3333333333', 2, 'Frank Black')

    INSERT INTO #MyTable_B VALUES (5, 7, '333 South', '33333', 4, '3333333333', 4, 'Phil Johnson')

    I want a result set that looks like this.

    MyAddresses

    MyAddresses_ID.....MTA_AID.....MTB_AID.....MTA_Address.....MTB_Address.....MTA_Zip.....MTB_Zip

    1.......................1........5............123 Main........123 Main.......12345.......12345

    2.......................3........7............333 South......333 South.......33333.......33333

    Now that I know the addresses that match I can bring in the nonmatching addresses from both tables. This will give me an Address table containing unique addresses. I will do the same thing using the Phone and NameID to create Name and phone tables that junction to the address table. This will give me a normalized DB to work with.

    I am certainly open to better ways to do this.

    I have tried

    CREATE TABLE #MyAddresses

    (

    MyAddresses_ID int IDENTITY(1,1),

    MTA_AID int,

    MTB_AID int,

    MTA_Address Varchar(50),

    MTB_Address Varchar(50),

    MTA_Zip Varchar(10),

    MTB_Zip Varchar(10)

    )

    INSERT INTO #MyAddresses (MTA_AID,MTB_AID,

    MTA_Address,MTB_Address,

    MTA_Zip,MTB_Zip)

    SELECT d1.AddressID,d2.AddressID,

    d1.Address.d2.Address,

    d1.Zip,d2.Zip

    FROM

    (

    SELECT DISTINCT AddressID,Address,Zip

    FROM #MyTable_A

    )d1

    INNER JOIN

    (

    SELECT DISTINCT AddressID,Address,Zip

    FROM #MyTable_B

    )d2

    ON d2.Address=d1.Address

    AND d2.Zip=d1.Zip

    select * from #MyAddresses

    But I get this result set:

    1 1 1 123 Main 123 Main 12345 12345

    2 3 3 333 South 333 South 33333 33333

    The addresses are as expected, But the MTA_AID and MTB_AID are not. Based on the data they should be 1, 5 and 3, 7? There is no AddressID 1 for 123 Main or 3 for '333 South' in #MyTable_B.

    How do i correct this or should I usr a differant approach?

  • I ran your query and I got the result

    115123 Main123 Main1234512345

    237333 South333 South3333333333

    Also I hoped this was a typo

    SELECT d1.AddressID,d2.AddressID,

    d1.Address,d2.Address,

    d1.Zip,d2.Zip

    You had d1.Address.d2.Address

  • Thanks for catching that typo. I also found another fat finger mistake that was causing my result set ont to be correct.

  • Thanks for catching that typo. I also found another fat finger mistake that was causing my result set ont to be correct.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply