Find Differences Between Two Tables

  • Hello,

    I’m new to SQL and need some help please. This is such a mind bender for me.

    I am looking for differences between Table 1 and Table 2, and then putthose differences in Table 3.  Table 3 isthe final report table that shows both Table 1 and Table 2 fields so I can seethe differences across both tables.


     My Code:

    Select

    [t1_Name],

    [t2_Name] ,

    [t1_Title] ,

    [t2_Title] ,

    [t1_Table] ,

    [t2_Table] ,

    [t1_Data] ,

    [t2_Data]

    From Table1

    Left Join Table 2 on t1_Name = t2_Name

    Where t1_Name <> t2_Name or

    [t1_Title] <> [t2_Title] or

    [t1_Table] <> [t2_Table] , or

    [t1_Data] <> [t2_Data]


     A Look at the Tables: (See picture attached)


    • The first row of Table 1 equals the first row of Table 2 except theName.  So, will be inserted into Table 3as different

    • The 2nd row of Table 1 equals the 3rd row of Table 2. So an exact match isfound and not inserted into Table 3

    • The 3rd row of Table 1 equals the 4th row of Table 2  except for the Title and Data in Table2.  So, will be inserted into Table 3 asdifferent

    • The 4th row of Table 1 is not found in Table 2, So, will be inserted intoTable 3 as different

    • Table 5th and 6th rows of Table 2 are not in Table 1 . So will be insertedinto Table 3 as different

    The Problem:

    1. Neither Table 1 or Table 2 have primary keys.  So, I have had to look at the field names tofind differences.

    2. The table 1 is not in the same order as Table 2

    3. Table 2 has more rows that Table 1

    4. if there is data in Table 2 that is not in Table 1, need to show Table 1fields as "null"

    5. if there is data in Table 1 that is not in Table 2, need to show Table 2fields as "null"

    6. There can be multiple different columns that are different(for instance theTitle and the Data can be different, but the Name and the Table are the same.

       

      Create Table1(

      [t1_Name] [varchar] (30) NULL,

      [t1_Title] [varchar] (30) NULL,

      [t1_Table] [varchar] (30) NULL,

      [t1_Data] [varchar] (30) NULL)

      Insert [Table1]

      ([t1_Name] , [t1_Title] , [t1_Table] , [t1_Data], ) Values ( N'ABC',  N'Timber',N'B14', N '1,2,4,6')

      ([t1_Name] , [t1_Title] , [t1_Table] , [t1_Data], ) Values ( N'ABC',  N'Ramp', N'B19',N '1,3,5')

      ([t1_Name] , [t1_Title] , [t1_Table] , [t1_Data], ) Values ( N'ABC',  N'Timber',N'B9', N '1,2')

      ([t1_Name] , [t1_Title] , [t1_Table] , [t1_Data], ) Values ( N'ABC',  N'Glass', N'H1',NULL)

       

      Create Table2(

      [t1_Name] [varchar] (30) NULL,

      [t1_Title] [varchar] (30) NULL,

      [t1_Table] [varchar] (30) NULL,

      [t1_Data] [varchar] (30) NULL)

      Insert [Table2]

      ([t2_Name] , [t2_Title] , [t2_Table] , [t2_Data], ) Values ( N'ABC_M',  N'Timber',N'B14', N '1,2,4,6')

      ([t2_Name] , [t2_Title] , [t2_Table] , [t2_Data], ) Values ( N'XYZ',  N'Cars', N'B17', N '1,2,4')

      ([t2_Name] , [t2_Title] , [t2_Table] , [t2_Data], ) Values ( N'ABC',  N'Ramp', N'B19',N '1,3,5')

      ([t2_Name] , [t2_Title] , [t2_Table] , [t2_Data], ) Values ( N'GHY',  N'Timber', N'B9', NULL)

      ([t2_Name] , [t2_Title] , [t2_Table] , [t2_Data], ) Values ( N'ABN',  N'Cranes', N'B1',N '1,2,3,9')

      ([t2_Name] , [t2_Title] , [t2_Table] , [t2_Data], ) Values ( N'NYU',  N'Berries', N'GP78', N '1,2')

       

       

       

      Create Table3(

      [t1_Name] [varchar] (30) NULL,

      [t2_Name] [varchar] (30) NULL,

      [t1_Table] [varchar] (30) NULL,

      [t2_Table] [varchar] (30) NULL,

      [t1_Title] [varchar] (30) NULL,

      [t2_Title] [varchar] (30) NULL,

      [t1_Data] [varchar] (30) NULL,

      [t2_Data] [varchar] (30) NULL)


    Thanks for your help!
    Charles P.

  • Correction on Table2:

    Create Table2(

    [t2_Name] [varchar] (30) NULL,

    [t2_Title] [varchar] (30) NULL,

    [t2_Table] [varchar] (30) NULL,

    [t2_Data] [varchar] (30) NULL)

    Insert [Table2]

    ([t2_Name] , [t2_Title] , [t2_Table] , [t2_Data], ) Values ( N'ABC_M',  N'Timber',N'B14', N '1,2,4,6')

    ([t2_Name] , [t2_Title] , [t2_Table] , [t2_Data], ) Values ( N'XYZ',  N'Cars', N'B17', N '1,2,4')

    ([t2_Name] , [t2_Title] , [t2_Table] , [t2_Data], ) Values ( N'ABC',  N'Ramp', N'B19',N '1,3,5')

    ([t2_Name] , [t2_Title] , [t2_Table] , [t2_Data], ) Values ( N'GHY',  N'Timber', N'B9', NULL)

    ([t2_Name] , [t2_Title] , [t2_Table] , [t2_Data], ) Values ( N'ABN',  N'Cranes', N'B1',N '1,2,3,9')

    ([t2_Name] , [t2_Title] , [t2_Table] , [t2_Data], ) Values ( N'NYU',  N'Berries', N'GP78', N '1,2')

  • There are two major problems with all of this...

    The first problem is that, although you tried, you didn't try to run your the test data code you posted because it has 5 different classes of errors in it.  Here's the redaction of your code that works to entice other folks to give this problem a shot.

    --===== Drop Temp Tables to make runs in SSMS easier.
         IF OBJECT_ID('tempdb..#Table1','U') IS NOT NULL DROP TABLE #Table1;
         IF OBJECT_ID('tempdb..#Table2','U') IS NOT NULL DROP TABLE #Table2;
         IF OBJECT_ID('tempdb..#Table3','U') IS NOT NULL DROP TABLE #Table3;
    ;
    --===== Create and populate Table1
     CREATE TABLE #Table1
            (
             t1_Name    VARCHAR(30) NULL
            ,t1_Title   VARCHAR(30) NULL
            ,t1_Table   VARCHAR(30) NULL
            ,t1_Data    VARCHAR(30) NULL
            )
    ;
     INSERT INTO #Table1
             (t1_Name , t1_Title , t1_Table , t1_Data)
     VALUES  ('ABC', 'Timber', 'B14', '1,2,4,6')
            ,('ABC', 'Ramp'  , 'B19', '1,3,5')
            ,('ABC', 'Timber', 'B9' , '1,2')
            ,('ABC', 'Glass' , 'H1' , NULL)
    ;
    --===== Create and populate Table1
     CREATE TABLE #Table2
            (
             t2_Name    VARCHAR(30) NULL
            ,t2_Title   VARCHAR(30) NULL
            ,t2_Table   VARCHAR(30) NULL
            ,t2_Data    VARCHAR(30) NULL
            )
    ;
     INSERT INTO #Table2
             (t2_Name , t2_Title , t2_Table , t2_Data)
     VALUES  ('ABC_M', 'Timber' , 'B14' , '1,2,4,6')
            ,('XYZ'  , 'Cars'   , 'B17' , '1,2,4')
            ,('ABC'  , 'Ramp'   , 'B19' , '1,3,5')
            ,('GHY'  , 'Timber' , 'B9'  , NULL)
            ,('ABN'  , 'Cranes' , 'B1'  , '1,2,3,9')
            ,('NYU'  , 'Berries', 'GP78', '1,2')
    ;
    --===== Create Table3
     CREATE TABLE #Table3
            (
             t1_Name    VARCHAR(30) NULL
            ,t2_Name    VARCHAR(30) NULL
            ,t1_Table   VARCHAR(30) NULL
            ,t2_Table   VARCHAR(30) NULL
            ,t1_Title   VARCHAR(30) NULL
            ,t2_Title   VARCHAR(30) NULL
            ,t1_Data    VARCHAR(30) NULL
            ,t2_Data    VARCHAR(30) NULL
            )
    ;

    The second problem is that you want to compare all of the rows of the first table with the second with no key to identify which rows should be compared between the two tables.  That will make a Cartesian Product on steroids the output of which will be basically useless because of all the possibilities that will be returned.

    For example, I would imagine from your description that you would want the following row from Table1...

    ('ABC', 'Timber', 'B14', '1,2,4,6')

    ... to match up with the following row from Table2...

    ('ABC_M', 'Timber' , 'B14' , '1,2,4,6')

    ... simply because the 2nd, 3rd, and 4th columns matched up and then show the difference as being only the 1st column and THEN depict that in Table3.

    In theory, you should also include just 2 columns matching as well as just 1 column matching and you're going to end up with quite the hairball both for processing resources and the one that human that has to decipher the output will cough up while trying to figure it all out.  Because of all the duplication that will occur for trinary, binary, and singleton matches, you'll also need to come up with a "weighting" for the best matches and then compare those to a logical copy of the same rows possibly generating even more duplication.

    And that's just for 4 columns.  How many columns and rows does this data have and what would your definition of a minimum satisfactory match actually be?  What, for example, would you expect the output for your sample data to be especially when it comes to the "Timber" rows?  I can certainly guess at that by looking at the data with the Mark-1, Mod-1 eyeball but that would only be because I would mentally assume what a minimum match would be (match on Title and Table, in this case).  If the minimum match is "any two columns", that will require 6 Cartesian Products between the two tables but only if they really have only 4 columns each.

    To summarize my questions...
    1.  What is considered (by number of column matches) to be the minimum requirement for a match?
    2.  How many rows are in each of your real tables?
    3.  How many columns are in each of your real tables?
    4.  What is the name of the person that provided you with such thoughtless data as to have no key?  I want to make sure that I never, ever work with them or for them. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • BTW... the formula for the number of Cartesian Products required for "N" columns of "any two columns match" is ((N-1)^2 + (N-1))/2.  That means that just 10 columns would require 45 Cartesian products.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hello Jeff,

    This is my first time out of the gate with SQL Server Central. Thanks for the tips and advice.

    You are correct: I would like the following row from Table1...

    ('ABC', 'Timber', 'B14', '1,2,4,6')

    ... to match up with the following row from Table2...

    ('ABC_M', 'Timber' , 'B14' , '1,2,4,6')

    The minimum number of columns to match is two(2) - Name and then Title.

    Next columns to match would be Name and then Table. Next columns to match would be Name and then title; and then Name and Data. The name will always be one of the columns to match.

    Table 1 has 1584 rows and Table 2 has 1572 rows. There are 12 columns in Table 1 and Table 2 but I am only looking for the differences in the four columns (name, table, title, and data).  The other columns are not common enough across tables.

    The person that built these tables are long gone. Now I'm stuck with creating this report which is the result in Table 3.   

    I did consider sql rowcount, but did't quite get there yet.  I also looked at substring to see if the characters matched. But some of the field names are short and some are long like the Data can be 180 to 200 characters or just 12 to 20 characters long.  So substring did not work for me.

    Thanks for your advice and suggestions to get me going in the right direction.

    Charles P.

  • I'd do something like this:

    --===== Drop Temp Tables to make runs in SSMS easier.

    IF OBJECT_ID('tempdb..#Table1','U') IS NOT NULL DROP TABLE #Table1;

    IF OBJECT_ID('tempdb..#Table2','U') IS NOT NULL DROP TABLE #Table2;

    IF OBJECT_ID('tempdb..#Table3','U') IS NOT NULL DROP TABLE #Table3;

    IF OBJECT_ID('tempdb..#Table1_Unique','U') IS NOT NULL DROP TABLE #Table1_Unique;

    IF OBJECT_ID('tempdb..#Table2_Unique','U') IS NOT NULL DROP TABLE #Table2_Unique;

    --===== Create and populate Table1

    CREATE TABLE #Table1

    (

    t1_Name VARCHAR(30) NULL

    ,t1_Title VARCHAR(30) NULL

    ,t1_Table VARCHAR(30) NULL

    ,t1_Data VARCHAR(30) NULL

    )

    ;

    INSERT INTO #Table1

    (t1_Name , t1_Title , t1_Table , t1_Data)

    VALUES ('ABC', 'Timber', 'B14', '1,2,4,6')

    ,('ABC', 'Ramp' , 'B19', '1,3,5')

    ,('ABC', 'Timber', 'B9' , '1,2')

    ,('ABC', 'Glass' , 'H1' , NULL)

    ;

    --===== Create and populate Table1

    CREATE TABLE #Table2

    (

    t2_Name VARCHAR(30) NULL

    ,t2_Title VARCHAR(30) NULL

    ,t2_Table VARCHAR(30) NULL

    ,t2_Data VARCHAR(30) NULL

    )

    ;

    INSERT INTO #Table2

    (t2_Name , t2_Title , t2_Table , t2_Data)

    VALUES ('ABC_M', 'Timber' , 'B14' , '1,2,4,6')

    ,('XYZ' , 'Cars' , 'B17' , '1,2,4')

    ,('ABC' , 'Ramp' , 'B19' , '1,3,5')

    ,('GHY' , 'Timber' , 'B9' , NULL)

    ,('ABN' , 'Cranes' , 'B1' , '1,2,3,9')

    ,('NYU' , 'Berries', 'GP78', '1,2')

    ;

    --===== Create Table3

    CREATE TABLE #Table3

    (t1_ID INT NOT NULL

    ,t2_ID INT NOT NULL

    ,t1_Name VARCHAR(30) NULL

    ,t2_Name VARCHAR(30) NULL

    ,t1_Table VARCHAR(30) NULL

    ,t2_Table VARCHAR(30) NULL

    ,t1_Title VARCHAR(30) NULL

    ,t2_Title VARCHAR(30) NULL

    ,t1_Data VARCHAR(30) NULL

    ,t2_Data VARCHAR(30) NULL

    )

    ;

    -- First, eliminate full-row matches. Assign a Row ID to the remaining rows (which don't have an exact match) and copy out to a new table

    SELECT t1_ID = ROW_NUMBER() OVER(ORDER BY (SELECT NULL)), * INTO #Table1_Unique FROM (SELECT * FROM #Table1 EXCEPT SELECT * FROM #Table2) d

    SELECT t2_ID = ROW_NUMBER() OVER(ORDER BY (SELECT NULL)), * INTO #Table2_Unique FROM (SELECT * FROM #Table2 EXCEPT SELECT * FROM #Table1) d

    -- Attempt to match on Name & Title

    INSERT INTO #Table3 (t1_ID, t2_ID, t1_Name, t2_Name, t1_Table, t2_Table, t1_Title, t2_Title, t1_Data, t2_Data)

    SELECT t1_ID, t2_ID, t1_Name, t2_Name, t1_Table, t2_Table, t1_Title, t2_Title, t1_Data, t2_Data

    FROM #Table1_Unique t1

    INNER JOIN #Table2_Unique t2

    ON (t1.t1_Name LIKE t2.t2_Name + '%' OR t2.t2_Name LIKE t1.t1_Name + '%')

    AND t1.t1_Title = t2.t2_Title

    WHERE NOT EXISTS (SELECT 1 FROM #Table3 t3 WHERE t3.t1_ID = t1.t1_ID AND t3.t2_ID = t2.t2_ID)

    -- Attempt to match on Name & Table

    INSERT INTO #Table3 (t1_ID, t2_ID, t1_Name, t2_Name, t1_Table, t2_Table, t1_Title, t2_Title, t1_Data, t2_Data)

    SELECT t1_ID, t2_ID, t1_Name, t2_Name, t1_Table, t2_Table, t1_Title, t2_Title, t1_Data, t2_Data

    FROM #Table1_Unique t1

    INNER JOIN #Table2_Unique t2

    ON (t1.t1_Name LIKE t2.t2_Name + '%' OR t2.t2_Name LIKE t1.t1_Name + '%')

    AND t1.t1_Table = t2.t2_Table

    WHERE NOT EXISTS (SELECT 1 FROM #Table3 t3 WHERE t3.t1_ID = t1.t1_ID AND t3.t2_ID = t2.t2_ID)

    -- Attempt to match on Name & Data

    INSERT INTO #Table3 (t1_ID, t2_ID, t1_Name, t2_Name, t1_Table, t2_Table, t1_Title, t2_Title, t1_Data, t2_Data)

    SELECT t1_ID, t2_ID, t1_Name, t2_Name, t1_Table, t2_Table, t1_Title, t2_Title, t1_Data, t2_Data

    FROM #Table1_Unique t1

    INNER JOIN #Table2_Unique t2

    ON (t1.t1_Name LIKE t2.t2_Name + '%' OR t2.t2_Name LIKE t1.t1_Name + '%')

    AND t1.t1_Data = t2.t2_Data

    WHERE NOT EXISTS (SELECT 1 FROM #Table3 t3 WHERE t3.t1_ID = t1.t1_ID AND t3.t2_ID = t2.t2_ID)

    SELECT * FROM #Table1_Unique

    SELECT * FROM #Table2_Unique

    SELECT * FROM #Table3

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • FYI, fwiw, there's some slight differences in the sample data being used and the original sample data picture.  Here's the adjusted data:


    INSERT INTO #Table1
       (t1_Name , t1_Title , t1_Table , t1_Data)
    VALUES ('ABC', 'Timber', 'B14', '1,2,4,6,') --added comma to end of data
       ,('ABC', 'Ramp' , 'B19', '1,3,5')
       ,('GHY', 'Timber', 'B9' , '1,2') --changed name to GHY
       ,('JKL', 'Glass' , 'H1' , NULL) --changed name to JKL
    INSERT INTO #Table2
       (t2_Name , t2_Title , t2_Table , t2_Data)
    VALUES ('ABC_M', 'Timber' , 'B14' , '1,2,4,6')
       ,('XYZ' , 'Cars' , 'B17' , '1,2,3,4') --added ,3 to data
       ,('ABC' , 'Ramp' , 'B19' , '1,3,5')
       ,('GHY' , 'Timber;' , 'B9' , NULL) --semicolon at end of title
       ,('ABN' , 'Cranes' , 'B1' , '1,2;3,9') --semicolon rather than comma in data
       ,('NYU' , 'Berries', 'GP78', '1,2')

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply