Find Differences Between Two Tables

Question

Post reply

Find Differences Between Two Tables

Charles_P

Old Hand

Points: 352
More actions
December 5, 2017 at 5:48 pm

#324999
Hello,
I’m new to SQL and need some help please. This is such a mind bender for me.
I am looking for differences between Table 1 and Table 2, and then putthose differences in Table 3. Table 3 isthe final report table that shows both Table 1 and Table 2 fields so I can seethe differences across both tables.

My Code:
Select
[t1_Name],
[t2_Name] ,
[t1_Title] ,
[t2_Title] ,
[t1_Table] ,
[t2_Table] ,
[t1_Data] ,
[t2_Data]
From Table1
Left Join Table 2 on t1_Name = t2_Name
Where t1_Name <> t2_Name or
[t1_Title] <> [t2_Title] or
[t1_Table] <> [t2_Table] , or
[t1_Data] <> [t2_Data]

A Look at the Tables: (See picture attached)
- The first row of Table 1 equals the first row of Table 2 except theName. So, will be inserted into Table 3as different
- The 2nd row of Table 1 equals the 3rd row of Table 2. So an exact match isfound and not inserted into Table 3
- The 3rd row of Table 1 equals the 4th row of Table 2 except for the Title and Data in Table2. So, will be inserted into Table 3 asdifferent
- The 4th row of Table 1 is not found in Table 2, So, will be inserted intoTable 3 as different
- Table 5th and 6th rows of Table 2 are not in Table 1 . So will be insertedinto Table 3 as different
The Problem:
1. Neither Table 1 or Table 2 have primary keys. So, I have had to look at the field names tofind differences.
2. The table 1 is not in the same order as Table 2
3. Table 2 has more rows that Table 1
4. if there is data in Table 2 that is not in Table 1, need to show Table 1fields as "null"
5. if there is data in Table 1 that is not in Table 2, need to show Table 2fields as "null"
6. There can be multiple different columns that are different(for instance theTitle and the Data can be different, but the Name and the Table are the same.
  
  Create Table1(
  [t1_Name] [varchar] (30) NULL,
  [t1_Title] [varchar] (30) NULL,
  [t1_Table] [varchar] (30) NULL,
  [t1_Data] [varchar] (30) NULL)
  Insert [Table1]
  ([t1_Name] , [t1_Title] , [t1_Table] , [t1_Data], ) Values ( N'ABC', N'Timber',N'B14', N '1,2,4,6')
  ([t1_Name] , [t1_Title] , [t1_Table] , [t1_Data], ) Values ( N'ABC', N'Ramp', N'B19',N '1,3,5')
  ([t1_Name] , [t1_Title] , [t1_Table] , [t1_Data], ) Values ( N'ABC', N'Timber',N'B9', N '1,2')
  ([t1_Name] , [t1_Title] , [t1_Table] , [t1_Data], ) Values ( N'ABC', N'Glass', N'H1',NULL)
  
  Create Table2(
  [t1_Name] [varchar] (30) NULL,
  [t1_Title] [varchar] (30) NULL,
  [t1_Table] [varchar] (30) NULL,
  [t1_Data] [varchar] (30) NULL)
  Insert [Table2]
  ([t2_Name] , [t2_Title] , [t2_Table] , [t2_Data], ) Values ( N'ABC_M', N'Timber',N'B14', N '1,2,4,6')
  ([t2_Name] , [t2_Title] , [t2_Table] , [t2_Data], ) Values ( N'XYZ', N'Cars', N'B17', N '1,2,4')
  ([t2_Name] , [t2_Title] , [t2_Table] , [t2_Data], ) Values ( N'ABC', N'Ramp', N'B19',N '1,3,5')
  ([t2_Name] , [t2_Title] , [t2_Table] , [t2_Data], ) Values ( N'GHY', N'Timber', N'B9', NULL)
  ([t2_Name] , [t2_Title] , [t2_Table] , [t2_Data], ) Values ( N'ABN', N'Cranes', N'B1',N '1,2,3,9')
  ([t2_Name] , [t2_Title] , [t2_Table] , [t2_Data], ) Values ( N'NYU', N'Berries', N'GP78', N '1,2')
  
  Create Table3(
  [t1_Name] [varchar] (30) NULL,
  [t2_Name] [varchar] (30) NULL,
  [t1_Table] [varchar] (30) NULL,
  [t2_Table] [varchar] (30) NULL,
  [t1_Title] [varchar] (30) NULL,
  [t2_Title] [varchar] (30) NULL,
  [t1_Data] [varchar] (30) NULL,
  [t2_Data] [varchar] (30) NULL)
Thanks for your help!
Charles P.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

Charles_P Old Hand Points: 352 More actions · Answer 1

Correction on Table2:

Create Table2(

[t2_Name] [varchar] (30) NULL,

[t2_Title] [varchar] (30) NULL,

[t2_Table] [varchar] (30) NULL,

[t2_Data] [varchar] (30) NULL)

Insert [Table2]

([t2_Name] , [t2_Title] , [t2_Table] , [t2_Data], ) Values ( N'ABC_M', N'Timber',N'B14', N '1,2,4,6')

([t2_Name] , [t2_Title] , [t2_Table] , [t2_Data], ) Values ( N'XYZ', N'Cars', N'B17', N '1,2,4')

([t2_Name] , [t2_Title] , [t2_Table] , [t2_Data], ) Values ( N'ABC', N'Ramp', N'B19',N '1,3,5')

([t2_Name] , [t2_Title] , [t2_Table] , [t2_Data], ) Values ( N'GHY', N'Timber', N'B9', NULL)

([t2_Name] , [t2_Title] , [t2_Table] , [t2_Data], ) Values ( N'ABN', N'Cranes', N'B1',N '1,2,3,9')

([t2_Name] , [t2_Title] , [t2_Table] , [t2_Data], ) Values ( N'NYU', N'Berries', N'GP78', N '1,2')

Jeff Moden SSC Guru Points: 1003894 More actions · Answer 2

There are two major problems with all of this...

The first problem is that, although you tried, you didn't try to run your the test data code you posted because it has 5 different classes of errors in it. Here's the redaction of your code that works to entice other folks to give this problem a shot.

--===== Drop Temp Tables to make runs in SSMS easier. IF OBJECT_ID('tempdb..#Table1','U') IS NOT NULL DROP TABLE #Table1; IF OBJECT_ID('tempdb..#Table2','U') IS NOT NULL DROP TABLE #Table2; IF OBJECT_ID('tempdb..#Table3','U') IS NOT NULL DROP TABLE #Table3; ; --===== Create and populate Table1 CREATE TABLE #Table1 ( t1_Name VARCHAR(30) NULL ,t1_Title VARCHAR(30) NULL ,t1_Table VARCHAR(30) NULL ,t1_Data VARCHAR(30) NULL ) ; INSERT INTO #Table1 (t1_Name , t1_Title , t1_Table , t1_Data) VALUES ('ABC', 'Timber', 'B14', '1,2,4,6') ,('ABC', 'Ramp' , 'B19', '1,3,5') ,('ABC', 'Timber', 'B9' , '1,2') ,('ABC', 'Glass' , 'H1' , NULL) ; --===== Create and populate Table1 CREATE TABLE #Table2 ( t2_Name VARCHAR(30) NULL ,t2_Title VARCHAR(30) NULL ,t2_Table VARCHAR(30) NULL ,t2_Data VARCHAR(30) NULL ) ; INSERT INTO #Table2 (t2_Name , t2_Title , t2_Table , t2_Data) VALUES ('ABC_M', 'Timber' , 'B14' , '1,2,4,6') ,('XYZ' , 'Cars' , 'B17' , '1,2,4') ,('ABC' , 'Ramp' , 'B19' , '1,3,5') ,('GHY' , 'Timber' , 'B9' , NULL) ,('ABN' , 'Cranes' , 'B1' , '1,2,3,9') ,('NYU' , 'Berries', 'GP78', '1,2') ; --===== Create Table3 CREATE TABLE #Table3 ( t1_Name VARCHAR(30) NULL ,t2_Name VARCHAR(30) NULL ,t1_Table VARCHAR(30) NULL ,t2_Table VARCHAR(30) NULL ,t1_Title VARCHAR(30) NULL ,t2_Title VARCHAR(30) NULL ,t1_Data VARCHAR(30) NULL ,t2_Data VARCHAR(30) NULL ) ;

The second problem is that you want to compare all of the rows of the first table with the second with no key to identify which rows should be compared between the two tables. That will make a Cartesian Product on steroids the output of which will be basically useless because of all the possibilities that will be returned.

For example, I would imagine from your description that you would want the following row from Table1...

('ABC', 'Timber', 'B14', '1,2,4,6')

... to match up with the following row from Table2...

('ABC_M', 'Timber' , 'B14' , '1,2,4,6')

... simply because the 2nd, 3rd, and 4th columns matched up and then show the difference as being only the 1st column and THEN depict that in Table3.

In theory, you should also include just 2 columns matching as well as just 1 column matching and you're going to end up with quite the hairball both for processing resources and the one that human that has to decipher the output will cough up while trying to figure it all out. Because of all the duplication that will occur for trinary, binary, and singleton matches, you'll also need to come up with a "weighting" for the best matches and then compare those to a logical copy of the same rows possibly generating even more duplication.

And that's just for 4 columns. How many columns and rows does this data have and what would your definition of a minimum satisfactory match actually be? What, for example, would you expect the output for your sample data to be especially when it comes to the "Timber" rows? I can certainly guess at that by looking at the data with the Mark-1, Mod-1 eyeball but that would only be because I would mentally assume what a minimum match would be (match on Title and Table, in this case). If the minimum match is "any two columns", that will require 6 Cartesian Products between the two tables but only if they really have only 4 columns each.

To summarize my questions...
1. What is considered (by number of column matches) to be the minimum requirement for a match?
2. How many rows are in each of your real tables?
3. How many columns are in each of your real tables?
4. What is the name of the person that provided you with such thoughtless data as to have no key? I want to make sure that I never, ever work with them or for them. 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1003894 More actions · Answer 3

BTW... the formula for the number of Cartesian Products required for "N" columns of "any two columns match" is ((N-1)^2 + (N-1))/2. That means that just 10 columns would require 45 Cartesian products.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Charles_P Old Hand Points: 352 More actions · Answer 4

Hello Jeff,

This is my first time out of the gate with SQL Server Central. Thanks for the tips and advice.

You are correct: I would like the following row from Table1...

('ABC', 'Timber', 'B14', '1,2,4,6')

... to match up with the following row from Table2...

('ABC_M', 'Timber' , 'B14' , '1,2,4,6')

The minimum number of columns to match is two(2) - Name and then Title.

Next columns to match would be Name and then Table. Next columns to match would be Name and then title; and then Name and Data. The name will always be one of the columns to match.

Table 1 has 1584 rows and Table 2 has 1572 rows. There are 12 columns in Table 1 and Table 2 but I am only looking for the differences in the four columns (name, table, title, and data). The other columns are not common enough across tables.

The person that built these tables are long gone. Now I'm stuck with creating this report which is the result in Table 3.

I did consider sql rowcount, but did't quite get there yet. I also looked at substring to see if the characters matched. But some of the field names are short and some are long like the Data can be 180 to 200 characters or just 12 to 20 characters long. So substring did not work for me.

Thanks for your advice and suggestions to get me going in the right direction.

Charles P.

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 5

I'd do something like this:

--===== Drop Temp Tables to make runs in SSMS easier.

IF OBJECT_ID('tempdb..#Table1','U') IS NOT NULL DROP TABLE #Table1;

IF OBJECT_ID('tempdb..#Table2','U') IS NOT NULL DROP TABLE #Table2;

IF OBJECT_ID('tempdb..#Table3','U') IS NOT NULL DROP TABLE #Table3;

IF OBJECT_ID('tempdb..#Table1_Unique','U') IS NOT NULL DROP TABLE #Table1_Unique;

IF OBJECT_ID('tempdb..#Table2_Unique','U') IS NOT NULL DROP TABLE #Table2_Unique;

--===== Create and populate Table1

CREATE TABLE #Table1

(

t1_Name VARCHAR(30) NULL

,t1_Title VARCHAR(30) NULL

,t1_Table VARCHAR(30) NULL

,t1_Data VARCHAR(30) NULL

)

;

INSERT INTO #Table1

(t1_Name , t1_Title , t1_Table , t1_Data)

VALUES ('ABC', 'Timber', 'B14', '1,2,4,6')

,('ABC', 'Ramp' , 'B19', '1,3,5')

,('ABC', 'Timber', 'B9' , '1,2')

,('ABC', 'Glass' , 'H1' , NULL)

;

--===== Create and populate Table1

CREATE TABLE #Table2

(

t2_Name VARCHAR(30) NULL

,t2_Title VARCHAR(30) NULL

,t2_Table VARCHAR(30) NULL

,t2_Data VARCHAR(30) NULL

)

;

INSERT INTO #Table2

(t2_Name , t2_Title , t2_Table , t2_Data)

VALUES ('ABC_M', 'Timber' , 'B14' , '1,2,4,6')

,('XYZ' , 'Cars' , 'B17' , '1,2,4')

,('ABC' , 'Ramp' , 'B19' , '1,3,5')

,('GHY' , 'Timber' , 'B9' , NULL)

,('ABN' , 'Cranes' , 'B1' , '1,2,3,9')

,('NYU' , 'Berries', 'GP78', '1,2')

;

--===== Create Table3

CREATE TABLE #Table3

(t1_ID INT NOT NULL

,t2_ID INT NOT NULL

,t1_Name VARCHAR(30) NULL

,t2_Name VARCHAR(30) NULL

,t1_Table VARCHAR(30) NULL

,t2_Table VARCHAR(30) NULL

,t1_Title VARCHAR(30) NULL

,t2_Title VARCHAR(30) NULL

,t1_Data VARCHAR(30) NULL

,t2_Data VARCHAR(30) NULL

)

;

-- First, eliminate full-row matches. Assign a Row ID to the remaining rows (which don't have an exact match) and copy out to a new table

SELECT t1_ID = ROW_NUMBER() OVER(ORDER BY (SELECT NULL)), * INTO #Table1_Unique FROM (SELECT * FROM #Table1 EXCEPT SELECT * FROM #Table2) d

SELECT t2_ID = ROW_NUMBER() OVER(ORDER BY (SELECT NULL)), * INTO #Table2_Unique FROM (SELECT * FROM #Table2 EXCEPT SELECT * FROM #Table1) d

-- Attempt to match on Name & Title

INSERT INTO #Table3 (t1_ID, t2_ID, t1_Name, t2_Name, t1_Table, t2_Table, t1_Title, t2_Title, t1_Data, t2_Data)

SELECT t1_ID, t2_ID, t1_Name, t2_Name, t1_Table, t2_Table, t1_Title, t2_Title, t1_Data, t2_Data

FROM #Table1_Unique t1

INNER JOIN #Table2_Unique t2

ON (t1.t1_Name LIKE t2.t2_Name + '%' OR t2.t2_Name LIKE t1.t1_Name + '%')

AND t1.t1_Title = t2.t2_Title

WHERE NOT EXISTS (SELECT 1 FROM #Table3 t3 WHERE t3.t1_ID = t1.t1_ID AND t3.t2_ID = t2.t2_ID)

-- Attempt to match on Name & Table

INSERT INTO #Table3 (t1_ID, t2_ID, t1_Name, t2_Name, t1_Table, t2_Table, t1_Title, t2_Title, t1_Data, t2_Data)

SELECT t1_ID, t2_ID, t1_Name, t2_Name, t1_Table, t2_Table, t1_Title, t2_Title, t1_Data, t2_Data

FROM #Table1_Unique t1

INNER JOIN #Table2_Unique t2

ON (t1.t1_Name LIKE t2.t2_Name + '%' OR t2.t2_Name LIKE t1.t1_Name + '%')

AND t1.t1_Table = t2.t2_Table

WHERE NOT EXISTS (SELECT 1 FROM #Table3 t3 WHERE t3.t1_ID = t1.t1_ID AND t3.t2_ID = t2.t2_ID)

-- Attempt to match on Name & Data

INSERT INTO #Table3 (t1_ID, t2_ID, t1_Name, t2_Name, t1_Table, t2_Table, t1_Title, t2_Title, t1_Data, t2_Data)

SELECT t1_ID, t2_ID, t1_Name, t2_Name, t1_Table, t2_Table, t1_Title, t2_Title, t1_Data, t2_Data

FROM #Table1_Unique t1

INNER JOIN #Table2_Unique t2

ON (t1.t1_Name LIKE t2.t2_Name + '%' OR t2.t2_Name LIKE t1.t1_Name + '%')

AND t1.t1_Data = t2.t2_Data

WHERE NOT EXISTS (SELECT 1 FROM #Table3 t3 WHERE t3.t1_ID = t1.t1_ID AND t3.t2_ID = t2.t2_ID)

SELECT * FROM #Table1_Unique

SELECT * FROM #Table2_Unique

SELECT * FROM #Table3

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

ScottPletcher SSC Guru Points: 100951 More actions · Answer 6

FYI, fwiw, there's some slight differences in the sample data being used and the original sample data picture. Here's the adjusted data:

INSERT INTO #Table1 (t1_Name , t1_Title , t1_Table , t1_Data) VALUES ('ABC', 'Timber', 'B14', '1,2,4,6,') --added comma to end of data ,('ABC', 'Ramp' , 'B19', '1,3,5') ,('GHY', 'Timber', 'B9' , '1,2') --changed name to GHY ,('JKL', 'Glass' , 'H1' , NULL) --changed name to JKL INSERT INTO #Table2 (t2_Name , t2_Title , t2_Table , t2_Data) VALUES ('ABC_M', 'Timber' , 'B14' , '1,2,4,6') ,('XYZ' , 'Cars' , 'B17' , '1,2,3,4') --added ,3 to data ,('ABC' , 'Ramp' , 'B19' , '1,3,5') ,('GHY' , 'Timber;' , 'B9' , NULL) --semicolon at end of title ,('ABN' , 'Cranes' , 'B1' , '1,2;3,9') --semicolon rather than comma in data ,('NYU' , 'Berries', 'GP78', '1,2')

SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.