RE: Anti Join question – SQLServerCentral

SSC Guru

Points: 61136

December 8, 2016 at 2:36 pm

#1916916

sgmunson (12/8/2016)
Alan.B (12/8/2016)
Ok, say I have two tables table1 and table2 and I want to find records from table1 that don't exist in table2 or vise-versa. I could use EXCEPT like this:
-- sample data
DECLARE @table1 TABLE (SomeId int identity, content int NOT NULL, primary key(id, content));
DECLARE @table2 TABLE (SomeId int identity, content int NOT NULL, primary key(id, content));
INSERT @table1(content) VALUES (12), (15), (20);
INSERT @table2(content) SELECT content FROM @table1;
--DELETE TOP(1) FROM @table1;
--DELETE TOP(1) FROM @table2;
--solution
(
SELECT id, content FROM @table1
EXCEPT
SELECT id, content FROM @table2
)
UNION ALL
(
SELECT id, content FROM @table2
EXCEPT
SELECT id, content FROM @table1
);
I know there is a more efficient way to do this where I don't need to make two trips to each table. I saw someone here (SQLServerCentral) do this differently (I'm 99% sure it was Scott Peltcher). Anyone have a better way to do this?
Just wondering if you can determine that if the PK value in each table is identical, for a given pair of records (1 from each table), that you can then guarantee that both records are identical? The reason I ask is that if not all fields are identical for a given primary key value, then this query might not be entirely what you actually want, as it will supply values for records that don't have a primary key field match. If you could guarantee that a record in one table is always identical to one in the other table with the same PK value, then this becomes a lot easier and probably a lot better performance-wise. Think something like this:
WITH SOURCE_DATA AS (
SELECT DISTINCT COALESCE(T1.PK_FLD, T2.PK_FLD) AS PK_FIELD
FROM TABLE_1 AS T1
FULL OUTER JOIN TABLE_2 AS T2
ON T1.PK_FLD = T2.PK_FLD
WHERE T1.PK_FLD IS NULL
OR T2.PK_FLD IS NULL
)
SELECT D.PK_FIELD,
COALESCE(T3.ADDL_FLD1, T4.ADDL_FLD1) AS ADDL_FLD1,
...
FROM SOURCE_DATA AS D
LEFT OUTER JOIN TABLE_1 AS T3
ON D.PK_FIELD = T3.PK_FLD
LEFT OUTER JOIN TABLE_2 AS T4
ON D.PK_FIELD = T4.PK_FLD
ORDER BY D.PK_FIELD;
If you do need to see records that are different, AND you need to know the difference between EXISTS but is different, and NOT EXISTS, then you may only need to add some logic to what you already have. Just figured another idea to look at might not hurt...
EDIT: realized my thinking was way ahead of my typing and corrected a rather obvious dumb statement.

Thank you sir - interesting.

I only included the primary key to create a clustered index on my table variable but you gave me something to think about. Thanks again!

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001