Anti Join question

Question

Post reply

Anti Join question

Alan Burstein

SSC Guru

Points: 61152
More actions
December 8, 2016 at 1:11 pm

#323944

Ok, say I have two tables table1 and table2 and I want to find records from table1 that don't exist in table2 or vise-versa. I could use EXCEPT like this:
-- sample data
DECLARE @table1 TABLE (SomeId int identity, content int NOT NULL, primary key(id, content));
DECLARE @table2 TABLE (SomeId int identity, content int NOT NULL, primary key(id, content));
INSERT @table1(content) VALUES (12), (15), (20);
INSERT @table2(content) SELECT content FROM @table1;
--DELETE TOP(1) FROM @table1;
--DELETE TOP(1) FROM @table2;
--solution
(
SELECT id, content FROM @table1
EXCEPT
SELECT id, content FROM @table2
)
UNION ALL
(
SELECT id, content FROM @table2
EXCEPT
SELECT id, content FROM @table1
);
I know there is a more efficient way to do this where I don't need to make two trips to each table. I saw someone here (SQLServerCentral) do this differently (I'm 99% sure it was Scott Peltcher). Anyone have a better way to do this?
"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."
-- Itzik Ben-Gan 2001

Viewing 6 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

Luis Cazares SSC Guru Points: 183706 More actions · Answer 1

Would a FULL OUTER JOIN help?

I changed your sample data to get some results.

-- sample data

DECLARE @table1 TABLE (id int identity, content int NOT NULL, primary key(id, content));

DECLARE @table2 TABLE (id int identity, content int NOT NULL, primary key(id, content));

INSERT @table1(content) VALUES (12), (15), (20);

INSERT @table2(content) VALUES (13), (15), (20);

--DELETE TOP(1) FROM @table1;

--DELETE TOP(1) FROM @table2;

--solution

(

SELECT id, content FROM @table1

EXCEPT

SELECT id, content FROM @table2

)

UNION ALL

(

SELECT id, content FROM @table2

EXCEPT

SELECT id, content FROM @table1

);

SELECT ISNULL( t1.id, t2.id) AS id,

ISNULL( t1.content, t2.content) AS content

FROM @table1 t1

FULL OUTER

JOIN @table2 t2 ON t1.id = t2.id AND t1.content = t2.content

WHERE t1.id IS NULL

OR t2.id IS NULL;

Or maybe using HAVING:

SELECT *

FROM (

SELECT id, content FROM @table1

UNION ALL

SELECT id, content FROM @table2

) x

GROUP BY id, content

HAVING COUNT(*) = 1;

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

Alan Burstein SSC Guru Points: 61152 More actions · Answer 2

That FULL JOIN solution was what I'd seen before - The HAVING solution works just as well. Excellent work, thanks Luis!

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

sgmunson SSC Guru Points: 110651 More actions · Answer 3

Alan.B (12/8/2016)
Ok, say I have two tables table1 and table2 and I want to find records from table1 that don't exist in table2 or vise-versa. I could use EXCEPT like this:
-- sample data
DECLARE @table1 TABLE (SomeId int identity, content int NOT NULL, primary key(id, content));
DECLARE @table2 TABLE (SomeId int identity, content int NOT NULL, primary key(id, content));
INSERT @table1(content) VALUES (12), (15), (20);
INSERT @table2(content) SELECT content FROM @table1;
--DELETE TOP(1) FROM @table1;
--DELETE TOP(1) FROM @table2;
--solution
(
SELECT id, content FROM @table1
EXCEPT
SELECT id, content FROM @table2
)
UNION ALL
(
SELECT id, content FROM @table2
EXCEPT
SELECT id, content FROM @table1
);
I know there is a more efficient way to do this where I don't need to make two trips to each table. I saw someone here (SQLServerCentral) do this differently (I'm 99% sure it was Scott Peltcher). Anyone have a better way to do this?

Just wondering if you can determine that if the PK value in each table is identical, for a given pair of records (1 from each table), that you can then guarantee that both records are identical? The reason I ask is that if not all fields are identical for a given primary key value, then this query might not be entirely what you actually want, as it will supply values for records that don't have a primary key field match. If you could guarantee that a record in one table is always identical to one in the other table with the same PK value, then this becomes a lot easier and probably a lot better performance-wise. Think something like this:

WITH SOURCE_DATA AS (

SELECT DISTINCT COALESCE(T1.PK_FLD, T2.PK_FLD) AS PK_FIELD

FROM TABLE_1 AS T1

FULL OUTER JOIN TABLE_2 AS T2

ON T1.PK_FLD = T2.PK_FLD

WHERE T1.PK_FLD IS NULL

OR T2.PK_FLD IS NULL

)

SELECT D.PK_FIELD,

COALESCE(T3.ADDL_FLD1, T4.ADDL_FLD1) AS ADDL_FLD1,

...

FROM SOURCE_DATA AS D

LEFT OUTER JOIN TABLE_1 AS T3

ON D.PK_FIELD = T3.PK_FLD

LEFT OUTER JOIN TABLE_2 AS T4

ON D.PK_FIELD = T4.PK_FLD

ORDER BY D.PK_FIELD;

If you do need to see records that are different, AND you need to know the difference between EXISTS but is different, and NOT EXISTS, then you may only need to add some logic to what you already have. Just figured another idea to look at might not hurt...

EDIT: realized my thinking was way ahead of my typing and corrected a rather obvious dumb statement.

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

Alan Burstein SSC Guru Points: 61152 More actions · Answer 4

sgmunson (12/8/2016)
Alan.B (12/8/2016)
Ok, say I have two tables table1 and table2 and I want to find records from table1 that don't exist in table2 or vise-versa. I could use EXCEPT like this:
-- sample data
DECLARE @table1 TABLE (SomeId int identity, content int NOT NULL, primary key(id, content));
DECLARE @table2 TABLE (SomeId int identity, content int NOT NULL, primary key(id, content));
INSERT @table1(content) VALUES (12), (15), (20);
INSERT @table2(content) SELECT content FROM @table1;
--DELETE TOP(1) FROM @table1;
--DELETE TOP(1) FROM @table2;
--solution
(
SELECT id, content FROM @table1
EXCEPT
SELECT id, content FROM @table2
)
UNION ALL
(
SELECT id, content FROM @table2
EXCEPT
SELECT id, content FROM @table1
);
I know there is a more efficient way to do this where I don't need to make two trips to each table. I saw someone here (SQLServerCentral) do this differently (I'm 99% sure it was Scott Peltcher). Anyone have a better way to do this?
Just wondering if you can determine that if the PK value in each table is identical, for a given pair of records (1 from each table), that you can then guarantee that both records are identical? The reason I ask is that if not all fields are identical for a given primary key value, then this query might not be entirely what you actually want, as it will supply values for records that don't have a primary key field match. If you could guarantee that a record in one table is always identical to one in the other table with the same PK value, then this becomes a lot easier and probably a lot better performance-wise. Think something like this:
WITH SOURCE_DATA AS (
SELECT DISTINCT COALESCE(T1.PK_FLD, T2.PK_FLD) AS PK_FIELD
FROM TABLE_1 AS T1
FULL OUTER JOIN TABLE_2 AS T2
ON T1.PK_FLD = T2.PK_FLD
WHERE T1.PK_FLD IS NULL
OR T2.PK_FLD IS NULL
)
SELECT D.PK_FIELD,
COALESCE(T3.ADDL_FLD1, T4.ADDL_FLD1) AS ADDL_FLD1,
...
FROM SOURCE_DATA AS D
LEFT OUTER JOIN TABLE_1 AS T3
ON D.PK_FIELD = T3.PK_FLD
LEFT OUTER JOIN TABLE_2 AS T4
ON D.PK_FIELD = T4.PK_FLD
ORDER BY D.PK_FIELD;
If you do need to see records that are different, AND you need to know the difference between EXISTS but is different, and NOT EXISTS, then you may only need to add some logic to what you already have. Just figured another idea to look at might not hurt...
EDIT: realized my thinking was way ahead of my typing and corrected a rather obvious dumb statement.

Thank you sir - interesting.

I only included the primary key to create a clustered index on my table variable but you gave me something to think about. Thanks again!

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

sgmunson SSC Guru Points: 110651 More actions · Answer 5

Alan.B (12/8/2016)
sgmunson (12/8/2016)
Alan.B (12/8/2016)
Ok, say I have two tables table1 and table2 and I want to find records from table1 that don't exist in table2 or vise-versa. I could use EXCEPT like this:
-- sample data
DECLARE @table1 TABLE (SomeId int identity, content int NOT NULL, primary key(id, content));
DECLARE @table2 TABLE (SomeId int identity, content int NOT NULL, primary key(id, content));
INSERT @table1(content) VALUES (12), (15), (20);
INSERT @table2(content) SELECT content FROM @table1;
--DELETE TOP(1) FROM @table1;
--DELETE TOP(1) FROM @table2;
--solution
(
SELECT id, content FROM @table1
EXCEPT
SELECT id, content FROM @table2
)
UNION ALL
(
SELECT id, content FROM @table2
EXCEPT
SELECT id, content FROM @table1
);
I know there is a more efficient way to do this where I don't need to make two trips to each table. I saw someone here (SQLServerCentral) do this differently (I'm 99% sure it was Scott Peltcher). Anyone have a better way to do this?
Just wondering if you can determine that if the PK value in each table is identical, for a given pair of records (1 from each table), that you can then guarantee that both records are identical? The reason I ask is that if not all fields are identical for a given primary key value, then this query might not be entirely what you actually want, as it will supply values for records that don't have a primary key field match. If you could guarantee that a record in one table is always identical to one in the other table with the same PK value, then this becomes a lot easier and probably a lot better performance-wise. Think something like this:
WITH SOURCE_DATA AS (
SELECT DISTINCT COALESCE(T1.PK_FLD, T2.PK_FLD) AS PK_FIELD
FROM TABLE_1 AS T1
FULL OUTER JOIN TABLE_2 AS T2
ON T1.PK_FLD = T2.PK_FLD
WHERE T1.PK_FLD IS NULL
OR T2.PK_FLD IS NULL
)
SELECT D.PK_FIELD,
COALESCE(T3.ADDL_FLD1, T4.ADDL_FLD1) AS ADDL_FLD1,
...
FROM SOURCE_DATA AS D
LEFT OUTER JOIN TABLE_1 AS T3
ON D.PK_FIELD = T3.PK_FLD
LEFT OUTER JOIN TABLE_2 AS T4
ON D.PK_FIELD = T4.PK_FLD
ORDER BY D.PK_FIELD;
If you do need to see records that are different, AND you need to know the difference between EXISTS but is different, and NOT EXISTS, then you may only need to add some logic to what you already have. Just figured another idea to look at might not hurt...
EDIT: realized my thinking was way ahead of my typing and corrected a rather obvious dumb statement.
Thank you sir - interesting.
I only included the primary key to create a clustered index on my table variable but you gave me something to think about. Thanks again!

Glad I could help. Enjoy!

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)