Anti Join question

  • Ok, say I have two tables table1 and table2 and I want to find records from table1 that don't exist in table2 or vise-versa. I could use EXCEPT like this:

    -- sample data

    DECLARE @table1 TABLE (SomeId int identity, content int NOT NULL, primary key(id, content));

    DECLARE @table2 TABLE (SomeId int identity, content int NOT NULL, primary key(id, content));

    INSERT @table1(content) VALUES (12), (15), (20);

    INSERT @table2(content) SELECT content FROM @table1;

    --DELETE TOP(1) FROM @table1;

    --DELETE TOP(1) FROM @table2;

    --solution

    (

    SELECT id, content FROM @table1

    EXCEPT

    SELECT id, content FROM @table2

    )

    UNION ALL

    (

    SELECT id, content FROM @table2

    EXCEPT

    SELECT id, content FROM @table1

    );

    I know there is a more efficient way to do this where I don't need to make two trips to each table. I saw someone here (SQLServerCentral) do this differently (I'm 99% sure it was Scott Peltcher). Anyone have a better way to do this?

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Would a FULL OUTER JOIN help?

    I changed your sample data to get some results.

    -- sample data

    DECLARE @table1 TABLE (id int identity, content int NOT NULL, primary key(id, content));

    DECLARE @table2 TABLE (id int identity, content int NOT NULL, primary key(id, content));

    INSERT @table1(content) VALUES (12), (15), (20);

    INSERT @table2(content) VALUES (13), (15), (20);

    --DELETE TOP(1) FROM @table1;

    --DELETE TOP(1) FROM @table2;

    --solution

    (

    SELECT id, content FROM @table1

    EXCEPT

    SELECT id, content FROM @table2

    )

    UNION ALL

    (

    SELECT id, content FROM @table2

    EXCEPT

    SELECT id, content FROM @table1

    );

    SELECT ISNULL( t1.id, t2.id) AS id,

    ISNULL( t1.content, t2.content) AS content

    FROM @table1 t1

    FULL OUTER

    JOIN @table2 t2 ON t1.id = t2.id AND t1.content = t2.content

    WHERE t1.id IS NULL

    OR t2.id IS NULL;

    Or maybe using HAVING:

    SELECT *

    FROM (

    SELECT id, content FROM @table1

    UNION ALL

    SELECT id, content FROM @table2

    ) x

    GROUP BY id, content

    HAVING COUNT(*) = 1;

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • That FULL JOIN solution was what I'd seen before - The HAVING solution works just as well. Excellent work, thanks Luis!

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan.B (12/8/2016)


    Ok, say I have two tables table1 and table2 and I want to find records from table1 that don't exist in table2 or vise-versa. I could use EXCEPT like this:

    -- sample data

    DECLARE @table1 TABLE (SomeId int identity, content int NOT NULL, primary key(id, content));

    DECLARE @table2 TABLE (SomeId int identity, content int NOT NULL, primary key(id, content));

    INSERT @table1(content) VALUES (12), (15), (20);

    INSERT @table2(content) SELECT content FROM @table1;

    --DELETE TOP(1) FROM @table1;

    --DELETE TOP(1) FROM @table2;

    --solution

    (

    SELECT id, content FROM @table1

    EXCEPT

    SELECT id, content FROM @table2

    )

    UNION ALL

    (

    SELECT id, content FROM @table2

    EXCEPT

    SELECT id, content FROM @table1

    );

    I know there is a more efficient way to do this where I don't need to make two trips to each table. I saw someone here (SQLServerCentral) do this differently (I'm 99% sure it was Scott Peltcher). Anyone have a better way to do this?

    Just wondering if you can determine that if the PK value in each table is identical, for a given pair of records (1 from each table), that you can then guarantee that both records are identical? The reason I ask is that if not all fields are identical for a given primary key value, then this query might not be entirely what you actually want, as it will supply values for records that don't have a primary key field match. If you could guarantee that a record in one table is always identical to one in the other table with the same PK value, then this becomes a lot easier and probably a lot better performance-wise. Think something like this:

    WITH SOURCE_DATA AS (

    SELECT DISTINCT COALESCE(T1.PK_FLD, T2.PK_FLD) AS PK_FIELD

    FROM TABLE_1 AS T1

    FULL OUTER JOIN TABLE_2 AS T2

    ON T1.PK_FLD = T2.PK_FLD

    WHERE T1.PK_FLD IS NULL

    OR T2.PK_FLD IS NULL

    )

    SELECT D.PK_FIELD,

    COALESCE(T3.ADDL_FLD1, T4.ADDL_FLD1) AS ADDL_FLD1,

    ...

    FROM SOURCE_DATA AS D

    LEFT OUTER JOIN TABLE_1 AS T3

    ON D.PK_FIELD = T3.PK_FLD

    LEFT OUTER JOIN TABLE_2 AS T4

    ON D.PK_FIELD = T4.PK_FLD

    ORDER BY D.PK_FIELD;

    If you do need to see records that are different, AND you need to know the difference between EXISTS but is different, and NOT EXISTS, then you may only need to add some logic to what you already have. Just figured another idea to look at might not hurt...

    EDIT: realized my thinking was way ahead of my typing and corrected a rather obvious dumb statement.

  • sgmunson (12/8/2016)


    Alan.B (12/8/2016)


    Ok, say I have two tables table1 and table2 and I want to find records from table1 that don't exist in table2 or vise-versa. I could use EXCEPT like this:

    -- sample data

    DECLARE @table1 TABLE (SomeId int identity, content int NOT NULL, primary key(id, content));

    DECLARE @table2 TABLE (SomeId int identity, content int NOT NULL, primary key(id, content));

    INSERT @table1(content) VALUES (12), (15), (20);

    INSERT @table2(content) SELECT content FROM @table1;

    --DELETE TOP(1) FROM @table1;

    --DELETE TOP(1) FROM @table2;

    --solution

    (

    SELECT id, content FROM @table1

    EXCEPT

    SELECT id, content FROM @table2

    )

    UNION ALL

    (

    SELECT id, content FROM @table2

    EXCEPT

    SELECT id, content FROM @table1

    );

    I know there is a more efficient way to do this where I don't need to make two trips to each table. I saw someone here (SQLServerCentral) do this differently (I'm 99% sure it was Scott Peltcher). Anyone have a better way to do this?

    Just wondering if you can determine that if the PK value in each table is identical, for a given pair of records (1 from each table), that you can then guarantee that both records are identical? The reason I ask is that if not all fields are identical for a given primary key value, then this query might not be entirely what you actually want, as it will supply values for records that don't have a primary key field match. If you could guarantee that a record in one table is always identical to one in the other table with the same PK value, then this becomes a lot easier and probably a lot better performance-wise. Think something like this:

    WITH SOURCE_DATA AS (

    SELECT DISTINCT COALESCE(T1.PK_FLD, T2.PK_FLD) AS PK_FIELD

    FROM TABLE_1 AS T1

    FULL OUTER JOIN TABLE_2 AS T2

    ON T1.PK_FLD = T2.PK_FLD

    WHERE T1.PK_FLD IS NULL

    OR T2.PK_FLD IS NULL

    )

    SELECT D.PK_FIELD,

    COALESCE(T3.ADDL_FLD1, T4.ADDL_FLD1) AS ADDL_FLD1,

    ...

    FROM SOURCE_DATA AS D

    LEFT OUTER JOIN TABLE_1 AS T3

    ON D.PK_FIELD = T3.PK_FLD

    LEFT OUTER JOIN TABLE_2 AS T4

    ON D.PK_FIELD = T4.PK_FLD

    ORDER BY D.PK_FIELD;

    If you do need to see records that are different, AND you need to know the difference between EXISTS but is different, and NOT EXISTS, then you may only need to add some logic to what you already have. Just figured another idea to look at might not hurt...

    EDIT: realized my thinking was way ahead of my typing and corrected a rather obvious dumb statement.

    Thank you sir - interesting.

    I only included the primary key to create a clustered index on my table variable but you gave me something to think about. Thanks again!

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan.B (12/8/2016)


    sgmunson (12/8/2016)


    Alan.B (12/8/2016)


    Ok, say I have two tables table1 and table2 and I want to find records from table1 that don't exist in table2 or vise-versa. I could use EXCEPT like this:

    -- sample data

    DECLARE @table1 TABLE (SomeId int identity, content int NOT NULL, primary key(id, content));

    DECLARE @table2 TABLE (SomeId int identity, content int NOT NULL, primary key(id, content));

    INSERT @table1(content) VALUES (12), (15), (20);

    INSERT @table2(content) SELECT content FROM @table1;

    --DELETE TOP(1) FROM @table1;

    --DELETE TOP(1) FROM @table2;

    --solution

    (

    SELECT id, content FROM @table1

    EXCEPT

    SELECT id, content FROM @table2

    )

    UNION ALL

    (

    SELECT id, content FROM @table2

    EXCEPT

    SELECT id, content FROM @table1

    );

    I know there is a more efficient way to do this where I don't need to make two trips to each table. I saw someone here (SQLServerCentral) do this differently (I'm 99% sure it was Scott Peltcher). Anyone have a better way to do this?

    Just wondering if you can determine that if the PK value in each table is identical, for a given pair of records (1 from each table), that you can then guarantee that both records are identical? The reason I ask is that if not all fields are identical for a given primary key value, then this query might not be entirely what you actually want, as it will supply values for records that don't have a primary key field match. If you could guarantee that a record in one table is always identical to one in the other table with the same PK value, then this becomes a lot easier and probably a lot better performance-wise. Think something like this:

    WITH SOURCE_DATA AS (

    SELECT DISTINCT COALESCE(T1.PK_FLD, T2.PK_FLD) AS PK_FIELD

    FROM TABLE_1 AS T1

    FULL OUTER JOIN TABLE_2 AS T2

    ON T1.PK_FLD = T2.PK_FLD

    WHERE T1.PK_FLD IS NULL

    OR T2.PK_FLD IS NULL

    )

    SELECT D.PK_FIELD,

    COALESCE(T3.ADDL_FLD1, T4.ADDL_FLD1) AS ADDL_FLD1,

    ...

    FROM SOURCE_DATA AS D

    LEFT OUTER JOIN TABLE_1 AS T3

    ON D.PK_FIELD = T3.PK_FLD

    LEFT OUTER JOIN TABLE_2 AS T4

    ON D.PK_FIELD = T4.PK_FLD

    ORDER BY D.PK_FIELD;

    If you do need to see records that are different, AND you need to know the difference between EXISTS but is different, and NOT EXISTS, then you may only need to add some logic to what you already have. Just figured another idea to look at might not hurt...

    EDIT: realized my thinking was way ahead of my typing and corrected a rather obvious dumb statement.

    Thank you sir - interesting.

    I only included the primary key to create a clustered index on my table variable but you gave me something to think about. Thanks again!

    Glad I could help. Enjoy!

Viewing 6 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply