Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Intersect/Except/Join Comparison Question


Intersect/Except/Join Comparison Question

Author
Message
DB_Newbie2007
DB_Newbie2007
SSC Veteran
SSC Veteran (204 reputation)SSC Veteran (204 reputation)SSC Veteran (204 reputation)SSC Veteran (204 reputation)SSC Veteran (204 reputation)SSC Veteran (204 reputation)SSC Veteran (204 reputation)SSC Veteran (204 reputation)

Group: General Forum Members
Points: 204 Visits: 1329
OK, maybe I am missing something here. I was reviewing an forum discussion about ignoring duplicates,
http://www.sqlservercentral.com/Forums/Topic856838-392-2.aspx. One of the posts appeared to be interesting, so I thought I would try some speed/IO tests comparing the various insert methods. However, I appear to be having problems with comparing the EXCEPT to the LEFT OUTER JOIN and the results I get back.

Here is the modified version of the post from the forum:

---- 0: Create some tables, data in tables
   -- New Transactions
   CREATE TABLE #Transactions
      (
         TranId BIGINT PRIMARY KEY,
         TranDate DATETIME,
         TranCode VARCHAR(4),
         Amount MONEY
      Wink;

   -- Existing Transactions
   CREATE TABLE #VendorExport
      (
         TranId BIGINT PRIMARY KEY,
         TranDate DATETIME,
         TranCode VARCHAR(4),
         Amount MONEY
      Wink;

   -- Generate one million existing transactions (takes about three seconds)
   WITH Data (rn)
   AS (
   SELECT TOP (1000000)
         ROW_NUMBER() OVER (ORDER BY C1.object_id, C1.column_id)
   FROM master.sys.columns C1,
         master.sys.columns C2,
         master.sys.columns C3
         Wink
   INSERT #VendorExport(TranID, TranDate, TranCode, Amount)
   SELECT rn,
         DATEADD(HOUR, Data.rn, '19950101'),
         CASE rn % 2 WHEN 0 THEN 'room' WHEN 1 THEN 'tax' ELSE NULL END,
         RAND(CHECKSUM(NEWID())) * 25 + 1
   FROM Data;

   -- Generate new data, with some overlaps
   -- (The original generate script used TOP (10000) )
   WITH Data (rn)
   AS (
   SELECT TOP (1000)
         ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
   FROM master.sys.columns C1,
         master.sys.columns C2
         Wink
   INSERT #Transactions (TranId, TranDate, TranCode, Amount)
   SELECT rn * 100,
         DATEADD(HOUR, Data.rn, '19950101'),
         CASE rn % 2 WHEN 0 THEN 'room' WHEN 1 THEN 'tax' ELSE NULL END,
         rn % 25
   FROM Data;
      -- Generate new data, with some overlaps
      --- (Added this, since the original script did not appear to generate "new" data?)
   WITH Data (rn)
   AS (
   SELECT TOP (1000)
         ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
   FROM master.sys.columns C1,
         master.sys.columns C2
         Wink
   INSERT #Transactions (TranId, TranDate, TranCode, Amount)
   SELECT rn * 10002,
         DATEADD(HOUR, Data.rn, '19950101'),
         CASE rn % 2 WHEN 0 THEN 'room' WHEN 1 THEN 'tax' ELSE NULL END,
         rn % 25
   FROM Data;
   
---- 1: Select results using EXCEPT
   SELECT TranID, TranDate, TranCode, Amount
   FROM #Transactions
   EXCEPT
   SELECT TranID, TranDate, TranCode, Amount
   FROM #VendorExport;
---- 2: Select results using LEFT OUTER JOIN
   SELECT T.TranID, T.TranDate, T.TranCode, T.Amount
   FROM #Transactions T
   LEFT OUTER JOIN #VendorExport V
      ON T.TranID = V.TranID
   WHERE V.TranID IS NULL
---- 3: Select results using INNER JOIN
   SELECT T.TranID, T.TranDate, T.TranCode, T.Amount
   FROM #Transactions T
   INNER JOIN #VendorExport V
      ON T.TranID = V.TranID
---- 4: Select results using INTERSECT
   SELECT TranID, TranDate, TranCode, Amount
   FROM #Transactions
   INTERSECT
   SELECT TranID, TranDate, TranCode, Amount
   FROM #VendorExport;
      
---- Tidy up
   DROP TABLE
      #Transactions,
      #VendorExport;

Step 1 (EXCEPT) generates 2000 records;
Step 2 (LEFT OUTER JOIN) generates 901 records;
Step 3 (INNER JOIN) generates 1099 records;
Step 4 (INTERSECT) has 0 records.

The questions:
a) Why the 1099 records in Step 3 are the same as the first 1099 records in Step 1?
b) Why does the INNER JOIN return 1099 records but the INTERSECT returns 0 records?

Argue for your limitations, and sure enough they're yours (Richard Bach, Illusions)
DB_Newbie2007
DB_Newbie2007
SSC Veteran
SSC Veteran (204 reputation)SSC Veteran (204 reputation)SSC Veteran (204 reputation)SSC Veteran (204 reputation)SSC Veteran (204 reputation)SSC Veteran (204 reputation)SSC Veteran (204 reputation)SSC Veteran (204 reputation)

Group: General Forum Members
Points: 204 Visits: 1329
Just to make sure I understood the Intersect/Except operators, I tried a variation of the examples from BOL. I put debug(s) around it all just to run various tests without having to re-generate the TEMP tables all the time:

DECLARE @DEBUG INT;
SET @DEBUG = 1;

IF (@DEBUG = 0)
BEGIN
   CREATE TABLE #TableA (col1 int);
   CREATE TABLE #TableB (col1 int);
   CREATE TABLE #TableC (col1 int);

   INSERT INTO #TableA
   SELECT NULL UNION
   SELECT 1 UNION
   SELECT 2 UNION
   SELECT 2 UNION
   SELECT 2 UNION
   SELECT 2 UNION
   SELECT 3 UNION
   SELECT 4 UNION
   SELECT 4;
   INSERT INTO #TableA VALUES (NULL);
   INSERT INTO #TableA VALUES (NULL);

   INSERT INTO #TableB
   SELECT NULL UNION
   SELECT 1 UNION
   SELECT 3 UNION
   SELECT 4 UNION
   SELECT 4;

   INSERT INTO #Tablec
   SELECT 2 UNION
   SELECT 2 UNION
   SELECT 2 UNION
   SELECT 4 UNION
   SELECT 4;

   SELECT * from #TableA;
   SELECT * FROM #TableB;
   SELECT * FROM #TableC;
END;

IF (@Debug = 1)
BEGIN
   SELECT col1
   FROM #TableA
   EXCEPT
   SELECT col1 FROM #TableB;
   
   SELECT A.Col1
   FROM #TableA A
   LEFT OUTER JOIN #TableB B
      ON A.Col1 = B.Col1
   WHERE B.Col1 IS NULL;

   SELECT *
   FROM #TableA
   INTERSECT
   SELECT * FROM #TableB;

   SELECT A.Col1
   FROM #TableA A
   INNER JOIN #TableB B
      ON A.Col1 = B.Col1
END

IF (@DEBUG = 10)
DROP TABLE #TableA, #TableB, #TableC;

All of the results here are as expected. But that still leaves the questions as to why the first tests had differences..... Sad

Argue for your limitations, and sure enough they're yours (Richard Bach, Illusions)
kramaswamy
kramaswamy
SSC Eights!
SSC Eights! (859 reputation)SSC Eights! (859 reputation)SSC Eights! (859 reputation)SSC Eights! (859 reputation)SSC Eights! (859 reputation)SSC Eights! (859 reputation)SSC Eights! (859 reputation)SSC Eights! (859 reputation)

Group: General Forum Members
Points: 859 Visits: 1788
The reason your results are not corresponding to what you would have expected them to be, is that you're not understanding the operations correctly.

EXCEPT and INTERSECT compare the *entire* data set.

EXCEPT returns all the rows which exist in Table 1 but do NOT exist in table 2 - for *ALL* columns.

Take the following information as example data for what I'm going to describe below:


SELECT * FROM #Transactions WHERE tranID = 100

TranId   TranDate   TranCode   Amount
100   1995-01-01 01:00:00.000   tax   1.00

-----------------

SELECT * FROM #VendorExport WHERE tranID = 100

TranId   TranDate   TranCode   Amount
100   1995-01-05 04:00:00.000   room   3.5587



When doing the EXCEPT query, the row would be returned. This is because although they have the same TranID, they are different in their TranDate, TranCode, and Amount fields.

When doing the LEFT OUTER JOIN query, the row would not be returned. This is because the LEFT OUTER JOIN query is looking to match on TranID alone, and since both tables share the TranID in question, the "WHERE v.TranID IS NULL" part of the query would cause it to be excluded.

When doing the INNER JOIN query, the row would be returned. This is because the INNER JOIN is joining only when the TranID matches. The reason why you are seeing the "same" information as the records from Query 1, is because you are looking only at the columns from #Transactions. If you were to show the columns from #VendorExport, then you would see the different results brought back by the JOIN.

Finally, when doing the INTERSECT query, the row would not be returned. The reason why the INNER JOIN query returns 1099 rows while the INTERSECT query returns 0 rows, is because of the way INTERSECT (and EXCEPT, for that matter) works. INTERSECT returns the opposite result set of EXCEPT. It returns for you all the rows which are *exactly* the same between the two tables (while EXCEPT returns the rows which are *not* exactly the same).
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search