SQLServerCentral Article

Caution with EXCEPT

,

The EXCEPT operation is a wonderful new feature introduced in SQL Server 2005, but you really have to understand it fully if you want to get the results you except...er, I mean, expect.

Why use EXCEPT?

Consider the situation where you have two tables of the same name in two different databases, and you want to know if they have the same contents. Maybe they are tables in your Staging and Production databases, and you want to see if they are in sync. So, if there is ANY column in ANY row that is different, or if one table has a different number of rows, you want to know. Since EXCEPT "returns any distinct values from the left query that are not also found on the right query" (http://msdn.microsoft.com/en-us/library/ms188055%28SQL.90%29.aspx), this sounds like a good place to use EXCEPT, right? Let's try it out:

SELECT * FROM Staging.dbo.WIDGET
EXCEPT
SELECT * FROM Production.dbo.WIDGET

A good start, but if we have any additional WIDGETs in Production (the right table) that we don't have in Staging, we're going to miss finding that difference using the above query. This is because EXCEPT -- as the documentation says -- will only return results from the left which are not also found on the right. To achieve our goal, we might try comparing the tables in both directions, via something like this:

SELECT * FROM Staging.dbo.WIDGET
EXCEPT
SELECT * FROM Production.dbo.WIDGET
UNION
SELECT * FROM Production.dbo.WIDGET
EXCEPT
SELECT * FROM Staging.dbo.WIDGET

OK, we're getting closer, but we are still only going to see the differences in the Staging table using this construct because this is just bad SQL. Let's clean it up:

SELECT * FROM
(
SELECT * FROM Staging.dbo.WIDGET
EXCEPT
SELECT * FROM Production.dbo.WIDGET
) LEFT_DIFFS
UNION
SELECT * FROM
(
SELECT * FROM Production.dbo.WIDGET
EXCEPT
SELECT * FROM Staging.dbo.WIDGET
) RIGHT_DIFFS

Now we'll get our differences, no matter where they are. We will get rows that differ in values, and we will get rows that are in one table but not in the other. Actually, we're going to caveat the second part of that last statement in just a moment, but first, let's finish using what we've learned so far about EXCEPT.

So, the above is all well and good, but once you try this, you'll notice that you have your differences, but no information on where they came from. Let's just make one final addition to the query in order to fix this. Using our knowledge that EXCEPT returns values from the LEFT side which are not also on the right (remember?), let's prefix the results with what table the different row was found in:

SELECT 'STAGING' as src, * FROM
(
SELECT * FROM Staging.dbo.WIDGET
EXCEPT
SELECT * FROM Production.dbo.WIDGET
) LEFT_DIFFS
UNION
SELECT 'PRODUCTION' as src, * FROM
(
SELECT * FROM Production.dbo.WIDGET
EXCEPT
SELECT * FROM Staging.dbo.WIDGET
) RIGHT_DIFFS

Now we've got something more useful. If you order by the primary key and then 'src', you can get rows that differ to fall next to each other in the result set.

The Caveat

Speaking of primary keys, it is certainly helpful if you have one in the tables you are comparing with EXCEPT. This is because, as the documentation says, EXCEPT "returns any DISTINCT values from the left QUERY that are not also found on the right QUERY" (my emphasis). Now, our goal is to compare tables, but we are really comparing query results -- the query SELECT * in our case. And we are only going to get back DISTINCT results. So, if the tables you are comparing allow for duplicate rows, EXCEPT will return only one of those duplicate rows, if it not found in the other table. So, I have to qualify my earlier statement above that you would get the rows that are in one table, but not in the other. Strictly speaking, there may be more rows in your table which differ than are shown in the EXCEPT results, because EXCEPT returns only DISTINCT differences.

A Real World Experience

I recently used EXCEPT as shown above to compare two tables in two different databases, in order to make sure that they were exactly the same, and found that the query was returning all rows all the time. I completely deleted the data in one of the tables (Staging) and filled it again with the data from the other table (Production). Same result, the query returned everything in both tables as differences. What was going on?

As it turns out, there is another very important part of the documentation of EXCEPT that one needs to be aware of: "The number and the order of the columns must be the same in all queries." A closer inspection of the two tables in both databases revealed that their column order was, for unknown reasons, just slightly different. (They were probably originally created by different SQL scripts.) When I replaced SELECT * with an explicit SELECT WIDGET_id, WIDGET_name, etc., the results were what I expected, no differences.

 

SELECT 'STAGING' as src, * FROM
(
SELECT
WIDGET_ID,
WIDGET_NAME,
WIDGET_DESC,
...
FROM Staging.dbo.WIDGET
EXCEPT
SELECT
WIDGET_ID,
WIDGET_NAME,
WIDGET_DESC,
...
FROM Production.dbo.WIDGET
) LEFT_DIFFS
UNION
SELECT 'PRODUCTION' as src, * FROM
(
SELECT
WIDGET_ID,
WIDGET_NAME,
WIDGET_DESC,
...
FROM Production.dbo.WIDGET
EXCEPT
SELECT
WIDGET_ID,
WIDGET_NAME,
WIDGET_DESC,
...
FROM Staging.dbo.WIDGET
) RIGHT_DIFFS

The Lesson

The lesson here was, when using EXCEPT, be careful to state your columns explicitly, in the same order. Or, I suppose you might argue that the lesson was "never use SELECT *". The SELECT * construct just seemed so appropriate to my problem here, though, since I wanted to check ALL columns for differences. So, if you're tempted to use SELECT * with EXCEPT, learn from my experience to be EXPLICIT with EXCEPT to get the results you EXPECT!

Rate

4.58 (108)

You rated this post out of 5. Change rating

Share

Share

Rate

4.58 (108)

You rated this post out of 5. Change rating