SQLServerCentral Article

Intersect, Except, Union, All and Any

,

I was reviewing the Microsoft training courses for SQL2008 for a set of application developers who wanted to learn more about SQL Server. In doing so I came across some commands that I either had not heard of or had not used before.

  • INTERSECT
  • EXCEPT
  • ALL
  • ANY

ALL and ANY have been around for donkeys years but INTERSECT and EXCEPT were new to me.

Whenever I find a new set of commands in T-SQL I fire up my developer edition and go straight to Adventureworks to start playing around.

INTERSECT, EXCEPT and UNION

To experiment with these commands I decided to look at two sets of CustomerID values

  • Customers in sales territory 10 (United Kingdom)
  • Sales orders in July 2004, the last month of orders in Adventureworks

Perhaps the best way to show what these commands do when comparing two sets of data is to look at the diagrams below

Predicate Illustration Description
EXCEPT image001 Customers from the UK who have not bought in the last month (July 2004)
INTERSECT INTERSECT Customers from the UK

AND

who have made a purchase in the last month (July 2004)

UNION UNION Customers from the UK

OR

who have made a purchase in the last month (July 2004)

Different ways of writing an EXCEPT query

Although I had not used the EXCEPT before I had achieved the same ends by more traditional methods. As any DBA will know there are many ways of achieving exactly the same thing so I will illustrate 3 solutions.

LEFT JOIN

The following query yielded the results I needed

        SELECT C.CustomerID
        FROM Sales.Customer AS C
        LEFT JOIN Sales.SalesOrderHeader AS OH
        ON C.CustomerID = OH.CustomerID
        AND OrderDate>='2004-07-01' 
        WHERE OH.CustomerID IS NULL 
        AND C.TerritoryID=10
    

Except - LEFT JOIN

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Customer'. Scan count 1, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderHeader'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 196 ms.
     

The overall query cost was 0.0517279

WHERE CustomerID NOT IN(…)

Being a diligent DBA I wanted to try the query a different way

        SELECT CustomerID
        FROM Sales.Customer
        WHERE TerritoryID=10
        AND CustomerID NOT IN(
        SELECT customerid
        FROM  Sales.SalesOrderHeader
        WHERE OrderDate>='2004-07-01' 
        )
    

EXCEPT NOT IN

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Customer'. Scan count 1, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderHeader'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 16 ms, elapsed time = 160 ms.

This time the overall query cost was 0.0500904 offering a slight improvement.

EXCEPT

Finally I tried the EXCEPT predicate

        SELECT CustomerID
        FROM Sales.Customer
        WHERE TerritoryID=10
        EXCEPT
        SELECT customerid
        FROM Sales.SalesOrderHeader
        WHERE OrderDate>='2004-07-01' 
    

image007

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Customer'. Scan count 1, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderHeader'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 202 ms.
    

The time for the EXCEPT predicate was 0.0500904. Exactly the same as for the WHERE NOT IN(…) query. You can also see that the execution plans are the same.

Different ways of writing an INTERSECT query

I decided to take the same approach with the INTERSECT query and write the equivalentquery in 3 separate ways.

INNER JOIN

As any customer can have more than one order I am going to have to make a distinct list of CustomerID values. I decided to try a couple of approaches.

        SELECT DISTINCT C.CustomerID
        FROM Sales.Customer AS C
        INNER JOIN Sales.SalesOrderHeader AS OH
        ON C.CustomerID = OH.CustomerID
        WHERE
        C.TerritoryID=10
        AND OH.OrderDate>='2004-07-01' 
        SELECT C.CustomerID
        FROM Sales.Customer AS C
        INNER JOIN (SELECT DISTINCT CustomerID FROM Sales.SalesOrderHeader WHERE OrderDate>='2004-07-01' 
        )AS OH
        ON C.CustomerID = OH.CustomerID
        WHERE C.TerritoryID=10
    

This produced two differing execution plans with an overall cost of 0.0500859 and 0.0513978 respectively.

INNER JOIN

WHERE CustomerID IN(…)

This time the query below yielded an execution cost of 0.0499945

        SELECT CustomerID
        FROM Sales.Customer
        WHERE TerritoryID=10
        AND CustomerID IN(
        SELECT customerid
        FROM  Sales.SalesOrderHeader
        WHERE OrderDate>='2004-07-01' 
        )
    

INTERSECT IN

INTERSECT

Finally, as before the INTERSECT query presented an identical execution plan to the WHERE CustomerID IN ( query

        SELECT CustomerID
        FROM Sales.Customer
        WHERE TerritoryID=10
        INTERSECT
        SELECT customerid
        FROM Sales.SalesOrderHeader
        WHERE OrderDate>='2004-07-01' 
    

Conclusion for INTERSECT and EXCEPT

Granted Adventureworks is a pretty contrived example and under such conditions the example I have given here INTERSECT and EXCEPT are as good as the best of the rest.

INTERSECT
Predicate Result
INNER JOIN 0.0500859
INNER JOIN (with DISTINCT derived query) 0.0513978
IN 0.0499945
INTERSECT 0.0499945
      
EXCEPT
Predicate Result
LEFT JOIN 0.0517279
NOT IN 0.0500904
EXCEPT 0.0500904

At this stage I cannot think of a compelling reason to use these new predicates over and above the more usual WHERE field NOT IN (…) particularly as the query on both sides of an EXCEPT or INTERSECT has to return the same recordset structure.

I did consider using it as a means of deduplicating two address record sets but that would be an even more contrived example.

The test does emphasize that it is worthwhile trying to write the query more than one way in order to determine the best option for the particular scenario.

The ANY and ALL Predicate

ANY and ALL are predicates I have never needed to use.

ANY

The two queries below offer up the same results and execution plan

        SELECT *
        FROM Sales.SalesPerson
        WHERE TerritoryID = ANY(
        SELECT TerritoryID FROM Sales.SalesTerritory WHERE CountryRegionCode='US'
        )
        SELECT *
        FROM Sales.SalesPerson
        WHERE TerritoryID IN(
        SELECT TerritoryID FROM Sales.SalesTerritory WHERE CountryRegionCode='US'
        )
        

To look at how ANY may be used in a more practical sense I decided to create a Sales.TopSales table in Adventureworks using the query below

        SELECT TOP 5 SalesOrderID,TotalDue
        INTO Sales.TopSales
        FROM Sales.SalesOrderHeader
        ORDER BY Sales.TotalDue DESC
        ALTER TABLE Sales.TopSales
        ADD CONSTRAINT PK_TopSales PRIMARY KEY CLUSTERED( SalesOrderID)
        CREATE INDEX idx_TopSales_TotalDue ON Sales.TopSales(TotalDue DESC)
    

For my example I decided to retrieve sales that exceeded the minimum in my Sales.TopSales table.

        SELECT *
        FROM Sales.SalesOrderHeader
        WHERE TotalDue > ANY(SELECT TotalDue FROM Sales.TopSales)
        ORDER BY Sales.TotalDue DESC
    

An alternative way of writing this query could be as follows

        SELECT *
        FROM Sales.SalesOrderHeader
        WHERE TotalDue > (SELECT MIN(TotalDue) FROM Sales.TopSales)
        ORDER BY Sales.TotalDue DESC
    

The execution plans show considerable differences

  • ANY query cost = 3.47092
  • Alternative query cost = 1.24152

image012

ALL

The ALL predicate allows a comparison to be made against all the values in a SELECT list. The following two queries are identical

        SELECT *
        FROM Sales.SalesOrderHeader
        WHERE TotalDue > ALL(SELECT TotalDue FROM Sales.TopSales)
        ORDER BY Sales.TotalDue DESC 
        SELECT *
        FROM Sales.SalesOrderHeader
        WHERE TotalDue > (SELECT MAX(TotalDue) FROM Sales.TopSales)
        ORDER BY Sales.TotalDue DESC 

Once again the execution plans tell the full story

  • ALL query cost 3.47068
  • Alternative query cost 1.24152

image014

Conclusion for ALL and ANY

ALL and ANY certainly date back to SQL Server 7.0 and possibly before that. I have never found a reason to use them and given my findings I see no benefit to them.

Both predicates insist that their contents are produced from SELECT statements so where an IN predicate can contain a static list or a SELECT statement, ANY can only contain a SELECT statement.

The best I could results I could achieve for the ANY predicate simply equaled those for the more flexible IN predicate.

Rounding it all up

ALL and ANY do not seem to offer any benefits to offset their disadvantages

INTERSECT and EXCEPT do at least offer seem to offer performance on par with alternatives but so far I cannot see a benefit to using them. They are however part of the ANSI standard SQL.

Rate

4.13 (158)

You rated this post out of 5. Change rating

Share

Share

Rate

4.13 (158)

You rated this post out of 5. Change rating