
UNION vs UNION ALL


You might be wondering why I’m going into such a simple subject. Well, the way I see it, there are four options here.

  • You already know the difference, it seems really obvious and you are probably wondering why I’m mentioning it.
  • You think you know the difference but it turns out you are wrong (don’t worry, it happens).
  • You don’t know the difference and once I’ve pointed it out you will wonder why on earth you never thought of it before.
  • You don’t care.

 

If you don’t care, there isn’t much help I can give you. If you already know what I have to say, then you don’t need help. That leaves a 50% chance that you will find this interesting. So here goes.

At its simplest, the difference is that UNION returns a distinct list of rows and UNION ALL returns all rows, duplicates included.

-- Table setup
CREATE TABLE UnionTable1 (Id INT);
CREATE TABLE UnionTable2 (Id INT);
INSERT INTO UnionTable1 VALUES (2), (4), (6), (8), (10), (12);
INSERT INTO UnionTable2 VALUES (3), (6), (9), (12);

-- UNION example: the duplicate values (6 and 12) are returned only once
SELECT Id AS [UNION] FROM UnionTable1
UNION
SELECT Id FROM UnionTable2;

-- UNION ALL example: every row from both tables is returned
SELECT Id AS [UNION ALL] FROM UnionTable1
UNION ALL
SELECT Id FROM UnionTable2;

[Figure: result sets of the UNION and UNION ALL queries]
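If you want to see the difference as numbers rather than eyeballing the result sets, a quick sketch like the following should do it. It assumes the two tables created in the setup script; the aliases u and ua and the column aliases are just names I picked for illustration.

```sql
-- Count what each operator returns (a derived table needs an alias in T-SQL)
SELECT COUNT(*) AS UnionRows
FROM (SELECT Id FROM UnionTable1
      UNION
      SELECT Id FROM UnionTable2) AS u;     -- 8 rows: 6 and 12 appear once each

SELECT COUNT(*) AS UnionAllRows
FROM (SELECT Id FROM UnionTable1
      UNION ALL
      SELECT Id FROM UnionTable2) AS ua;    -- 10 rows: 6 from one table + 4 from the other

-- Logically, UNION behaves like DISTINCT applied on top of UNION ALL
SELECT DISTINCT Id
FROM (SELECT Id FROM UnionTable1
      UNION ALL
      SELECT Id FROM UnionTable2) AS ua;
```

The last query returns the same set of values as the UNION query, which is a handy way to remember what UNION is doing for you.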

Where things get a little bit interesting is how UNION generates that distinct list. You will notice that the UNION output is in order while the UNION ALL output is not. To generate the distinct list, SQL Server sorts the values in this example, which means an additional sort operator in the execution plan. The ordered output is a side effect of that sort, not a guarantee; only an ORDER BY guarantees the order of a result set.
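Because the ordered UNION output is a by-product of how the distinct list happens to be built, it is safer not to rely on it. A minimal sketch, assuming the same two tables: if you need the rows in order, say so explicitly.

```sql
-- ORDER BY goes after the last SELECT and applies to the combined result
SELECT Id FROM UnionTable1
UNION ALL
SELECT Id FROM UnionTable2
ORDER BY Id;   -- 2, 3, 4, 6, 6, 8, 9, 10, 12, 12
```

The same ORDER BY works at the end of a UNION query, and it is the only thing that guarantees the order in either case.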

[Figure: execution plan for the UNION query]

For comparison here is the execution plan for UNION ALL.

[Figure: execution plan for the UNION ALL query]

Notice that the sort operator for the UNION is by far the most expensive part of the whole process.
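If you would like to reproduce the two plans yourself, one option is a text plan. This is only a sketch, and it assumes you are running it from SSMS or sqlcmd, where GO is the batch separator (SET SHOWPLAN_TEXT has to be the only statement in its batch). While the setting is on, the queries are compiled but not executed.

```sql
SET SHOWPLAN_TEXT ON;
GO

-- Plan for UNION: look for the sort that removes the duplicates
SELECT Id FROM UnionTable1
UNION
SELECT Id FROM UnionTable2;

-- Plan for UNION ALL: the two scans are simply appended
SELECT Id FROM UnionTable1
UNION ALL
SELECT Id FROM UnionTable2;
GO

SET SHOWPLAN_TEXT OFF;
GO
```

The exact operators and costs you see can vary with your SQL Server version, table sizes and indexes, so treat the plans pictured above as one example rather than the only possible outcome.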

So what does that mean for you? Unless you actually need UNION (i.e. you need to get rid of duplicates), you want to use UNION ALL, as it’s the much cheaper and faster option.
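A common case where you don’t need UNION is when the two queries can’t return the same row in the first place. Here is a contrived sketch using the tables from earlier; the even/odd filters are purely my own illustration and rely on each table having no duplicate Ids of its own.

```sql
-- The two branches cannot overlap: one returns only even Ids, the other only odd Ids
SELECT Id FROM UnionTable1 WHERE Id % 2 = 0   -- 2, 4, 6, 8, 10, 12
UNION ALL
SELECT Id FROM UnionTable2 WHERE Id % 2 = 1;  -- 3, 9
```

With this data you get the same eight values the UNION query returned, but SQL Server has no duplicates to hunt for, so there is nothing to gain from asking it to.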

There are a couple of exceptions. If you are doing a UNION in an EXISTS clause, SQL Server knows enough not to bother with the sort, and the execution times are the same. Also, if you are already sorting the output (using an ORDER BY), most of the cost is already taken care of.
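For reference, the EXISTS case looks something like this. It is a minimal sketch against the same two tables, and the PRINT message is just a placeholder.

```sql
-- EXISTS only asks "is there at least one row?", so removing duplicates adds nothing
IF EXISTS (
    SELECT Id FROM UnionTable1
    UNION
    SELECT Id FROM UnionTable2
)
    PRINT 'At least one row exists';
```

If you swap UNION for UNION ALL here and compare the two plans on your own system, you can check whether the sort really has disappeared.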

Like I said, this is all fairly simple and straightforward, but you would be surprised how often people don’t think about it.

Filed under: Microsoft SQL Server, SQLServerPedia Syndication, T-SQL Tagged: code language, language sql, T-SQL
