Union

,

The union statement probably isn't the most used statement in TSQL, but it is

useful. What's more interesting is that it's often used in a way that might not

return the results you would expect. If you're a TSQL master none of this will

be new to you, but for the rest of us - it's worth a few minutes review!

We'll start with a somewhat contrived example, let's say we want to invite 10

employees with the lastname of Adams and 10  customers with the last name of Smith

to our SomethingOrOther.com launch.All things being equal, most people expect

this statement to return 20 rows if there are 10 rows that match each separate

query - would you agree?

select top 10 firstname, lastname from person.contact where lastname='adams'
union
select top 10 firstname, lastname from dbo.customers where lastname='smith'

When I run each statement separately I get 10 rows each, as follows:

If I run the entire query, I only get back 18 rows:

If you look carefully you'll see that our first query for last name 'Smith'

return a duplicate row for Samantha and another for Denise. Union by design

removes duplicates from the final result set, even if the duplicates were within

a single statement as in our example. It can be very useful behavior, or

problematic if you're not expecting it! In this case are the two rows we

eliminated truly duplicate people, or more likely they have different ID's and

addresses, meaning we managed to exclude two people by accident.

If we change to use UNION ALL we get the expected behavior of returning 20

rows, duplicates and all.

Depending on our needs either could be correct. But it's so important and so

often misunderstood that I make it a standard practice to follow up any time I

see UNION to see if they know the difference. If they indeed meant UNION I add a

comment to make life easier for the next DBA, like thisL

select top 10 firstname, lastname from person.contact where lastname='adams'
union	--8/10/08 verified UNION is correct
select top 10 firstname, lastname from dbo.customers where lastname='smith'

There's also a difference from a performance perspective. UNION ALL requires

little additional work besides running the combined queries, but just UNION

requires an additional step to remove the duplicates - and that could be

expensive, it depends on how many rows you're checking. The top query plan is

UNION, the bottom is UNION ALL. We can see that the UNION example includes a

SORT operator that changes the plan, but always check Profiler to see the actual

difference in cost.

So, there's a little trivia you can test your developers with, and it might

save you from an embarrassing mistake someday too! If you're new to UNION you

might want to take a look at both EXCEPT and INTERSECT, both were added to SQL

2005.

Visit my blog at

http://blogs.sqlservercentral.com/andy_warren/default.aspx

 

 

Rate

3.75 (60)

Share

Share

Rate

3.75 (60)