Combining union and union all

R.P.Rozema

SSChampion

Points: 12317
March 7, 2012 at 10:02 pm

#257107

Comments posted to this topic are about the item Combining union and union all
Thank you for reading the discussion on my QotD. As you may have noticed, the explanation I gave was wrong, even though 'Option 3' was still the correct answer. Please read on to find the correct explanation.
(oops, I broke the link to the QotD by editing the opening post...)

Posting Data Etiquette - Jeff Moden
Posting Performance Based Questions - Gail Shaw
Hidden RBAR - Jeff Moden
Cross Tabs and Pivots - Jeff Moden
Catch-all queries - Gail Shaw

If you don't have time to do it right, when will you have time to do it over?

baabhu SSCertifiable Points: 6205 · Answer 1

March 7, 2012 at 10:02 pm

#1456346

Nice 🙂

Koen Verbeeck SSC Guru Points: 259215 · Answer 2

Another great back to basics question. Thanks!

Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP

Carlo Romagnano SSC-Insane Points: 22933 · Answer 3

I got it right, but I disagree with the explanation:

If at least one 'Union' is used, duplicates will be removed from the entire final result set, no matter where the 'Union' occurs

It depends on the order of the operators or on parentheses, and this script demonstrates it:

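-- returns two rows: UNION removes the duplicate first, then UNION ALL appends a row
select 1
UNION select 1
UNION ALL select 1

-- returns one row: the final UNION removes all duplicates
select 1
UNION ALL select 1
UNION select 1

-- returns two rows: the parenthesized UNION is evaluated first
select 1
UNION ALL (select 1
UNION select 1
)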

R.P.Rozema SSChampion Points: 12317 · Answer 4

You have left off half of the explanation in your quote. The next sentence explains the use of parentheses to override this behavior:

To preserve duplicates in only a part of the final result set, parentheses must be used to separate the 'union'-ed statement(s) from the 'union all'-ed statements.

That's exactly what this QotD was meant to test: if at least one union (not all) is used in the statement, all duplicates are removed. If you want to preserve (some) duplicates in the final result set and at least one "union" (not all) is used in the statement, you will have to use parentheses to achieve this. The query in the QotD deliberately did not have such parentheses, to illustrate the behavior. If you follow the link to the article in the answer, you will find under "Example D" an example of how to do this.

Carlo Romagnano SSC-Insane Points: 22933 · Answer 5

R.P.Rozema (3/8/2012)
You have left off half of the explanation in your quote. The next sentence explains the use of parentheses to override this behavior:
To preserve duplicates in only a part of the final result set, parentheses must be used to separate the 'union'-ed statement(s) from the 'union all'-ed statements.
That's exactly what this QotD was meant to test: if at least one union (not all) is used in the statement, all duplicates are removed. If you want to preserve (some) duplicates in the final result set and at least one "union" (not all) is used in the statement, you will have to use parentheses to achieve this. The query in the QotD deliberately did not have such parentheses, to illustrate the behavior. If you follow the link to the article in the answer, you will find under "Example D" an example of how to do this.

No, it depends on the position. Run the code in my previous post.

R.P.Rozema SSChampion Points: 12317 · Answer 6

I see your point. I will have to get back to this, or maybe someone else sees what's wrong?

archie flockhart SSCrazy Points: 2339 · Answer 7

Explanation is wrong: you can get multiple rows even if you use UNION in the query.

You can try it using the tables in the question:

select col from #t2
union
select col from #t3
union all
select col from #t1

Results:

col
2
3
1

archie flockhart SSCrazy Points: 2339 · Answer 8

It's because of the order of operations: when written this way, the UNION happens first and removes duplicates in t2 and t3; the result is then combined with t1 using UNION ALL, which allows duplicates.

select col from #t2
union
select col from #t3
union all
select col from #t1

If you swap the UNION and UNION ALL above, you get the result set without duplicates, because the last operation performed is a UNION.

select col from #t2
union all
select col from #t3
union
select col from #t1

Skanker Hall of Fame Points: 3059 · Answer 9

Got it right and thought that I understood - now looking at the other posts I am slightly confused.

Koen Verbeeck SSC Guru Points: 259215 · Answer 10

UNION queries are evaluated from left to right. If the last query contains duplicates and is preceded by UNION ALL, you will have duplicates in your result set.
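
A minimal sketch of the left-to-right rule, assuming SQL Server 2008 or later so that a VALUES table constructor can be used as a derived table. The derived tables a, b and c and their contents are hypothetical and only serve the illustration; row order is not guaranteed without an ORDER BY.

```sql
-- Hypothetical inline data: a holds (1, 1), b holds (1), c holds (2, 2).

-- UNION is evaluated first and reduces a and b to a single 1.
-- UNION ALL is evaluated last, so both rows of c are appended as they are.
SELECT col FROM (VALUES (1), (1)) AS a(col)
UNION
SELECT col FROM (VALUES (1)) AS b(col)
UNION ALL
SELECT col FROM (VALUES (2), (2)) AS c(col);
-- three rows: 1, 2, 2

-- UNION ALL is evaluated first and keeps all three 1s.
-- UNION is evaluated last and removes every duplicate from the combined set.
SELECT col FROM (VALUES (1), (1)) AS a(col)
UNION ALL
SELECT col FROM (VALUES (1)) AS b(col)
UNION
SELECT col FROM (VALUES (2), (2)) AS c(col);
-- two rows: 1, 2
```

Inline data keeps the sketch independent of any temp tables already created in the session.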

Michael Riemer SSCertifiable Points: 5190 · Answer 11

Thanks for the question - and once again great discussion afterwards - learnt from that!

R.P.Rozema SSChampion Points: 12317 · Answer 12

tim.kay (3/8/2012)
Got it right and thought that I understood - now looking at the other posts I am slightly confused.

So am I, as I was very sure I had tested both situations: union followed by union all, and union all followed by union... I see what happens, but it contradicts my previous results. So I need to find out what I did wrong previously.

Carlo Romagnano SSC-Insane Points: 22933 · Answer 13

R.P.Rozema (3/8/2012)
tim.kay (3/8/2012)
Got it right and thought that I understood - now looking at the other posts I am slightly confused.
So am I, as I was very sure I had tested both situations: union followed by union all, and union all followed by union... I see what happens, but it contradicts my previous results. So I need to find out what I did wrong previously.

Your test is wrong because of the values used:

1, 2 and 3 are distinct values, so the duplicate removal performed by UNION does not affect them.

You should use the same values!

😉
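
A sketch of such a test, assuming three hypothetical temp tables (#same1, #same2 and #same3 are names made up for this illustration, not the tables from the question) that all hold the same value, so that duplicate removal becomes visible:

```sql
-- Hypothetical temp tables: each one holds the single value 1.
CREATE TABLE #same1 (col int NOT NULL);
CREATE TABLE #same2 (col int NOT NULL);
CREATE TABLE #same3 (col int NOT NULL);

INSERT #same1 (col) VALUES (1);
INSERT #same2 (col) VALUES (1);
INSERT #same3 (col) VALUES (1);

-- UNION first, UNION ALL last: the first two branches collapse to one row,
-- then the last branch is appended without duplicate removal.
SELECT col FROM #same1
UNION
SELECT col FROM #same2
UNION ALL
SELECT col FROM #same3;
-- two rows: 1, 1

-- UNION ALL first, UNION last: the final UNION removes all duplicates.
SELECT col FROM #same1
UNION ALL
SELECT col FROM #same2
UNION
SELECT col FROM #same3;
-- one row: 1

-- Clean up the hypothetical tables.
DROP TABLE #same1, #same2, #same3;
```

With distinct values in every table both queries would return the same rows, which is why the difference went unnoticed.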

R.P.Rozema SSChampion Points: 12317 · Answer 14

To finalize this: my explanation is wrong, and my question and answer were correct only by luck.

The proper explanation has been given in this thread. To be sure I am getting it now, I'll try to summarize it:

Union queries are interpreted left to right. If "union all" is followed by "union", the "union all" will return duplicates, but these will be filtered out by the following "union". The other way around, if "union" is followed by "union all", any duplicates from the first 2 statements are filtered out, but new duplicates may be introduced by the following "union all".
Parentheses can be used to override the left-to-right evaluation.

An illustration can be given by putting more rows in the test tables:

create table #t1 (col int not null);
create table #t2 (col int not null);
create table #t3 (col int not null);

insert #t1 (col) values(1), (1);
insert #t2 (col) values(2), (2);
insert #t3 (col) values(3), (3);

select col from #t1
UNION
select col from #t2
UNION ALL
select col from #t3;

select col from #t1
UNION ALL
select col from #t2
UNION
select col from #t3;

And now the results are:

col
-----------
1
2
3
3

(4 row(s) affected)

col
-----------
1
2
3

(3 row(s) affected)
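
The summary also mentions parentheses, which the two queries above do not exercise. A minimal sketch, assuming #t1, #t2 and #t3 still exist with the rows inserted above; this third query is an addition for illustration and was not part of the original test:

```sql
-- Reuses #t1 (1, 1), #t2 (2, 2) and #t3 (3, 3) as created and filled above.
-- The parentheses force the UNION of #t2 and #t3 to be evaluated first,
-- which yields one 2 and one 3.
-- The outer UNION ALL then keeps both rows from #t1.
select col from #t1
UNION ALL
(
    select col from #t2
    UNION
    select col from #t3
);
-- four rows: 1, 1, 2, 3
```

Without the parentheses the same statement is evaluated left to right and the final UNION removes all duplicates, which gives the three-row result shown above.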

Seems like I was the first to learn something from my own question :).

Thanks for all the feedback!
