Combining union and union all

  • Comments posted to this topic are about the item Combining union and union all[/url]

    Thank you for reading the discussion on my QotD. As you may have noticed, the explanation I gave was wrong, even though 'Option 3' was still the correct answer. Please read on to find the correct explanation.

    (oops, I broke the link to the QotD by editing the opening post...)



    Posting Data Etiquette - Jeff Moden[/url]
    Posting Performance Based Questions - Gail Shaw[/url]
    Hidden RBAR - Jeff Moden[/url]
    Cross Tabs and Pivots - Jeff Moden[/url]
    Catch-all queries - Gail Shaw[/url]


    If you don't have time to do it right, when will you have time to do it over?

  • Nice 🙂

  • Another great back to basics question. Thanks!

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • I get it right, but I disagree with explanation:

    If at least one 'Union' is used, duplicates will be removed from the entire final result set, no matter where the 'Union' occurs

    It depends on precedence or parathesis and this script demonstrates it:

    -- return two row

    select 1

    UNION select 1

    UNION ALL select 1

    -- return one row

    select 1

    UNION ALL select 1

    UNION select 1

    -- return two row

    select 1

    UNION ALL (select 1

    UNION select 1

    )

  • You have left off half of the explanation in your quote. In the next sentence the use of parenthesis to override this behavior is explained:

    To preserve duplicates in only a part of the final result set, parenthesis must be used to separate the 'union'-ed statement(s) from the 'union all'-ed statements.

    That's exactly what this QotD was meant to test: If at least one union (not all) is used in the statement, all duplicates are removed. If you want to preserve (some) duplicates in the final resultset and at least one "union" (not all) is used in the statement, you will have to use parenthesis to achieve this. The query in the QotD deliberatly did not have such parenthesis to illustrate the behavior. If you follow the link to the article in the answer, you will find under "Example D" an example how to do this.



    Posting Data Etiquette - Jeff Moden[/url]
    Posting Performance Based Questions - Gail Shaw[/url]
    Hidden RBAR - Jeff Moden[/url]
    Cross Tabs and Pivots - Jeff Moden[/url]
    Catch-all queries - Gail Shaw[/url]


    If you don't have time to do it right, when will you have time to do it over?

  • R.P.Rozema (3/8/2012)


    You have left off half of the explanation in your quote. In the next sentence the use of parenthesis to override this behavior is explained:

    To preserve duplicates in only a part of the final result set, parenthesis must be used to separate the 'union'-ed statement(s) from the 'union all'-ed statements.

    That's exactly what this QotD was meant to test: If at least one union (not all) is used in the statement, all duplicates are removed. If you want to preserve (some) duplicates in the final resultset and at least one "union" (not all) is used in the statement, you will have to use parenthesis to achieve this. The query in the QotD deliberatly did not have such parenthesis to illustrate the behavior. If you follow the link to the article in the answer, you will find under "Example D" an example how to do this.

    No, it depends by the position. Run the code in the previos post.

  • I see your point. I have to get back on this or maybe someone else sees what's wrong?



    Posting Data Etiquette - Jeff Moden[/url]
    Posting Performance Based Questions - Gail Shaw[/url]
    Hidden RBAR - Jeff Moden[/url]
    Cross Tabs and Pivots - Jeff Moden[/url]
    Catch-all queries - Gail Shaw[/url]


    If you don't have time to do it right, when will you have time to do it over?

  • Explanation is wrong: you can get multiple rows even if you use UNION in the query.

    You can try it using the tables in the question:

    select col from #t2

    union

    select col from #t3

    union all

    select col from #t1

    Results:

    col

    2

    3

    1

    1

  • It's because of the order of operations: when written this way the UNION happens first and removes duplicates in t2 and t3, then the result is combined with t1 using UNION ALL which allows duplicates.

    select col from #t2

    union

    select col from #t3

    union all

    select col from #t1

    If you change round the UNION and UNION ALL above you get the result set without duplicates, because the last operation performed is a UNION.

    select col from #t2

    union all

    select col from #t3

    union

    select col from #t1

  • Got it right and thought that I understood - now looking at the other posts I am slightly confused.

  • UNION queries are evaluated from left to right. If the last query contains duplicates and is preceded with UNION ALL, you will have duplicates in your result set.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Thanks for the question - and once again great discussion afterwards - learnt from that!

  • tim.kay (3/8/2012)


    Got it right and thought that I understood - now looking at the other posts I am slightly confused.

    So am I, as I was very sure I had tested both situations: union followed by union all and union all followed by union... I see what happens, but it contradicts with my previous results. So I need to find out what I did wrong previously.



    Posting Data Etiquette - Jeff Moden[/url]
    Posting Performance Based Questions - Gail Shaw[/url]
    Hidden RBAR - Jeff Moden[/url]
    Cross Tabs and Pivots - Jeff Moden[/url]
    Catch-all queries - Gail Shaw[/url]


    If you don't have time to do it right, when will you have time to do it over?

  • R.P.Rozema (3/8/2012)


    tim.kay (3/8/2012)


    Got it right and thought that I understood - now looking at the other posts I am slightly confused.

    So am I, as I was very sure I had tested both situations: union followed by union all and union all followed by union... I see what happens, but it contradicts with my previous results. So I need to find out what I did wrong previously.

    Your test is wrong because of values used:

    1, 2 and 3 are different and they are not affected by the DISTINCT of UNION.

    You should use same values!

    😉

  • To finalize this: My explanation is wrong and my question + answer was correct only by luck.

    The proper explanation has been given in this thread. To be sure I am getting it now I'll try to summarize it:

    Union queries are interpreted left to right. If "union all" is followed by "union", the "union all" will return duplicates, but these will be filtered by the following "union". Other way around, if "union" is followed by "union all", any duplicates from the first 2 statements are filtered, but new duplicates may be introduced by the following "union all".

    Parenthesis can be used to override the left-to-right evaluation.

    An illustration can be given by putting more rows in the test tables:

    create table #t1 (col int not null);

    create table #t2 (col int not null);

    create table #t3 (col int not null);

    insert #t1 (col) values(1), (1);

    insert #t2 (col) values(2), (2);

    insert #t3 (col) values(3), (3);

    select col from #t1

    UNION

    select col from #t2

    UNION ALL

    select col from #t3;

    select col from #t1

    UNION ALL

    select col from #t2

    UNION

    select col from #t3;

    And now the results are:

    col

    -----------

    1

    2

    3

    3

    (4 row(s) affected)

    col

    -----------

    1

    2

    3

    (3 row(s) affected)

    Seems like I was the first to learn something from my own question :).

    Thanks for all the feedback!



    Posting Data Etiquette - Jeff Moden[/url]
    Posting Performance Based Questions - Gail Shaw[/url]
    Hidden RBAR - Jeff Moden[/url]
    Cross Tabs and Pivots - Jeff Moden[/url]
    Catch-all queries - Gail Shaw[/url]


    If you don't have time to do it right, when will you have time to do it over?

Viewing 15 posts - 1 through 15 (of 49 total)

You must be logged in to reply to this topic. Login to reply