UNION and Duplicate values

  • Comments posted to this topic are about the item UNION and Duplicate values

    God is real, unless declared integer.

  • Nice, easy one to start the day off on, thanks, Frans
    Using a DISTINCT with a UNION add about as much value as using it in a subquery (especially a correlated subquery)
    e.g. SELECT Col1, Col2 FROM dbo.TheTable WHERE Col3 IN (SELECT Col1, Col2 FROM dbo.TheTable WHERE Col3 IN (SELECT DISTINCT Col2 FROM dbo.Table2 WHERE ...) Col2 FROM dbo.Table2 WHERE ...)

    ____________________________________________
    Space, the final frontier? not any more...
    All limits henceforth are self-imposed.
    “libera tute vulgaris ex”

  • Nice easy Question, but again it makes me think...
    Which of these the the better performs better, and what factors determine that?
    SELECT DISTINCT *
    FROM (VALUES (1), (2), (3), (1)) AS t1(id)
    UNION ALL
    SELECT DISTINCT *
    FROM (VALUES (4), (5))    AS t2(i

    OR THIS

    SELECT *
    FROM (VALUES (1), (2), (3), (1)) AS t1(id)
    UNION
    SELECT *
    FROM (VALUES (4), (5))    AS t2(id)d)

  • Budd - Tuesday, June 12, 2018 7:30 AM

    Nice easy Question, but again it makes me think...
    Which of these performs better, and what factors determine that?
    SELECT DISTINCT *
    FROM (VALUES (1), (2), (3), (1)) AS t1(id)
    UNION ALL
    SELECT DISTINCT *
    FROM (VALUES (4), (5))    AS t2(i

    OR THIS

    SELECT *
    FROM (VALUES (1), (2), (3), (1)) AS t1(id)
    UNION
    SELECT *
    FROM (VALUES (4), (5))    AS t2(id)d)

    sorry about the TYPO's

  • it depends :-), but both queries could return different results when are used with real tables.

    Your  first query (with UNION ALL) could return duplicates, when you have the same ids in the first and second subquery (e.g. when you add a (2) into the value list after the (5)). The second query (with UNION) would eliminate this duplicates too.

    So it is not valid to compare the performance, since you are comparing different (but similar) things.

    PS:  if you are SURE, that there are only different IDs in both subqueries, the UNION ALL query would be slightly faster, since it is easier to eliminate duplicates in a e.g. 10 and another 20 items list, than in a 30 items list (in a 10 item list, you have to compare the first with the second until the 10th, the second value with 3rd until 10th and so on, so you have 9 + 8 + 7 + 6 + 5 + 4 + 3 + 2 + 1 = 45 possible combinations; in a 20 items list there are 209 combinations and in a 30 items list 464)

    God is real, unless declared integer.

  • t.franz - Tuesday, June 12, 2018 7:42 AM

    it depends :-), but both queries could return different results when are used with real tables.

    Your  first query (with UNION ALL) could return duplicates, when you have the same ids in the first and second subquery (e.g. when you add a (2) into the value list after the (5)). The second query (with UNION) would eliminate this duplicates too.

    So it is not valid to compare the performance, since you are comparing different (but similar) things.

    DUH, You are so correct.  and now I see it. I should have thought that through a little better before posting.  Time for more coffee.. 🙂

  • Thanks for the deceptively simple question. I know I'll use this reminder going forward. Until I forget again..

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply