Performance Impact of using Union in where clause

  • please comment , I am not able to track what optimizer is doing differently in both

    Scenario 1 takes longer and generates massive I/O

    scenario 2 takes fraction of scenario1

    Scenario 1

    =========================================================

    select top 10 * from XYZ

    where Col1 IN ( select Col2 from ABC where Col3>date

    union

    select Col2 from ABC where Col4>date

    union

    select Col2 from ABC where Col5>date

    )

    =========================================================

    Scenario 2

    =========================================================

    declare @tblvar (tblcol1 int)

    insert into @tblvar

    select Col2 from ABC where Col3>date

    union

    select Col2 from ABC where Col4>date

    union

    select Col2 from ABC where Col5>date

    select top 10 * from XYZ

    where Col1 IN (select tblcol1 from @tblvar)

    =========================================================

  • To start with UNION means remove duplicates, and IN doesn't care if you have duplicates, so UNION ALL rather.

    Make sure you test both multiple times. If you only test once, then scenario 1 pays the overhead of loading the data from disk, 2 does not,

    Other than that, we'd have to see the execution plans to advise.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • In addition to Gail's fine points, the key difference between the two is that for one you materialize the result of the UNION query, which means that query only gets executed once.

    I won't be surprised if you post the actual execution plans that the TOP 10 is leading SQL Server to assume incorrectly that a nested loop join will work acceptably, and that for the poorly performing query the UNION subquery is getting executed a large number of times as a result. One indication of that would be if the performance difference between the two versions disappeared when the queries were run without the TOP clause.

    As Gail said, though, we'll need to see actual execution plans of the two queries posted as .sqlplan files to be sure.

    Just run the queries in SSMS with "Include Actual Execution Plan" on, right-click the plan, save it as a .sqlplan file, and post it here as an attachment.

    Cheers!

    EDIT: As a side point, even the faster query might be suboptimal because of the use of a table variable. Because of when execution plans for queries referencing table variables are usually generated (namely, when the table variable is empty), the plans often assume 1 row in the table variable. Table variables also don't have distribution statistics, which can again lead to inaccurate estimates.

    Depending again on the actual execution plans we see, you might be better off using a temp table instead of a table variable.

  • Thank You Gail and Jacob for you reply's

    Gail

    I tried running query multiple times every time massive I/O on scenario 1

    Jacob

    I looked at the number of execution the queries (part of union) in scenario 1 are huge equal to number of rows each subquery returns .

    it is a query used in App so removing top 10 will not help , neverthless I tried it ran over a min so stopped it scenario 1 or scenario 2

    yes Table variable was producing inconsistent results i switched it to temp table work like a charm and consistent resilts

    I will try to post the execution plan

  • SELECT top 10 * from XYZ x

    WHERE EXISTS (SELECT * FROM ABC a

    WHERE a.Col2 = x.Col1

    AND (Col3>date OR Col4>date OR Col5>date)

    )

    For this query (any version of it) to perform well the table ABC must have indexes on (Col3, Col2), (Col4, Col2), (Col5, Col2).

    _____________
    Code for TallyGenerator

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply