Quick question regarding UNION

  • Hi all,

    A colleague of mine has just asked me if I can explain why a SQL statement he's written that uses UNION to join together 5 different SQL statements is running quicker than if he runs each of the separate statements individually.

    Is there a general reason for this, or would it be specific to his query?

    The SQL he's written isn't great (he's got a CASE statement in the JOINS in each of the 5 statements he's unioning together), but fundamentally I cannot see how under any circumstance why one part of an UNION statement would run slower than the whole thing because it's got to run that one part as part of the whole thing anyway.

    Or am I missing something?

    I can get details of the SQL code and tables being used if necessary.

    Thanks

    Steve

    Regards

    Steve

  • There could be a number of reasons. Have you had a look at the query plan? This will more than likely give you the answer quite promptly 😎

    If you're not sure after looking at them, you can save and upload the query plans in a reply, and then many of us here will be more than happy to see where the performance changes are.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Steve

    DDL, code and execution plans would indeed be helpful, please.

    When you do a UNION, the result sets all have to be sorted together to eliminate duplicates, so it does seem surprising that the UNIONed query goes faster. Maybe the query optimizer is making some efficiency savings in the background somewhere. Hard to say without seeing the plans.

    John

  • Thanks guys, I've asked him to get the execution plan.

    If I struggle to see the issue, I'll get some DDL together.

    Regards

    Steve

  • I'd love to see the execution plan for this.

    Guessing. If he's doing UNION and not UNION ALL, the UNION acts as an aggregator and it might work to filter differently than when running each of the queries separately.

    Get us the plans, please.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I think i've figured this out. The execution plan suggested the JOINs were the issue so I asked for the TSQL code.

    He's got the following no no's as far as I can see.

    Mix of LEFT and RIGHT JOINS

    He's converting the datatypes in the joins

    He has a CASE statement in one of the joins.

    And I've also found out that 2 of the 4 tables being referenced have no indexes and have no foreign key link back to the tables they are being joined to!!! (hence the data type conversion taking place)

    SELECT

    __clients.id [XPLAN CLIENT ID]

    , Clients.ClientRef AS [AO CLIENT ID]

    , Policies.PolicyRef AS [AO POLICY REF]

    , __entity_asset.id AS [XPLAN POLICY ID]

    , 'Asset' AS [POLICY TYPE]

    FROM

    __clients

    LEFT JOIN Clients ON CONVERT(VARCHAR, Clients.ClientRef) + '_100' = otherid OR CONVERT(VARCHAR, Clients.ClientRef) + '_0' = __clients.otherid

    RIGHT JOIN Policies ON

    CASE

    WHEN Policies.Owner <> 50 THEN CONVERT(Varchar,Policies.ClientRef) + '_' + CONVERT(VARCHAR, Policies.Owner)

    ELSE CONVERT(VARCHAR, Policies.ClientRef) + '_100'

    END = __clients.otherid

    LEFT JOIN __entity_asset ON (__entity_asset.section like '%Policies%') AND (CONVERT(VARCHAR,__entity_asset.otherid) = CONVERT(VARCHAR,Policies.PolicyRef))

    WHERE

    __entity_asset.id IS NOT NULL

    :crazy:

    Regards

    Steve

  • smw147 (12/21/2016)


    I think i've figured this out. The execution plan suggested the JOINs were the issue so I asked for the TSQL code.

    He's got the following no no's as far as I can see.

    Mix of LEFT and RIGHT JOINS

    He's converting the datatypes in the joins

    He has a CASE statement in one of the joins.

    And I've also found out that 2 of the 4 tables being referenced have no indexes and have no foreign key link back to the tables they are being joined to!!! (hence the data type conversion taking place)

    SELECT

    __clients.id [XPLAN CLIENT ID],

    , Clients.ClientRef AS [AO CLIENT ID],

    , Policies.PolicyRef AS [AO POLICY REF],

    , __entity_asset.id AS [XPLAN POLICY ID],

    ,'Asset' AS [POLICY TYPE]

    FROM

    __clients

    LEFT JOIN Clients ON CONVERT(VARCHAR, Clients.ClientRef) + '_100' = otherid OR CONVERT(VARCHAR, Clients.ClientRef) + '_0' = __clients.otherid

    RIGHT JOIN Policies ON

    CASE

    WHEN Policies.Owner <> 50 THEN CONVERT(Varchar,Policies.ClientRef) + '_' + CONVERT(VARCHAR, Policies.Owner)

    ELSE CONVERT(VARCHAR, Policies.ClientRef) + '_100'

    END = __clients.otherid

    LEFT JOIN __entity_asset ON (__entity_asset.section like '%Policies%') AND (CONVERT(VARCHAR,__entity_asset.otherid) = CONVERT(VARCHAR,Policies.PolicyRef))

    WHERE

    __entity_asset.id IS NOT NULL

    :crazy:

    This query isn't viable, the SELECT list isn't properly formed. Just as a matter of interest, what is the table source __clients derived from?

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • OOPs, that's my fault, ignore the commas at the end (i've edited my above post now)

    The __clients and __entity_asset tables have no indexes. They get created by a python tool which inserts CSVs into the SQL database as tables.

    Regards

    Steve

  • smw147 (12/21/2016)


    OOPs, that's my fault, ignore the commas at the end (i've edited my above post now)

    The __clients and __entity_asset tables have no indexes. They get created by a python tool which inserts CSVs into the SQL database as tables.

    Ask your python to create these tables with ClientRef and '_100' or whatever in different columns and with the same datatype as the native tables, so that __clients can join to clients on two columns, ClientRef and something else.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • That's exactly what I'm going to ask the Python developers to do.

    Essentially, the whole process that gets us to this stage is a SQL databse being prepared and migrated from one system to another. The python program outputs the results of records it creates in the target database as CSV files. It then inserts these CSVs into the source databse as tables. The trouble is the outpu CSVs have no ClientRef column to link back to the client, or a PolicyRef column to link back to the Policies table.

    It's a legacy system that we're slowly trying to "fix" and improve.

    Regards

    Steve

  • Joy. That's going to run great... For certain values of great.

    Still, curious about how the UNION resolves vs. the individual queries.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply