sql performance question

  • Hi All, 

    Recently, I have undergone a computer based exam where in they asked below question.Need some inputs from experts.

    1. Below correlated query is causing performance issue. Please explain why it is causing performance issue and optimize the query.

    SELECT DISTINCT col1 FROM tab1 WHERE col2 = 'A' AND NOT EXISTS (SELECT 'x’ from tab2 where tab2.col2 = 'B’ and tab2.col1 = tab1.col1)

    I don't any idea on how many rows each table has or what indexes does exist on the tables. Looking at the above question can anyone tell this is a bad query or how can we re-write the query to perform better ?

    Thanks,

    Sam

  • Maybe the DISTINCT, which just gets more expensive the more rows and columns you have? More than likely the lack of indexes which we have no details on. I can't really see why the correlated query would be the cause of the performance issue unless there weren't indexes on the tables; and then the problem is that there aren't good indexes, not the correlated query.

    You could, however, alternatively, right is as a LEFT JOIN query:
    SELECT DISTINCT T1.Col1
    FROM dbo.Tab1 T1
        LEFT JOIN dbo.Tab2 T2 ON T1.Col1 = T2.Col1
                             AND T2.Col2 = 'B'
    WHERE T1.Col1 = 'A'
    AND T2.Col1 IS NULL;

    There's very little between these 2 execution plans though.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Thom A - Thursday, November 29, 2018 2:21 AM

    Maybe the DISTINCT, which just gets more expensive the more rows and columns you have? More than likely the lack of indexes which we have no details on. I can't really see why the correlated query would be the cause of the performance issue unless there weren't indexes on the tables; and then the problem is that there aren't good indexes, not the correlated query.

    You could, however, alternatively, right is as a LEFT JOIN query:
    SELECT DISTINCT T1.Col1
    FROM dbo.Tab1 T1
        LEFT JOIN dbo.Tab2 T2 ON T1.Col1 = T2.Col1
                             AND T2.Col2 = 'B'
    WHERE T1.Col1 = 'A'
    AND T2.Col1 IS NULL;

    There's very little between these 2 execution plans though.

    I'd agree with Thom here. Often the plans for the two query forms are indistinguishable. 
    There's an optimization in the NOT EXISTS version which under some circumstances can make this form of the query much faster than the LEFT JOIN version.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Thank you all.

  • I'd experiment with making it a JOIN instead of an exists, but there's nothing inherently evil in the code that I can see. Did they give you a "correct" answer?

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey - Thursday, November 29, 2018 6:21 AM

    I'd experiment with making it a JOIN instead of an exists, but there's nothing inherently evil in the code that I can see. Did they give you a "correct" answer?

    No Sir.

  • samantha.sqldba - Thursday, November 29, 2018 6:38 AM

    Grant Fritchey - Thursday, November 29, 2018 6:21 AM

    I'd experiment with making it a JOIN instead of an exists, but there's nothing inherently evil in the code that I can see. Did they give you a "correct" answer?

    No Sir.

    Did you ask them for one, or could you? Considering how open ended the question is, one would hope that their answer is quite broad.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • To avoid having to lookup every duplicate tab1.col1 value in tab2, which is pointless, I'd try this query:


    SELECT t1.col1
    FROM (
      SELECT DISTINCT col1
      FROM tab1
      WHERE col2 = 'A'
    ) AS t1
    LEFT OUTER JOIN (
      SELECT col1
      FROM tab2
      WHERE col2 = 'B'
    ) AS t2 ON t2.col1 = t1.col1
    WHERE t2.col1 IS NULL

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Just my humble opinion but (provided I'm not missing some awesome trick) my initial thought is that this is is a "manhole cover" question.  It's littered with supposed "worst practices" (DISTINCT, not using SELECT * in the correlated subquery, which are both just a distraction when it comes to this particular code).  If the correct indexes aren't available... or if they are... I believe you can write this code 7 different ways from Sunday and still not improve the performance so as to not be labeled a performance problem because, as it stands, it's not a performance problem to begin with.

    My answer on the test would have been...

    "The code is fine and requires no changes.  Whomever reported the problem needed to check on whether or not the correct indexes are available to increase the speed of this already fast code.  Once that's done (which would substantially improve the performance and reduce resource usage), you could squeeze out a tiny bit more duration performance (or not and with a bit more CPU usage) and seriously increase the readability using the following nicely symmetrical code, which is functionally equivalent to the original code.  The bottom line is you shouldn't waste your time trying to fix something you've not tested.  And, no... doing things like adding a DISTINCT to the subquery, changing the "X" to a * (NOT EXISTS doesn't even look at those "placeholders"), using a TOP 1 in the subquery, or trying to pre-isolate the rows of tab1 or tab2 won't improve anything and can sometimes make them worse because the existing code actually already does that.  And, no... an outer join with NULL detection will be slower, as well."

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • samantha.sqldba - Wednesday, November 28, 2018 11:44 PM

    Hi All, 

    Recently, I have undergone a computer based exam where in they asked below question.Need some inputs from experts.

    1. Below correlated query is causing performance issue. Please explain why it is causing performance issue and optimize the query.

    SELECT DISTINCT col1 FROM tab1 WHERE col2 = 'A' AND NOT EXISTS (SELECT 'x’ from tab2 where tab2.col2 = 'B’ and tab2.col1 = tab1.col1)

    I don't any idea on how many rows each table has or what indexes does exist on the tables. Looking at the above question can anyone tell this is a bad query or how can we re-write the query to perform better ?

    Thanks,

    Sam

    I'd say that's a good query, in fact it's almost exactly how I'd write it.
    If there is a performance problem with it it will be due to missing indexes.
    The following indexes would help with this query.
    CREATE NONCLUSTERED INDEX IX_tab1_col2_col1 ON [dbo].[tab1] ([col2],[col1])
    CREATE NONCLUSTERED INDEX IX_tab2_col2_col1 ON [dbo].[tab2] ([col2],[col1])

    Some other possible ways to rewrite the query are:

    SELECT DISTINCT col1
      FROM tab1
     WHERE col2 = 'A'
    EXCEPT
    SELECT col1
      FROM tab2
     WHERE tab2.col2 = 'B'

    GO

    SELECT DISTINCT tab1.col1
      FROM tab1
      LEFT JOIN tab2
             ON tab2.col2 = 'B'
            AND tab2.col1 = tab1.col1
     WHERE tab2.col1 IS NULL

    GO

    SELECT DISTINCT tab1.col1
      FROM tab1
     WHERE tab1.col2 = 'A'
       AND tab1.col1 NOT IN (SELECT tab2.col1 FROM tab2 WHERE tab2.col2 = 'B')

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply