Sub query expressions vs. Joins.

  • I agree with Johannes Krackowizer... the two queries are not the same. The correlated subquery is effectively an outer join. After a bit of repair to make both queries do exactly the same thing and a bit of code to take the display time out of the picture, let's run the code and take some statistics to see what the real story is...

    DECLARE @Bitbucket INT,

    @BITBUCKET1 [varchar](100)

    SET STATISTICS TIME,IO ON

    --Test 3

    SELECT@Bitbucket = EmployeeID,

    --Salary,

    --RActive,

    @BITBUCKET1 = (SELECT ManagerName

    FROMManagers

    WHEREManagers.ManagerID = Employees.ManagerID)

    FROMEmployees

    WHEREEmployees.ManagerID = 1

    --Test 4

    SELECT@Bitbucket = EmployeeID,

    --Salary,

    --RActive,

    @BITBUCKET1 = ManagerName

    FROMEmployees

    LEFT JOIN Managers ON (Managers.ManagerID = Employees.ManagerID)

    WHEREEmployees.ManagerID = 1

    SET STATISTICS TIME,IO OFF

    Here are the results... note that the correlated sub-query actually wins in this particular run...

    Table 'Worktable'. Scan count 1, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Managers'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Employees'. Scan count 1, logical reads 121, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 47 ms, elapsed time = 56 ms.

    Table 'Worktable'. Scan count 1, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Managers'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Employees'. Scan count 1, logical reads 121, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 62 ms, elapsed time = 57 ms.

    As you can see... no difference except for minor differences in CPU and Duration. If you run the code several times, you'll find the each takes turns winning the race.

    Contrary to what this article states, there is no difference between correlated subqueries that use an equi-join (equals sign) and the equivalent outer join in SQL Server. Even if you make the mistake of using an inner join, the correlated sub-query frequently ties the inner join for CPU and usually beats it for duration. Try it and see.

    The moral of the story is DO NOT RELY ON ESTIMATED OR ACTUAL EXECUTION PLAN COSTS OR PERCENT-OF-BATCH TO SELECT THE MOST PERFORMANT QUERY BECAUSE THE EXECUTION PLAN LIES! A sub-lesson includes "make sure both tests are actually the same".

    In Oracle? The differences between correlated sub-quireies and outer joins is a totally different story because the optimizer in Oracle is quite different. But one thing remains the same even in Oracle... when you test, you have to test things that work the same and you have to actually test for performance instead of relying on what the execution/explain plan shows.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Another missing point here is that all of this depends on table size, indexes in place and data distribution the plans can change depending all of this

    It is never a clear shot to simply compare (sub-queries vs joins) the right answer is "it depends"

    Just my $0.02


    * Noel

  • Hmm learned a few things here. Thanks folks.

    - arjun

    https://sqlroadie.com/

  • Spot on Jeff as always. For those who want to explore this topic a bit more, Gail Shaw wrote a series of blog posts about Joins vs. In vs Exists. Here's a link to the summary post which links back to all her other research on the topic. http://sqlinthewild.co.za/index.php/2010/04/27/in-exists-and-join-a-roundup/[/url]

    Cheers,

    -Luke.

    To help us help you read this[/url]For better help with performance problems please read this[/url]

  • Mike Carey-249880 (7/22/2010)


    would like to see how these queries perform using CTE instead

    Heh... so write the CTE's and let's see the test results, Mike. The author did provide the table setup and population code. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Steve Jones,

    If you see this, I believe this article needs an "Interesting Discussion" flag.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (7/22/2010)

    The moral of the story is DO NOT RELY ON ESTIMATED OR ACTUAL EXECUTION PLAN COSTS OR PERCENT-OF-BATCH TO SELECT THE MOST PERFORMANT QUERY BECAUSE THE EXECUTION PLAN LIES! A sub-lesson includes "make sure both tests are actually the same".

    .

    Spot on , though Lie is a bit strong ;). 'Is based on estimates' is better. I recent done a 'cuppa corner' on that very subject from a table variable point of view

    http://www.screencast.com/users/SQLServerFAQ/folders/Cuppa+Corner/media/72b48209-5715-4352-8a59-445824377e83



    Clear Sky SQL
    My Blog[/url]

  • Spot on I was trying to capture that it depends as part of my response, and that you need to have a good underlying idea of how the database works to know if you should use a join or sub query.

  • I liked the article but love the discussion! I've learned a lot from it. Thanks!

    ---------------------------------------------------------------------
    Use Full Links:
    KB Article from Microsoft on how to ask a question on a Forum

  • Edit whoops, I missed all the discussion on page two... I guess that's already covered...

    I want to point out more forcefully than Tim did that these two queries are _NOT_ equivalent!

    A correlated sub-query like in tests 1 and 3 is equivalent to a _LEFT_ join, not an inner join. If you change the query 4 accordingly, you'll find that both will use the same plan and perform equivalently.

    The reason your join performs better than the correlated sub-query is that it is not asking for the same results.

    Granted, you've pointed out the performance problems that people often experience by creating a left join using a sub-query when what they meant was an inner join, but that should have been the focus of the article, not the supposed performance problems with sub-queries, of which I'm not aware of any.



    Dan Guzman - Not the MVP (7/22/2010)
    All questions have to be prefaced by Server version and 'according to MS Docs' or 'my own personal opinion based on how much detail I felt like digging into at the time.'

  • Interesting article, and even more interesting discussion, thanks everyone!

  • Although interesting, your coverage of SUBQUERIES is far from complete.

    how would you query:

    1) list employe-name for the employee(s) with highest salary

    2) who is the manager with the highest salary employee(s)

    and again, the displayed tables do not offer many more 'questions' we can ask using sub-queries

    Sam

  • Some of the comments posted seem to be following the lines of "what if this, what if that, this isn't quite right".

    I would agree that the article may leave some discrepancies but I think it highlights a novice error that I've seen at every company I've worked for.

    If you're querying tables that have a few thousand, maybe even tens of thousands of records, with ever lowering costs of high performing hardware, you probably don't need to care about how the query is constructed or executed. 50ms or 500ms means nothing to an end user and you'll get no reward for your troubles.

    But if you are dealing with millions/billions of records this is definitely a lesson to be learnt. You'll achieve massive performance increase by helping the query optimizer and giving it a well structured query before it has to process it. Then just sit back and enjoy the praise from your boss.:-P

  • gary.strange-1058508 (7/22/2010)


    Some of the comments posted seem to be following the lines of "what if this, what if that, this isn't quite right".

    I would agree that the article may leave some discrepancies but I think it highlights a novice error that I've seen at every company I've worked for.

    If you're querying tables that have a few thousand, maybe even tens of thousands of records, with ever lowering costs of high performing hardware, you probably don't need to care about how the query is constructed or executed. 50ms or 500ms means nothing to an end user and you'll get no reward for your troubles.

    But if you are dealing with millions/billions of records this is definitely a lesson to be learnt. You'll achieve massive performance increase by helping the query optimizer and giving it a well structured query before it has to process it. Then just sit back and enjoy the praise from your boss.:-P

    Yes, however, you also need to be sure you understand the query that you are rewriting. If you fall into the trap of this article, you may well replace a performance problem with a correctness problem. See how long the praise lasts when the bug reports come in...

    In that sense this article highlights _two_ novice errors.



    Dan Guzman - Not the MVP (7/22/2010)
    All questions have to be prefaced by Server version and 'according to MS Docs' or 'my own personal opinion based on how much detail I felt like digging into at the time.'

  • paul-876346 (7/22/2010)


    Thank you - I also wonder how the sub-query or join would compare to Cross (or Outer) Apply - as in the following query?

    SELECT EmployeeID, Salary, RActive, CAM.ManagerName FROM Employees

    CROSS APPLY (SELECT ManagerName FROM Managers WHERE Managers.ManagerID = Employees.ManagerID) CAM

    WHERE Employees.ManagerID = 1

    @paul-2:

    The CROSS-APPLY as written will produce essentially the same query plan as the INNER JOIN, with the same results, at the same cost.

Viewing 15 posts - 16 through 30 (of 56 total)

You must be logged in to reply to this topic. Login to reply