Subquery vs join

  • Hi All, 

    We have a new junior developer who insists on looking up descriptions using Subqueries rather than joins.. 
    Simple Example below... 

    JOIN
    SELECT a.AccountName, d.Amount, d.EntryDate, d.TransDescription  FROM DetailTable d 
    INNER JOIN Account a on d.AccountId = a.AccountId
    WHERE d.EntryDate >= DATEADD(dd,-1,getdate())

    SUBQUERY
    SELECT(SELECT a.AccountName FROM Account a WHERE a.AccountId = d.AccountId)  , d.Amount, d.EntryDate, d.TransDescription  FROM DetailTable d WHERE d.EntryDate >= DATEADD(dd,-1,getdate())

    My gut says to me that the join has to be more efficient -  and the execution plan does look slightly better...  but in practice - on real data.. up to 100k rows in the result...   the 2nd query seems to use less IO etc...    both queries plans only see to scan the Account table once ?  Guess the optimizer is clever enough to deal with the fact that the subselect will run once per row.. and hence can run it only once?  

    Am I going crazy trying to suggest the join is more efficient - should I just wait till performance degrades before considering -  or should I stick with letting the subqueries stay? 

    Any advise appreciated. Thanks
    Steve

  • Stephen Knott - Tuesday, November 28, 2017 6:15 AM

    Hi All, 

    We have a new junior developer who insists on looking up descriptions using Subqueries rather than joins.. 
    Simple Example below... 

    JOIN
    SELECT a.AccountName, d.Amount, d.EntryDate, d.TransDescription  FROM DetailTable d 
    INNER JOIN Account a on d.AccountId = a.AccountId
    WHERE d.EntryDate >= DATEADD(dd,-1,getdate())

    SUBQUERY
    SELECT(SELECT a.AccountName FROM Account a WHERE a.AccountId = d.AccountId)  , d.Amount, d.EntryDate, d.TransDescription  FROM DetailTable d WHERE d.EntryDate >= DATEADD(dd,-1,getdate())

    My gut says to me that the join has to be more efficient -  and the execution plan does look slightly better...  but in practice - on real data.. up to 100k rows in the result...   the 2nd query seems to use less IO etc...    both queries plans only see to scan the Account table once ?  Guess the optimizer is clever enough to deal with the fact that the subselect will run once per row.. and hence can run it only once?  

    Am I going crazy trying to suggest the join is more efficient - should I just wait till performance degrades before considering -  or should I stick with letting the subqueries stay? 

    Any advise appreciated. Thanks
    Steve

    That subquery thing is crazy. There is little to no chance that is going to outperform a join. If you have 100,000 rows it is going to run 100,001 queries against the database instead of 1.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange - Tuesday, November 28, 2017 7:33 AM

    Stephen Knott - Tuesday, November 28, 2017 6:15 AM

    Hi All, 

    We have a new junior developer who insists on looking up descriptions using Subqueries rather than joins.. 
    Simple Example below... 

    JOIN
    SELECT a.AccountName, d.Amount, d.EntryDate, d.TransDescription  FROM DetailTable d 
    INNER JOIN Account a on d.AccountId = a.AccountId
    WHERE d.EntryDate >= DATEADD(dd,-1,getdate())

    SUBQUERY
    SELECT(SELECT a.AccountName FROM Account a WHERE a.AccountId = d.AccountId)  , d.Amount, d.EntryDate, d.TransDescription  FROM DetailTable d WHERE d.EntryDate >= DATEADD(dd,-1,getdate())

    My gut says to me that the join has to be more efficient -  and the execution plan does look slightly better...  but in practice - on real data.. up to 100k rows in the result...   the 2nd query seems to use less IO etc...    both queries plans only see to scan the Account table once ?  Guess the optimizer is clever enough to deal with the fact that the subselect will run once per row.. and hence can run it only once?  

    Am I going crazy trying to suggest the join is more efficient - should I just wait till performance degrades before considering -  or should I stick with letting the subqueries stay? 

    Any advise appreciated. Thanks
    Steve

    That subquery thing is crazy. There is little to no chance that is going to outperform a join. If you have 100,000 rows it is going to run 100,001 queries against the database instead of 1.

    My thoughts exactly....   going to try setup a repro using  system tables and post...

  • It's possible that the optimizer will just change it to a join anyway. If you can, post the execution plans, preferably the actual plans.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • There actually is, in theory, a fundamental difference between those queries:

    The first query will drop/ignore any row that does not have a matching row in the Account table, and
    ....if there are multiple matches to Account, the Detail rows will be duplicated for every match.
    The second query will include all qualified rows from the DetailTable, with a NULL value for the AccountName if it's not found, and will never duplicate the Detail row in the output.

    Functionally they will presumably produce the same results, since there should always be a single matching row in the parent/master table.

    I, too, suspect that SQL will convert it to a JOIN, but it should be a LEFT OUTER JOIN rather than an INNER JOIN.  That can sometimes have quirky side effects on how a query is processed, although for a query as simple and straightforward as that one I wouldn't expect to see quirks for it.

    Edit: Correct typos in wording.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher - Tuesday, November 28, 2017 10:45 AM

    There actually is, in theory, a fundamental different between those queries:

    The first query will drop/ignore any row that does not have a matching row in the Account table, and
    ....if there are multiple matches to Account, the Detail rows will be duplicated for every match.
    The second query will include all qualified rows from the DetailTable, with a NULL value for the AccountName if it's not found, and will never duplicate the Detail row in the output.

    Functionally they will presumably produce the same results, since this should always be a single matching row in the parent/master table.

    I, too, suspect that SQL will convert it to a JOIN, but it should be a LEFT OUTER JOIN rather than an INNER JOIN.  That can sometimes have quirky side effects on how a query is processed, although for a query as simple and straightforward as that one I wouldn't expect to see quirks for it.

    Also, the subquery will fail if it tries to return more than one row of data.

  • You might be able to use the fact that the queries could produce different results as a cudgel to get your co-worker to stop writing his queries that way and use INNER JOIN instead.  If everyone else uses IJ, and there's no formal FK relationship between the tables, his style could produce different results from everyone else,which a company may legitimately want to prevent.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply