RE: Sub query expressions vs. Joins.

SSC Guru

Points: 1003863

July 22, 2010 at 6:50 am

#1196813

glen.Gorman (7/22/2010)
I was thinking something is missing from this article, and whats missing is the reasoning why joins are generally better than sub queries. The reasoning give you more knowledge of why and when you should use joins as opposed sub queries. It would be erroneous otherwise.
Having an Oracle background the answer comes to me quite fast. They are better because they can be used by the optimizing compiler to reduce the amount of data that the database has to go though to get the answer.
Thus if the solution requires that the database go though all of the data then a sub-query might be better because a join may force the use of an index, and in that case you are traversing the index and all of the data, such as joining a large and small data set in addition to requiring all of the data from the larger data set.

Maybe that's true in Oracle but it's not true in SQL Server. Oracle is mostly rule based and SQL Server is cost based. The other thing is that the Execution Plan in SQL Server lies like a rug when it comes to real cost and Grant Fritchey used one of my examples in his Apress book to demonstrate. You should only use the execution plan as a guide, not as gospel when it comes to performance. Here's an example similar to what Grant used in his book...

/****************************************************************************************

Purpose:

This code demonstrates that the estimated and actual execution plans in SQL Server can

be 100% INCORRECT and that the execution plan should only be relied on to provide hints

as to what may be wrong with a query rather than an absolute indication. This code runs

in SQL Server 2005 only.

The code creates a temp table for 10 years worth of dates starting with 2000-01-01 using

two different methods. The first method uses a recursive CTE and the second method uses

a "Tally" table. The output of each method is directed to a "throw-away" variable to

take display delays out of the picture.

Please check both the actual and estimated execution plans and compare the % of batch.

Please see the following article on how to build a Tally table and how they can be used

to replace certain While Loops.

http://www.sqlservercentral.com/articles/T-SQL/62867/

****************************************************************************************/

SET NOCOUNT ON

--=======================================================================================

-- Recursive method shown by (Name with-held)

--=======================================================================================

PRINT '========== Recursive method =========='

--===== Turn on some performance counters ===============================================

SET STATISTICS IO ON

SET STATISTICS TIME ON

DECLARE @Bitbucket DATETIME --Holds display output so display times aren't measured.

--===== Execute the code being tested ===================================================

DECLARE @DateVal DATETIME

SET @DateVal = '2000-01-01'

;with mycte as

(

select @DateVal AS DateVal

union all

select DateVal + 1

from mycte

where DateVal + 1 < DATEADD(yy, 10, @DateVal)

)

select @Bitbucket = d.dateval

from mycte d

OPTION (MAXRECURSION 0)

--===== Turn off the performance counters and print a separator =========================

SET STATISTICS TIME OFF

SET STATISTICS IO OFF

PRINT REPLICATE('=',90)

GO

--=======================================================================================

-- Tally table method by Jeff Moden

--=======================================================================================

PRINT '========== Tally table method =========='

--===== Turn on some performance counters ===============================================

SET STATISTICS IO ON

SET STATISTICS TIME ON

DECLARE @Bitbucket DATETIME --Holds display output so display times aren't measured.

--===== Execute the code being tested ===================================================

DECLARE @StartDate AS DATETIME

SET @StartDate = '2000-01-01'

SELECT TOP (DATEDIFF(dd,@StartDate,DATEADD(yy,10,@StartDate)))

@Bitbucket = @StartDate-1+t.N

FROM dbo.Tally t

ORDER BY N

--===== Turn off the performance counters and print a separator =========================

SET STATISTICS TIME OFF

SET STATISTICS IO OFF

PRINT REPLICATE('=',90)

GO

In the case of this article, a similar problem occurs simply because the author didn't test identically functioning code.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)