Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««12345»»»

Sub query expressions vs. Joins. Expand / Collapse
Author
Message
Posted Thursday, July 22, 2010 4:25 AM


SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Monday, August 11, 2014 3:57 AM
Points: 406, Visits: 773
bayudw, i think the row_number() solution will be faster than the query you currently use. check it out and share the execution time/plan.

- arjun
Post #957053
Posted Thursday, July 22, 2010 4:40 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Thursday, March 8, 2012 3:31 AM
Points: 1, Visits: 17
There is a big flaw in the article, you take two different statements with the same result but only by accident. The subselect ist quite the same as an outer join. The second statement uses an inner join which is in this example the same because ther is no employe without manager. But if there would be an employe without manager the statments would have different results. so it depends on what you try to accomplish. If you have two tables that could be inner joined, the inner join will be better than a subselect. But if you have to do an outer join, than the subselect will be mostly equal to the join. A third scenario could bethat you need an outer join AND more than one column from the subselect (multiple subselects), than the outer join will be much better than the subselect.

So there is a big difference in which scenario you are using subselects. in some scenarios there will be virtually no difference between both approches.

And if you are using oracle databases than all of this is the exact opposite than in mssql. I tried this example on my oracle db and found out that an subselect has only half the costs than the join even if i do multiple subselects.
Post #957060
Posted Thursday, July 22, 2010 5:59 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, November 25, 2013 6:15 AM
Points: 3, Visits: 7
I was thinking something is missing from this article, and whats missing is the reasoning why joins are generally better than sub queries. The reasoning give you more knowledge of why and when you should use joins as opposed sub queries. It would be erroneous otherwise.

Having an Oracle background the answer comes to me quite fast. They are better because they can be used by the optimizing compiler to reduce the amount of data that the database has to go though to get the answer.

Thus if the solution requires that the database go though all of the data then a sub-query might be better because a join may force the use of an index, and in that case you are traversing the index and all of the data, such as joining a large and small data set in addition to requiring all of the data from the larger data set.
Post #957094
Posted Thursday, July 22, 2010 6:26 AM
Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Monday, June 23, 2014 12:45 PM
Points: 73, Visits: 135
would like to see how these queries perform using CTE instead
Post #957121
Posted Thursday, July 22, 2010 6:50 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 10:11 AM
Points: 36,995, Visits: 31,516
glen.Gorman (7/22/2010)
I was thinking something is missing from this article, and whats missing is the reasoning why joins are generally better than sub queries. The reasoning give you more knowledge of why and when you should use joins as opposed sub queries. It would be erroneous otherwise.

Having an Oracle background the answer comes to me quite fast. They are better because they can be used by the optimizing compiler to reduce the amount of data that the database has to go though to get the answer.

Thus if the solution requires that the database go though all of the data then a sub-query might be better because a join may force the use of an index, and in that case you are traversing the index and all of the data, such as joining a large and small data set in addition to requiring all of the data from the larger data set.


Maybe that's true in Oracle but it's not true in SQL Server. Oracle is mostly rule based and SQL Server is cost based. The other thing is that the Execution Plan in SQL Server lies like a rug when it comes to real cost and Grant Fritchey used one of my examples in his Apress book to demonstrate. You should only use the execution plan as a guide, not as gospel when it comes to performance. Here's an example similar to what Grant used in his book...

/****************************************************************************************
Purpose:
This code demonstrates that the estimated and actual execution plans in SQL Server can
be 100% INCORRECT and that the execution plan should only be relied on to provide hints
as to what may be wrong with a query rather than an absolute indication. This code runs
in SQL Server 2005 only.

The code creates a temp table for 10 years worth of dates starting with 2000-01-01 using
two different methods. The first method uses a recursive CTE and the second method uses
a "Tally" table. The output of each method is directed to a "throw-away" variable to
take display delays out of the picture.

Please check both the actual and estimated execution plans and compare the % of batch.

Please see the following article on how to build a Tally table and how they can be used
to replace certain While Loops.
http://www.sqlservercentral.com/articles/T-SQL/62867/
****************************************************************************************/
SET NOCOUNT ON
--=======================================================================================
-- Recursive method shown by (Name with-held)
--=======================================================================================
PRINT '========== Recursive method =========='
--===== Turn on some performance counters ===============================================
SET STATISTICS IO ON
SET STATISTICS TIME ON
DECLARE @BitBucket DATETIME --Holds display output so display times aren't measured.

--===== Execute the code being tested ===================================================
DECLARE @DateVal DATETIME
SET @DateVal = '2000-01-01'

;with mycte as
(
select @DateVal AS DateVal
union all
select DateVal + 1
from mycte
where DateVal + 1 < DATEADD(yy, 10, @DateVal)
)
select @BitBucket = d.dateval
from mycte d
OPTION (MAXRECURSION 0)

--===== Turn off the performance counters and print a separator =========================
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
PRINT REPLICATE('=',90)
GO

--=======================================================================================
-- Tally table method by Jeff Moden
--=======================================================================================
PRINT '========== Tally table method =========='
--===== Turn on some performance counters ===============================================
SET STATISTICS IO ON
SET STATISTICS TIME ON
DECLARE @BitBucket DATETIME --Holds display output so display times aren't measured.

--===== Execute the code being tested ===================================================
DECLARE @StartDate AS DATETIME
SET @StartDate = '2000-01-01'

SELECT TOP (DATEDIFF(dd,@StartDate,DATEADD(yy,10,@StartDate)))
@BitBucket = @StartDate-1+t.N
FROM dbo.Tally t
ORDER BY N

--===== Turn off the performance counters and print a separator =========================
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
PRINT REPLICATE('=',90)
GO

In the case of this article, a similar problem occurs simply because the author didn't test identically functioning code.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #957143
Posted Thursday, July 22, 2010 7:05 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 10:11 AM
Points: 36,995, Visits: 31,516
I agree with Johannes Krackowizer... the two queries are not the same. The correlated subquery is effectively an outer join. After a bit of repair to make both queries do exactly the same thing and a bit of code to take the display time out of the picture, let's run the code and take some statistics to see what the real story is...

DECLARE @BITBUCKET INT,
@BITBUCKET1 [varchar](100)

SET STATISTICS TIME,IO ON
--Test 3
SELECT @BITBUCKET = EmployeeID,
--Salary,
--RActive,
@BITBUCKET1 = (SELECT ManagerName
FROM Managers
WHERE Managers.ManagerID = Employees.ManagerID)
FROM Employees
WHERE Employees.ManagerID = 1

--Test 4
SELECT @BITBUCKET = EmployeeID,
--Salary,
--RActive,
@BITBUCKET1 = ManagerName
FROM Employees
LEFT JOIN Managers ON (Managers.ManagerID = Employees.ManagerID)
WHERE Employees.ManagerID = 1

SET STATISTICS TIME,IO OFF


Here are the results... note that the correlated sub-query actually wins in this particular run...

Table 'Worktable'. Scan count 1, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Managers'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Employees'. Scan count 1, logical reads 121, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 47 ms, elapsed time = 56 ms.

Table 'Worktable'. Scan count 1, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Managers'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Employees'. Scan count 1, logical reads 121, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 62 ms, elapsed time = 57 ms.



As you can see... no difference except for minor differences in CPU and Duration. If you run the code several times, you'll find the each takes turns winning the race.

Contrary to what this article states, there is no difference between correlated subqueries that use an equi-join (equals sign) and the equivalent outer join in SQL Server. Even if you make the mistake of using an inner join, the correlated sub-query frequently ties the inner join for CPU and usually beats it for duration. Try it and see.

The moral of the story is DO NOT RELY ON ESTIMATED OR ACTUAL EXECUTION PLAN COSTS OR PERCENT-OF-BATCH TO SELECT THE MOST PERFORMANT QUERY BECAUSE THE EXECUTION PLAN LIES! A sub-lesson includes "make sure both tests are actually the same".

In Oracle? The differences between correlated sub-quireies and outer joins is a totally different story because the optimizer in Oracle is quite different. But one thing remains the same even in Oracle... when you test, you have to test things that work the same and you have to actually test for performance instead of relying on what the execution/explain plan shows.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #957155
Posted Thursday, July 22, 2010 7:18 AM
SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Tuesday, May 6, 2014 5:51 AM
Points: 6,266, Visits: 2,028
Another missing point here is that all of this depends on table size, indexes in place and data distribution the plans can change depending all of this
It is never a clear shot to simply compare (sub-queries vs joins) the right answer is "it depends"

Just my $0.02



* Noel
Post #957161
Posted Thursday, July 22, 2010 7:19 AM


SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Monday, August 11, 2014 3:57 AM
Points: 406, Visits: 773
Hmm learned a few things here. Thanks folks.

- arjun
Post #957163
Posted Thursday, July 22, 2010 7:20 AM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Wednesday, August 27, 2014 10:05 AM
Points: 2,897, Visits: 5,981
Spot on Jeff as always. For those who want to explore this topic a bit more, Gail Shaw wrote a series of blog posts about Joins vs. In vs Exists. Here's a link to the summary post which links back to all her other research on the topic. http://sqlinthewild.co.za/index.php/2010/04/27/in-exists-and-join-a-roundup/

Cheers,
-Luke.


To help us help you read this

For better help with performance problems please read this
Post #957165
Posted Thursday, July 22, 2010 7:22 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 10:11 AM
Points: 36,995, Visits: 31,516
Mike Carey-249880 (7/22/2010)
would like to see how these queries perform using CTE instead


Heh... so write the CTE's and let's see the test results, Mike. The author did provide the table setup and population code.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #957168
« Prev Topic | Next Topic »

Add to briefcase ««12345»»»

Permissions Expand / Collapse