JOIN vs Subquery

Question

JOIN vs Subquery

kramaswamy

SSCoach

Points: 18181
More actions
October 21, 2009 at 4:02 pm

#136629

I realize that I'm probably not providing all the information that might be necessary to answer this question; if so, please let me know what other information you would need.
I have three tables - two of which are storing relatively small numbers of records (<4000), while the third is storing a relatively large number of records (>100000).
All fields in the WHERE clauses are indexed in both versions, and in Query 2, each of the sub-queries is fully indexed.
Query 1:
SELECT
nll_newsletterID,
COUNT (DISTINCT CASE WHEN ad_TypeID != 9 THEN tuc_UserID ELSE NULL END) AS Impressions,
COUNT (DISTINCT CASE WHEN ad_TypeID != 9 AND tuc_Clicks > 0 THEN tuc_UserID ELSE NULL END) AS Clicks,
COUNT (DISTINCT CASE WHEN ad_TypeID = 9 AND tuc_Clicks > 0 THEN tuc_UserID ELSE NULL END) AS Unsubscribes
FROM ct_Newsletter (nolock)
JOIN ct_TrackingUserClick WITH (NOLOCK) ON nll_NewsletterID = tuc_NewsletterID
JOIN ct_Advertisement WITH (NOLOCK) ON ad_adID = tuc_adID
WHEREnll_del = 0
AND nll_dateSent IS NOT NULL
AND nll_dateSent BETWEEN (GetDate() - 90) AND (GetDate() - 21)
GROUP BY nll_newsletterID
Query 2:
SELECT
nll_newsletterID,
(
SELECT COUNT(DISTINCT(tuc_UserID))
FROM ct_TrackingUserClick (NOLOCK)
JOIN ct_Advertisement (NOLOCK) ONad_adID = tuc_adID
AND ad_typeID != 9
WHERE tuc_NewsletterID = nll_newsletterID
) AS Impressions,
(
SELECT COUNT(DISTINCT(tuc_UserID))
FROM ct_TrackingUserClick (NOLOCK)
JOIN ct_Advertisement (NOLOCK) ONad_adID = tuc_adID
AND ad_typeID != 9
WHEREtuc_NewsletterID = nll_newsletterID
AND tuc_Clicks > 0
) AS Clicks,
(
SELECT COUNT(DISTINCT(tuc_UserID))
FROM ct_TrackingUserClick (NOLOCK)
JOIN ct_Advertisement (NOLOCK) ONad_adID = tuc_adID
AND ad_typeID = 9
WHEREtuc_NewsletterID = nll_newsletterID
AND tuc_Clicks > 0
) AS Unsubscribes
FROM ct_Newsletter (nolock)
WHEREnll_del = 0
AND nll_dateSent IS NOT NULL
AND nll_dateSent BETWEEN (GetDate() - 90) AND (GetDate() - 21)
I'm mainly just wondering why Query 2 performs better than Query 1. I have been using Query 2, but was trying to do some optimizations, and figured Query 1 would be a more optimal solution; I was thus fairly surprised to find out Query 1 was performing around 10 times worse than Query 2.
I had expected Query 1 to perform better because it was doing a single query, and aggregating the results, while Query 2 was doing a sub-query for every record of the outer query.

Viewing 5 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

john.arnott SSChampion Points: 11882 More actions · Answer 1

You may want to look at the estimated execution plan for each version -- use Ctrl+L or find the icon near the "Execute" button in SSMS. Just because you've coded a query to do "a sub-query for every record of the outer query" doesn't mean the optimizer will create a plan that gathers the data iteratively.

The Dixie Flatline SSC Guru Points: 53353 More actions · Answer 2

I would have to see both query plans to give a definite answer, but this is my supposition.

By JOINING the tables you are adding the overhead of joining all relevant rows in the tables. In addition the aggregations operations necessary to get the distinct counts might change (from a stream aggregate to a hash match, for example.)

On the other hand, the subqueries can just aggregate the individual tables and produce single row results which then are joined together at practically no cost. The individual aggregations will be more efficient because the row sizes will be smaller and possibly because they can take advantage of indexes other than those that were used for the joins.

Some of the real heavyweights might have other opinions, but the final answers lie within the query plans.

__________________________________________________

Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

varunragul SSC Eights! Points: 924 More actions · Answer 3

From your r query it seems the Degree of Parallelism has been acheived

check the following link ,

http://msdn.microsoft.com/en-us/library/ms175097.aspx

Grant Fritchey SSC Guru Points: 399418 More actions · Answer 4

Even if these query passed the parallelism threshold with such a small number of tables and a few rows, the splitting and marshalling of the streams would make the query more expensive, not less so. But based on the queries, I doubt they've hit the parallelism threshold. These aren't terribly complicated queries.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning