JOIN vs Subquery

  • I realize that I'm probably not providing all the information that might be necessary to answer this question; if so, please let me know what other information you would need.

    I have three tables - two of which are storing relatively small numbers of records (<4000), while the third is storing a relatively large number of records (>100000).

    All fields in the WHERE clauses are indexed in both versions, and in Query 2, each of the sub-queries is fully indexed.

    Query 1:

    SELECT

    nll_newsletterID,

    COUNT (DISTINCT CASE WHEN ad_TypeID != 9 THEN tuc_UserID ELSE NULL END) AS Impressions,

    COUNT (DISTINCT CASE WHEN ad_TypeID != 9 AND tuc_Clicks > 0 THEN tuc_UserID ELSE NULL END) AS Clicks,

    COUNT (DISTINCT CASE WHEN ad_TypeID = 9 AND tuc_Clicks > 0 THEN tuc_UserID ELSE NULL END) AS Unsubscribes

    FROM ct_Newsletter (nolock)

    JOIN ct_TrackingUserClick WITH (NOLOCK) ON nll_NewsletterID = tuc_NewsletterID

    JOIN ct_Advertisement WITH (NOLOCK) ON ad_adID = tuc_adID

    WHEREnll_del = 0

    AND nll_dateSent IS NOT NULL

    AND nll_dateSent BETWEEN (GetDate() - 90) AND (GetDate() - 21)

    GROUP BY nll_newsletterID

    Query 2:

    SELECT

    nll_newsletterID,

    (

    SELECT COUNT(DISTINCT(tuc_UserID))

    FROM ct_TrackingUserClick (NOLOCK)

    JOIN ct_Advertisement (NOLOCK) ONad_adID = tuc_adID

    AND ad_typeID != 9

    WHERE tuc_NewsletterID = nll_newsletterID

    ) AS Impressions,

    (

    SELECT COUNT(DISTINCT(tuc_UserID))

    FROM ct_TrackingUserClick (NOLOCK)

    JOIN ct_Advertisement (NOLOCK) ONad_adID = tuc_adID

    AND ad_typeID != 9

    WHEREtuc_NewsletterID = nll_newsletterID

    AND tuc_Clicks > 0

    ) AS Clicks,

    (

    SELECT COUNT(DISTINCT(tuc_UserID))

    FROM ct_TrackingUserClick (NOLOCK)

    JOIN ct_Advertisement (NOLOCK) ONad_adID = tuc_adID

    AND ad_typeID = 9

    WHEREtuc_NewsletterID = nll_newsletterID

    AND tuc_Clicks > 0

    ) AS Unsubscribes

    FROM ct_Newsletter (nolock)

    WHEREnll_del = 0

    AND nll_dateSent IS NOT NULL

    AND nll_dateSent BETWEEN (GetDate() - 90) AND (GetDate() - 21)

    I'm mainly just wondering why Query 2 performs better than Query 1. I have been using Query 2, but was trying to do some optimizations, and figured Query 1 would be a more optimal solution; I was thus fairly surprised to find out Query 1 was performing around 10 times worse than Query 2.

    I had expected Query 1 to perform better because it was doing a single query, and aggregating the results, while Query 2 was doing a sub-query for every record of the outer query.

  • You may want to look at the estimated execution plan for each version -- use Ctrl+L or find the icon near the "Execute" button in SSMS. Just because you've coded a query to do "a sub-query for every record of the outer query" doesn't mean the optimizer will create a plan that gathers the data iteratively.

  • I would have to see both query plans to give a definite answer, but this is my supposition.

    By JOINING the tables you are adding the overhead of joining all relevant rows in the tables. In addition the aggregations operations necessary to get the distinct counts might change (from a stream aggregate to a hash match, for example.)

    On the other hand, the subqueries can just aggregate the individual tables and produce single row results which then are joined together at practically no cost. The individual aggregations will be more efficient because the row sizes will be smaller and possibly because they can take advantage of indexes other than those that were used for the joins.

    Some of the real heavyweights might have other opinions, but the final answers lie within the query plans.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • From your r query it seems the Degree of Parallelism has been acheived

    check the following link ,

    http://msdn.microsoft.com/en-us/library/ms175097.aspx

  • Even if these query passed the parallelism threshold with such a small number of tables and a few rows, the splitting and marshalling of the streams would make the query more expensive, not less so. But based on the queries, I doubt they've hit the parallelism threshold. These aren't terribly complicated queries.

    ----------------------------------------------------
    The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore Roosevelt
    The Scary DBA
    Author of: SQL Server 2022 Query Performance Tuning, 6th Edition and SQL Server Execution Plans, 3rd Edition
    Product Evangelist for Red Gate Software

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply