Improving performance on joins to large tables

  • I liked the scripts provided by Cade Bryant, but have some suggestions that could improve things further.

    First, Cade's original scripts and some comments about it. It is bascally a case of missing predicates...

    SELECT *

    FROM Contact c

    LEFT JOIN CONAddress a

    ON c.ContactID = a.ContactID

    LEFT JOIN CONPhoneEmail p

    ON c.ContactID = p.ContactID

    WHERE c.ContactID = @ContactID

    The big issue is the WHERE clause, and when this is applied. According to the ANSI rules (as I understand them) a WHERE clause will be applied after the ON clauses have been processed.

    Therefore, in Cade's script, SQL is building an intermediate result set by joining all rows of CONAddress where CONAddress.ContactID = Contact.ContactID. This is repeated with CONPhoneEmail. After these joins are done, the intermediate result set x is filtered by x.ContactID = @ContactID and returned to the caller.

    If additional predicates are added then the filtering is done when the rows are retrieved from CONAddress and CONPhoneEmail. The additional predicates are the AND clauses below...

    SELECT *

    FROM Contact c

    LEFT JOIN CONAddress a

    ON c.ContactID = a.ContactID

    AND c.ContactID = @ContactID

    AND a.ContactID = @ContactID

    LEFT JOIN CONPhoneEmail p

    ON c.ContactID = p.ContactID

    AND c.ContactID = @ContactID

    AND p.ContactID = @ContactID

    WHERE c.ContactID = @ContactID

    OK, so why are we doing both

    AND c.ContactID = @ContactID

    AND a.ContactID = @ContactID

    This is because SQL Server does not do something known as 'predicate transitive closure' This means that if we say c = 5 AND a = c, SQL can NOT work out that a = 5. If you know that a is also 5, then you have to tell SQL Server!

    Therefore, predicates for both a.ContactID and C.ContactID are given so that SQL Server has maximum flexibility in choosing if table a or table c is the best choice for the inner table of a nested join.

    The main point of this posting is that you need to specify all possible predicates if you want your SQL to run in the fastest time.

    All information provided is a personal opinion that may not match reality.

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

  • This was removed by the editor as SPAM

  • Cool! I'll have to try this one!

    BTW, when people on here say 'large tables'...do they mean 10 million+ records or what?

  • quote:


    According to the ANSI rules (as I understand them) a WHERE clause will be applied after the ON clauses have been processed.


    Remember this is only the logical order (it makes a difference with outer joins), and not necessarily the physical one.

    E.g. if a WHERE clause references one column in a covering index, SQL Server is likely to filter the records when accessing the table(index) data, rather than applying the filter oinly after processing the join.

    Tim Wilkinson

    "If it doesn't work in practice, you're using the wrong theory"
    - Immanuel Kant

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply