Improving performance on joins to large tables

Question

Post reply

Improving performance on joins to large tables

EdVassie

SSC Guru

Points: 60527
More actions
June 23, 2003 at 6:23 am

#82390

I liked the scripts provided by Cade Bryant, but have some suggestions that could improve things further.
First, Cade's original scripts and some comments about it. It is bascally a case of missing predicates...
SELECT *
FROM Contact c
LEFT JOIN CONAddress a
ON c.ContactID = a.ContactID
LEFT JOIN CONPhoneEmail p
ON c.ContactID = p.ContactID
WHERE c.ContactID = @ContactID
The big issue is the WHERE clause, and when this is applied. According to the ANSI rules (as I understand them) a WHERE clause will be applied after the ON clauses have been processed.
Therefore, in Cade's script, SQL is building an intermediate result set by joining all rows of CONAddress where CONAddress.ContactID = Contact.ContactID. This is repeated with CONPhoneEmail. After these joins are done, the intermediate result set x is filtered by x.ContactID = @ContactID and returned to the caller.
If additional predicates are added then the filtering is done when the rows are retrieved from CONAddress and CONPhoneEmail. The additional predicates are the AND clauses below...
SELECT *
FROM Contact c
LEFT JOIN CONAddress a
ON c.ContactID = a.ContactID
AND c.ContactID = @ContactID
AND a.ContactID = @ContactID
LEFT JOIN CONPhoneEmail p
ON c.ContactID = p.ContactID
AND c.ContactID = @ContactID
AND p.ContactID = @ContactID
WHERE c.ContactID = @ContactID
OK, so why are we doing both
AND c.ContactID = @ContactID
AND a.ContactID = @ContactID
This is because SQL Server does not do something known as 'predicate transitive closure' This means that if we say c = 5 AND a = c, SQL can NOT work out that a = 5. If you know that a is also 5, then you have to tell SQL Server!
Therefore, predicates for both a.ContactID and C.ContactID are given so that SQL Server has maximum flexibility in choosing if table a or table c is the best choice for the inner table of a nested join.
The main point of this posting is that you need to specify all possible predicates if you want your SQL to run in the fastest time.
All information provided is a personal opinion that may not match reality.
Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.
When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

Viewing 3 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply

Fubak Ten Centuries Points: 1174 More actions · Answer 1

Cool! I'll have to try this one!

BTW, when people on here say 'large tables'...do they mean 10 million+ records or what?

stax68 SSChampion Points: 11761 More actions · Answer 2

quote:
According to the ANSI rules (as I understand them) a WHERE clause will be applied after the ON clauses have been processed.

Remember this is only the logical order (it makes a difference with outer joins), and not necessarily the physical one.

E.g. if a WHERE clause references one column in a covering index, SQL Server is likely to filter the records when accessing the table(index) data, rather than applying the filter oinly after processing the join.

Tim Wilkinson

"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant

Improving performance on joins to large tables

Cookies on SQLServerCentral