SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Fun with Outer Joins


Fun with Outer Joins

Author
Message
mike.sansone
mike.sansone
Forum Newbie
Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)

Group: General Forum Members
Points: 5 Visits: 20
...
peterf-761017
peterf-761017
Grasshopper
Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)

Group: General Forum Members
Points: 19 Visits: 33
Thanks for this article. It brings to the fore my lack of understanding of the difference between the join and where clauses.
I still don't understand why the clause 'AND Class.ClassYear >= 2011' produces a different result when used as part of the join.
It seems to me that it can't contribute to the join because it refers only to the values in one of the tables in the join, so on the face of it, it acts as an ersatz where, cutting down the rows in the Classes table involved in the join.
But, that being the case, why is the result different when it's transplanted to the real where clause? Why are the rows with NULL values in the ClassYear retained?

Would greatly appreciate it if you could clarify that for me.
chiesa.alberto
chiesa.alberto
SSC Journeyman
SSC Journeyman (87 reputation)SSC Journeyman (87 reputation)SSC Journeyman (87 reputation)SSC Journeyman (87 reputation)SSC Journeyman (87 reputation)SSC Journeyman (87 reputation)SSC Journeyman (87 reputation)SSC Journeyman (87 reputation)

Group: General Forum Members
Points: 87 Visits: 171
Look at my previous post for an explanation: it's a matter of order of execution.
Basically, what happens (from a logical point), is this:
1. A cartesian product of the two tables is produced (the query optimizer will probably use some other algorithm in practice, but you can ignore that to understand HOW the rows are included/excluded/calculated).
2. Each pair of rows is evaluated by the ON clause. So, the ON eliminates unneeded rows from the join.
3. The OUTER kicks in and gets all the rows that where excluded by the ON (from one table or both).
4. The WHERE excludes rows from the new list.

In case of INNER joins, step 3 is not executed, so conditionals can be switched from the ON clause to the WHERE freely.
No magic, no secrets. Only execution order.
pg53
pg53
SSC Veteran
SSC Veteran (209 reputation)SSC Veteran (209 reputation)SSC Veteran (209 reputation)SSC Veteran (209 reputation)SSC Veteran (209 reputation)SSC Veteran (209 reputation)SSC Veteran (209 reputation)SSC Veteran (209 reputation)

Group: General Forum Members
Points: 209 Visits: 253
Try it like this:

[b]so on the face of it, it acts as an ersatz where, cutting down the rows in the Classes table involved in the join.




Yes, you are right, this is what it does - but it does it before the OUTER restores the rows from the Professor table, which also adds the null parts on the right. Thus the nulls are being added in, as it were, after the WHERE clause equivalent.
Conversely, using the actual WHERE clause means that it is applied after the nulls have been added, and so it removes them.
peterf-761017
peterf-761017
Grasshopper
Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)

Group: General Forum Members
Points: 19 Visits: 33
Thank you. That's what I wasn't getting. The difference between the ON and the OUTER. I was conflating the two (yes, even while reading the steps as described by chiesa.alberto).

Thanks to both of you. I'm going to pin the execution order up somewhere prominent, so I take it into account from now on.
Kenneth Fisher
Kenneth Fisher
SSCertifiable
SSCertifiable (6.1K reputation)SSCertifiable (6.1K reputation)SSCertifiable (6.1K reputation)SSCertifiable (6.1K reputation)SSCertifiable (6.1K reputation)SSCertifiable (6.1K reputation)SSCertifiable (6.1K reputation)SSCertifiable (6.1K reputation)

Group: General Forum Members
Points: 6125 Visits: 2058
sneumersky (1/17/2014)
Charles Kincaid (1/17/2014)
Just always, ALWAYS remember the order of execution.


I added the emphasis. Yes, understanding this execution order is why I have been able to write good performance filtered queries with no entries in the WHERE clause at all. I have shown these around our office and folks are calling "Where is your WHERE?" Further since the WHERE takes place after the JOINS (in the FROM) moving the predicates to the JOINS can reduce memory consumption.

I have a rule that I keep in mind while designing queries: JOINS put stuff into the result set. WHERE takes stuff out of the result set.


Well said.


Honestly, while I agree that knowing the execution order is important, the idea that all queries can exclude entries in the WHERE clause makes me nervious as *&@#. If you are doing INNER JOINs then no problem, it will have the same effect. I just don't see how you can manage restrictions on a query with an OUTER or CROSS JOIN without a WHERE clause.

Kenneth FisherI strive to live in a world where a chicken can cross the road without being questioned about its motives.--------------------------------------------------------------------------------For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/Link to my Blog Post --> www.SQLStudies.com
Patrick Ge
Patrick Ge
SSC Veteran
SSC Veteran (204 reputation)SSC Veteran (204 reputation)SSC Veteran (204 reputation)SSC Veteran (204 reputation)SSC Veteran (204 reputation)SSC Veteran (204 reputation)SSC Veteran (204 reputation)SSC Veteran (204 reputation)

Group: General Forum Members
Points: 204 Visits: 415
Couldn't agree more.
Patrick Ge
Patrick Ge
SSC Veteran
SSC Veteran (204 reputation)SSC Veteran (204 reputation)SSC Veteran (204 reputation)SSC Veteran (204 reputation)SSC Veteran (204 reputation)SSC Veteran (204 reputation)SSC Veteran (204 reputation)SSC Veteran (204 reputation)

Group: General Forum Members
Points: 204 Visits: 415
chiesa.alberto (1/17/2014)
It seems to me that a lot of people is not getting really HOW a query is executed.
But that should be querying 101.

I would just point out the T-SQL Querying book by Itzik Ben-Gan, but let me recap.

A T-SQL query is composed by clauses, those are:

SELECT TOP(x) ....
FROM ...
WHERE ...
GROUP BY ....
ORDER BY ...

But this is a description of the query, not the real execution order.

If we where writing queries with clauses in the exact execution order, it would be in this way:

FROM ...
WHERE ...
GROUP BY ...
SELECT ...
ORDER BY ...
TOP ...

This is why you can write "Order by 1" and have the data sorted by the first column: because when the order by gets executed, the selected data is already there. However, you cannot use the expressions in the SELECT clause in the WHERE, FROM or GROUP BY clauses, because when those are executed, the select has still to be materialized.

What seems to trouble a lot of people is failing to grasp that the ON clause and the WHERE clause are executed in different moments. And, in between them, the OUTER kicks in.

So, the ON is used to match data for the join, and AFTER the join is evaluated, 1 of 4 things happens:
- if the join is INNER, nothing happens and no data is appended. This is why an INNER join will be always at least as fast as an outer
- if the join is LEFT, RIGHT or FULL OUTER, the non-joining data from the related table(s) will be added to the result set.

After this result set given by the FROM clause is built, the WHERE kicks in. No magic or difficult considerations.

Just always, ALWAYS remember the order of execution.

Everything else in the article is, IMHO, unnecessary clutter.



Couldn't agree more
chiesa.alberto
chiesa.alberto
SSC Journeyman
SSC Journeyman (87 reputation)SSC Journeyman (87 reputation)SSC Journeyman (87 reputation)SSC Journeyman (87 reputation)SSC Journeyman (87 reputation)SSC Journeyman (87 reputation)SSC Journeyman (87 reputation)SSC Journeyman (87 reputation)

Group: General Forum Members
Points: 87 Visits: 171
Kenneth.Fisher (1/19/2014)

Honestly, while I agree that knowing the execution order is important, the idea that all queries can exclude entries in the WHERE clause makes me nervious as *&@#. If you are doing INNER JOINs then no problem, it will have the same effect. I just don't see how you can manage restrictions on a query with an OUTER or CROSS JOIN without a WHERE clause.


You should not be nervous. There is no magic in the query syntax. This is the problem I see with the article: it seems some sort of documentary on the strange habits of SQL queries.

Talking about your example, you have three concepts:
1. you are interested in professors with courses (a conditional on the year)
2. but you want also the ones without courses (an Outer, because that is the only tool you have to "revive" a set of records previously excluded by a conditional)
3. but all the results should have tenure (another conditional)

1 and 2 are to be executed one after the other: you firstly select, and then pick up also the ones that where excluded. So 1 must go in the ON.
Where should 3 go? Well, it depends on your requirement:
If you want a list of ALL the professors, it must go before the OUTER, so in the ON clause.
If, as is the case, you want only professors with Tenure, you must be sure that the clause is in the WHERE.

Think it this way:
1. ON is a tool to pick record pairs in a cartesian product. So it's a way to discard records.
2. Outer is a way to "revive" discarded records from one or two tables.
3. After the Outer, the result set will be composed by two sub sets: the paired rows produced by the ON clause, and the unpaired, revived rows picked up by the OUTER.
4. The WHERE will be applied to everything, even the unpaired rows.

So you NEED both tools, and you decide where a conditional has to be based on if it must be applied only to paired rows or also to unpaired ones.

Only a problem of execution order, as I said.

Edit: fixing typos
rhydian
rhydian
Grasshopper
Grasshopper (17 reputation)Grasshopper (17 reputation)Grasshopper (17 reputation)Grasshopper (17 reputation)Grasshopper (17 reputation)Grasshopper (17 reputation)Grasshopper (17 reputation)Grasshopper (17 reputation)

Group: General Forum Members
Points: 17 Visits: 57
Surely the following is an easier way to accomplish this...

SELECT PR.id, PR.ProfessorName, ClassName, ClassYear, ClassSemester
FROM Professor PR
LEFT JOIN
(
SELECT ProfessorID, ClassName, ClassYear, ClassSemester
FROM Class
WHERE ClassYear>=2011
) SQ ON SQ.ProfessorID = PR.Id
WHERE PR.HasTenure = 1
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search