Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««34567»»

Fun with Outer Joins Expand / Collapse
Author
Message
Posted Friday, January 17, 2014 3:17 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Thursday, October 30, 2014 9:01 AM
Points: 1, Visits: 20
...
Post #1532309
Posted Sunday, January 19, 2014 3:15 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Sunday, September 21, 2014 5:24 PM
Points: 15, Visits: 33
Thanks for this article. It brings to the fore my lack of understanding of the difference between the join and where clauses.
I still don't understand why the clause 'AND Class.ClassYear >= 2011' produces a different result when used as part of the join.
It seems to me that it can't contribute to the join because it refers only to the values in one of the tables in the join, so on the face of it, it acts as an ersatz where, cutting down the rows in the Classes table involved in the join.
But, that being the case, why is the result different when it's transplanted to the real where clause? Why are the rows with NULL values in the ClassYear retained?

Would greatly appreciate it if you could clarify that for me.

Post #1532470
Posted Sunday, January 19, 2014 3:42 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Thursday, November 13, 2014 9:12 AM
Points: 16, Visits: 118
Look at my previous post for an explanation: it's a matter of order of execution.
Basically, what happens (from a logical point), is this:
1. A cartesian product of the two tables is produced (the query optimizer will probably use some other algorithm in practice, but you can ignore that to understand HOW the rows are included/excluded/calculated).
2. Each pair of rows is evaluated by the ON clause. So, the ON eliminates unneeded rows from the join.
3. The OUTER kicks in and gets all the rows that where excluded by the ON (from one table or both).
4. The WHERE excludes rows from the new list.

In case of INNER joins, step 3 is not executed, so conditionals can be switched from the ON clause to the WHERE freely.
No magic, no secrets. Only execution order.
Post #1532471
Posted Sunday, January 19, 2014 4:25 PM
Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Wednesday, August 20, 2014 2:22 PM
Points: 71, Visits: 251
Try it like this:

[b]so on the face of it, it acts as an ersatz where, cutting down the rows in the Classes table involved in the join.




Yes, you are right, this is what it does - but it does it before the OUTER restores the rows from the Professor table, which also adds the null parts on the right. Thus the nulls are being added in, as it were, after the WHERE clause equivalent.
Conversely, using the actual WHERE clause means that it is applied after the nulls have been added, and so it removes them.
Post #1532472
Posted Sunday, January 19, 2014 4:43 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Sunday, September 21, 2014 5:24 PM
Points: 15, Visits: 33
Thank you. That's what I wasn't getting. The difference between the ON and the OUTER. I was conflating the two (yes, even while reading the steps as described by chiesa.alberto).

Thanks to both of you. I'm going to pin the execution order up somewhere prominent, so I take it into account from now on.
Post #1532474
Posted Sunday, January 19, 2014 7:55 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Yesterday @ 12:36 PM
Points: 3,467, Visits: 1,827
sneumersky (1/17/2014)
Charles Kincaid (1/17/2014)
Just always, ALWAYS remember the order of execution.


I added the emphasis. Yes, understanding this execution order is why I have been able to write good performance filtered queries with no entries in the WHERE clause at all. I have shown these around our office and folks are calling "Where is your WHERE?" Further since the WHERE takes place after the JOINS (in the FROM) moving the predicates to the JOINS can reduce memory consumption.

I have a rule that I keep in mind while designing queries: JOINS put stuff into the result set. WHERE takes stuff out of the result set.


Well said.


Honestly, while I agree that knowing the execution order is important, the idea that all queries can exclude entries in the WHERE clause makes me nervious as *&@#. If you are doing INNER JOINs then no problem, it will have the same effect. I just don't see how you can manage restrictions on a query with an OUTER or CROSS JOIN without a WHERE clause.


Kenneth Fisher
I strive to live in a world where a chicken can cross the road without being questioned about its motives.
--------------------------------------------------------------------------------
For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Link to my Blog Post --> www.SQLStudies.com
Post #1532486
Posted Sunday, January 19, 2014 9:33 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Monday, November 17, 2014 10:00 PM
Points: 12, Visits: 374
Couldn't agree more.
Post #1532493
Posted Sunday, January 19, 2014 9:35 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Monday, November 17, 2014 10:00 PM
Points: 12, Visits: 374
chiesa.alberto (1/17/2014)
It seems to me that a lot of people is not getting really HOW a query is executed.
But that should be querying 101.

I would just point out the T-SQL Querying book by Itzik Ben-Gan, but let me recap.

A T-SQL query is composed by clauses, those are:

SELECT TOP(x) ....
FROM ...
WHERE ...
GROUP BY ....
ORDER BY ...

But this is a description of the query, not the real execution order.

If we where writing queries with clauses in the exact execution order, it would be in this way:

FROM ...
WHERE ...
GROUP BY ...
SELECT ...
ORDER BY ...
TOP ...

This is why you can write "Order by 1" and have the data sorted by the first column: because when the order by gets executed, the selected data is already there. However, you cannot use the expressions in the SELECT clause in the WHERE, FROM or GROUP BY clauses, because when those are executed, the select has still to be materialized.

What seems to trouble a lot of people is failing to grasp that the ON clause and the WHERE clause are executed in different moments. And, in between them, the OUTER kicks in.

So, the ON is used to match data for the join, and AFTER the join is evaluated, 1 of 4 things happens:
- if the join is INNER, nothing happens and no data is appended. This is why an INNER join will be always at least as fast as an outer
- if the join is LEFT, RIGHT or FULL OUTER, the non-joining data from the related table(s) will be added to the result set.

After this result set given by the FROM clause is built, the WHERE kicks in. No magic or difficult considerations.

Just always, ALWAYS remember the order of execution.

Everything else in the article is, IMHO, unnecessary clutter.



Couldn't agree more
Post #1532494
Posted Monday, January 20, 2014 1:35 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Thursday, November 13, 2014 9:12 AM
Points: 16, Visits: 118
Kenneth.Fisher (1/19/2014)

Honestly, while I agree that knowing the execution order is important, the idea that all queries can exclude entries in the WHERE clause makes me nervious as *&@#. If you are doing INNER JOINs then no problem, it will have the same effect. I just don't see how you can manage restrictions on a query with an OUTER or CROSS JOIN without a WHERE clause.


You should not be nervous. There is no magic in the query syntax. This is the problem I see with the article: it seems some sort of documentary on the strange habits of SQL queries.

Talking about your example, you have three concepts:
1. you are interested in professors with courses (a conditional on the year)
2. but you want also the ones without courses (an Outer, because that is the only tool you have to "revive" a set of records previously excluded by a conditional)
3. but all the results should have tenure (another conditional)

1 and 2 are to be executed one after the other: you firstly select, and then pick up also the ones that where excluded. So 1 must go in the ON.
Where should 3 go? Well, it depends on your requirement:
If you want a list of ALL the professors, it must go before the OUTER, so in the ON clause.
If, as is the case, you want only professors with Tenure, you must be sure that the clause is in the WHERE.

Think it this way:
1. ON is a tool to pick record pairs in a cartesian product. So it's a way to discard records.
2. Outer is a way to "revive" discarded records from one or two tables.
3. After the Outer, the result set will be composed by two sub sets: the paired rows produced by the ON clause, and the unpaired, revived rows picked up by the OUTER.
4. The WHERE will be applied to everything, even the unpaired rows.

So you NEED both tools, and you decide where a conditional has to be based on if it must be applied only to paired rows or also to unpaired ones.

Only a problem of execution order, as I said.

Edit: fixing typos
Post #1532521
Posted Monday, January 20, 2014 3:49 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Tuesday, March 4, 2014 4:34 AM
Points: 3, Visits: 55
Surely the following is an easier way to accomplish this...

SELECT PR.id, PR.ProfessorName, ClassName, ClassYear, ClassSemester
FROM Professor PR
LEFT JOIN
(
SELECT ProfessorID, ClassName, ClassYear, ClassSemester
FROM Class
WHERE ClassYear>=2011
) SQ ON SQ.ProfessorID = PR.Id
WHERE PR.HasTenure = 1
Post #1532560
« Prev Topic | Next Topic »

Add to briefcase «««34567»»

Permissions Expand / Collapse