Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12345»»»

Fun with Outer Joins Expand / Collapse
Author
Message
Posted Sunday, September 9, 2012 5:47 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Yesterday @ 3:14 PM
Points: 3,466, Visits: 1,820
Comments posted to this topic are about the item Fun with Outer Joins

Kenneth Fisher
I strive to live in a world where a chicken can cross the road without being questioned about its motives.
--------------------------------------------------------------------------------
For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Link to my Blog Post --> www.SQLStudies.com
Post #1356525
Posted Sunday, September 9, 2012 11:14 PM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Wednesday, October 22, 2014 1:27 PM
Points: 1,945, Visits: 3,068
The correct terms are "preserved table " and "unpreserved table" and your design is wrong. You have no keys, we do not use BIT flags in SQL, you believe in a magical "id" that changes from table to table, becoming a professor, a squid, a class, an automobile, etc. That is Kabahlah magice and nto RDBMS.

You had only one professor, no key, a name that violates USPS standards and an assembly language bit flag. IDENTITY properties? And no DRI because of the bad non-schema. Yes, without keys and DRI, this is not RDBMS. You also do not understand temporal data.

CREATE TABLE Professors
(emp_id INTEGER NOT NULL PRIMARY KEY,
professor_name VARCHAR(35) NOT NULL,
tenure_date DATE);

Your life will be easier if you learn to use row constructors:

INSET INTO Professors
VALUES (1, 'Dr Coke', '2011-01-01'),
(2, 'Dr Sprite', '2011-01-01'),
(3, 'Dr 7-up', '2011-01-01'),
(4, 'Mr Pepper', NULL),
(5, 'Mr Fanta', NULL);

Notice the use of a NULL for non-tenured professors. This still stinks and we ought to have a status and date range. But let's skip that design flaw for now.

CREATE TABLE Classes
(course_nbr INTEGER NOT NULL
REFERENCES Catalog,
semester_name CHAR(7) NOT NULL
REFERENCES Semesters,
PRIMARY KEY (course_nbr, semester_name)
emp_id INTEGER
REFERENCES Professors);

The common temporal idiom in SQL is a look up table for reporting periods. I prefer a name that can be sorted with ISO-8601 data:

CREATE TABLE Semesters
(semester_name CHAR(7) NOT NULL PRIMARY KEY
CHECK (semester_name LIKEB '[12][0-9][0-9][0-9]-S[1-3]'),
semester_start_date DATE NOT NULL,
semester_end_date DATE NOT NULL,
CHECK(semester_start_date < semester_end_date);

You can get the class name from the catalog and not waste space and time on it. What you had was not normalized. Courses have numbers; look at a college catalog. The NULL is my favorite instructor “TBD” (I do adjunct teaching and that means work).

INSERT INTO Class_Schedule (class_nbr, semester_name, emp_id)
VALUES (101 '2010-S1', 4),
(101, '2010-S2', 4),
(101, '2011-S1', 4),
(101, '2012-S1', NULL),
(101, '2012-S2', NULL),
Etc.

Now try your queries with a proper schema.


Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
Post #1356557
Posted Monday, September 10, 2012 2:44 AM


SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Wednesday, August 13, 2014 1:12 AM
Points: 27, Visits: 117
Thanks for the article. In my experience (15 years SQL Server, 25 Oracle) there have been so many instances where despite hours of research and hacking the T-SQL, an outer join fails to deliver the keys for non matched rows. This appears to be a not uncommon finding. Maybe there's scope for a follow on article explaining with why this happens and how to fix it without resorting to #tables.

Men who wish to know about the world must learn about it in its particular details. - Heraclitus
Post #1356593
Posted Monday, September 10, 2012 2:52 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Friday, October 3, 2014 1:55 PM
Points: 199, Visits: 735
hi there,

maybe this little trick helps:

I personally understood this behaviour of outer joins quite a time ago when it was explained to me as follows (yes I know, very simplified...):
there is a logical order of processing the a query. At some time the outer join with its ON-clause is processed, first as an inner join, then adding the outer rows and also the NULL-values on the missing inner rows. Some time after that the WHERE-clause is processed maybe "removing" null-values.

Since I'm saying to myself "first the ON is processed, after that the WHERE is processed" it's no problem for me any more writing my outer joins in a way they are doing what I expect.
Post #1356597
Posted Monday, September 10, 2012 7:23 AM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Yesterday @ 5:04 PM
Points: 2,702, Visits: 3,411
I suppose I don't understand the point. We know that when records do not exist in the RIGHT table that NULLs are returned. So why would we expect a filter condition on the RIGHT table to treat NULLs differently than we would otherwise?

i.e. Simply change the filter condition to account for NULLs

WHERE Professor.HasTenure = 'True'
AND (Class.ClassYear >= 2011 OR Class.ClassYear IS NULL)



Thanks,

Jared
SQL Know-It-All

How to post data/code on a forum to get the best help - Jeff Moden
Post #1356716
Posted Monday, September 10, 2012 7:40 AM


SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Friday, September 19, 2014 5:16 AM
Points: 887, Visits: 1,774
SQLKnowItAll (9/10/2012)
I suppose I don't understand the point. We know that when records do not exist in the RIGHT table that NULLs are returned. So why would we expect a filter condition on the RIGHT table to treat NULLs differently than we would otherwise?

i.e. Simply change the filter condition to account for NULLs

WHERE Professor.HasTenure = 'True'
AND (Class.ClassYear >= 2011 OR Class.ClassYear IS NULL)



id say because the above query is the classic catch all. if we can avoid the catch all performance should increase. (as always the it depends please test applies).

if we have an index on Class with ProfessorID and ClassYear we can use that index in the join. we also eliminating records we dont need early in execution and have a lower overall row count.


To Kenneth, Great article and a gave me a great way to explain join conditions vs where conditions.



For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden for the best way to ask your question.

For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw

Need to Split some strings? Jeff Moden's DelimitedSplit8K
Jeff Moden's Cross tab and Pivots Part 1
Jeff Moden's Cross tab and Pivots Part 2

Jeremy Oursler
Post #1356733
Posted Monday, September 10, 2012 7:48 AM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Yesterday @ 5:04 PM
Points: 2,702, Visits: 3,411
capn.hector (9/10/2012)
SQLKnowItAll (9/10/2012)
I suppose I don't understand the point. We know that when records do not exist in the RIGHT table that NULLs are returned. So why would we expect a filter condition on the RIGHT table to treat NULLs differently than we would otherwise?

i.e. Simply change the filter condition to account for NULLs

WHERE Professor.HasTenure = 'True'
AND (Class.ClassYear >= 2011 OR Class.ClassYear IS NULL)



id say because the above query is the classic catch all. if we can avoid the catch all performance should increase. (as always the it depends please test applies).

if we have an index on Class with ProfessorID and ClassYear we can use that index in the join. we also eliminating records we dont need early in execution and have a lower overall row count.


To Kenneth, Great article and a gave me a great way to explain join conditions vs where conditions.
I don't see it as a catch all. I see it as defining the requirements of the query. The join condition is the id, the filter is the tenure and the year. Assessing the question posed, I wouldn't expect to see the results that the author posed; i.e. if class year is NULL, then they didn't teach that year and I don't need them in the results. I just don't understand why the expected results would be different. If someone has been working with LEFT OUTER JOINs for more than 20 minutes... why would they ever expect the query to return rows from the LEFT table when the filter is on the RIGHT? I don't put the filter in the JOIN because it is a filter, not a join condition.


Thanks,

Jared
SQL Know-It-All

How to post data/code on a forum to get the best help - Jeff Moden
Post #1356740
Posted Monday, September 10, 2012 8:07 AM


SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Friday, September 19, 2014 5:16 AM
Points: 887, Visits: 1,774
SQLKnowItAll (9/10/2012)
capn.hector (9/10/2012)
SQLKnowItAll (9/10/2012)
I suppose I don't understand the point. We know that when records do not exist in the RIGHT table that NULLs are returned. So why would we expect a filter condition on the RIGHT table to treat NULLs differently than we would otherwise?

i.e. Simply change the filter condition to account for NULLs

WHERE Professor.HasTenure = 'True'
AND (Class.ClassYear >= 2011 OR Class.ClassYear IS NULL)



id say because the above query is the classic catch all. if we can avoid the catch all performance should increase. (as always the it depends please test applies).

if we have an index on Class with ProfessorID and ClassYear we can use that index in the join. we also eliminating records we dont need early in execution and have a lower overall row count.


To Kenneth, Great article and a gave me a great way to explain join conditions vs where conditions.
I don't see it as a catch all. I see it as defining the requirements of the query. The join condition is the id, the filter is the tenure and the year. Assessing the question posed, I wouldn't expect to see the results that the author posed; i.e. if class year is NULL, then they didn't teach that year and I don't need them in the results. I just don't understand why the expected results would be different. If someone has been working with LEFT OUTER JOINs for more than 20 minutes... why would they ever expect the query to return rows from the LEFT table when the filter is on the RIGHT? I don't put the filter in the JOIN because it is a filter, not a join condition.


while i agree with you that the join condition is the id and the filter is tenure and year, i disagree that the join on id should be the only thing in the join conditions. as far as the expected results, well users want what they want and we are there to provide.

(Class.ClassYear >= 2011 OR Class.ClassYear IS NULL) while not exactly a catch all with a passed variable you are still using the form (something = constant or something IS NULL). if we can eliminate the second check for null (even by not putting a "filter" in the where clause but in the join) we should speed up the query.

this of course is a great situation for the it depends, test both ways, or what does your code reviewer say answer to which is correct.



For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden for the best way to ask your question.

For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw

Need to Split some strings? Jeff Moden's DelimitedSplit8K
Jeff Moden's Cross tab and Pivots Part 1
Jeff Moden's Cross tab and Pivots Part 2

Jeremy Oursler
Post #1356759
Posted Monday, September 10, 2012 9:20 AM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Yesterday @ 3:14 PM
Points: 3,466, Visits: 1,820
First I would like to appologize if my choice of test data is throwing people off. The data and table layout was meant to be simple and easy to follow so it didn't get in the way of the description of the difference between the ON clause and the WHERE clause.

Second I'll address a few points.

The correct terms are "preserved table " and "unpreserved table"

Never heard of either. What are they the correct terms for?


and your design is wrong.

My design is correct for the purpose. It's simple. I didn't need or want a complicated design to demonstrate a subject that can be complicated enough for a lot of people.

we do not use BIT flags in SQL

"We" do use BITs quite often. They are a nice small data type where I can fit 8 of them into a byte. Unless you have a good reason not to use them that I've never heard of I think I'll continue to not waste disk space.

Your life will be easier if you learn to use row constructors:

INSET INTO Professors
VALUES (1, 'Dr Coke', '2011-01-01'),
(2, 'Dr Sprite', '2011-01-01'),
(3, 'Dr 7-up', '2011-01-01'),
(4, 'Mr Pepper', NULL),
(5, 'Mr Fanta', NULL);

I do in fact know how to use this method of inserting. However it only works on 2008 and above and not everyone (believe it or not) is done using 2005 (and even lower).

Now try your queries with a proper schema.

My queries work exactly the same. Which is why I chose to use a very simple table layout. If I had chosen to make a realistic schema I could have added quite a few more columns, tables, defaults, constraints, foreign keys etc. That not being the point of the article I chose not to.

SQLKnowItAll (9/10/2012)
I suppose I don't understand the point. We know that when records do not exist in the RIGHT table that NULLs are returned. So why would we expect a filter condition on the RIGHT table to treat NULLs differently than we would otherwise?

i.e. Simply change the filter condition to account for NULLs

WHERE Professor.HasTenure = 'True'
AND (Class.ClassYear >= 2011 OR Class.ClassYear IS NULL)


You can absolutly do it this way. Personally I prefer the other, and I do consider the ClassYear part of the join in this case. But that comes down to personal preference I think. I will say that you should consider yourself lucky to understand it so well. I still get around 1 or 2 questions a week on this subject. Which is why I wrote the article in the first place. It tends to be a difficult subject for some people. Not everyone, but in my experience the vast majority.


Kenneth Fisher
I strive to live in a world where a chicken can cross the road without being questioned about its motives.
--------------------------------------------------------------------------------
For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Link to my Blog Post --> www.SQLStudies.com
Post #1356820
Posted Monday, September 10, 2012 10:04 AM
SSC Journeyman

SSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC Journeyman

Group: General Forum Members
Last Login: Yesterday @ 8:16 AM
Points: 83, Visits: 743
I really learned from this article! I've been working with SQL Server for 7 years now and I had no idea the results were different if I took a filter from the WHERE clause and put it into the ON clause...I always assumed the results were going to be the same. I personally knew the right table results were being filtered with the WHERE clause (Dr Sprite and Dr 7-Up were being excluded) because I've been down that bumpy road many times. I've always dealt with this situation by adding "ISNULL" into the where clause.

AND ISNULL(class.classyear,2011)>=2011

Not a pretty solution but NOW I know there is a better way!

Thanks for the article AND it's simplicity. I get what Joe Celko was driving at, but I find simple examples to be refreshing...though it could probably be argued that is why I'm a 7 year veteran and still learning what many DBA's consider elementary.
Post #1356857
« Prev Topic | Next Topic »

Add to briefcase 12345»»»

Permissions Expand / Collapse