Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««12345»»»

Fun with Outer Joins Expand / Collapse
Author
Message
Posted Monday, September 10, 2012 11:00 AM


SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Saturday, July 12, 2014 6:44 AM
Points: 441, Visits: 1,799
The way I always remember this is thus: If you refer to the right hand table anywhere is the where clause, you immediately turn that left join into a inner join. That's the effect you see anyway.

Putting the predicate in the ON clause keeps it a left join.

Unless of course you are doing an anti-join, where you want all records from Set A not in Set B, so you use
select cols from Set_A a
left join Set_B b
On a.col1 = b.col1
where b.col1 is NULL

That is the only time I refer to a column from the right hand table involved in a left outer join in the where clause.

HTH

Dave J



http://glossopian.co.uk/
"I don't know what I don't know."
Post #1356893
Posted Monday, September 10, 2012 1:02 PM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Monday, April 14, 2014 11:00 AM
Points: 123, Visits: 190
thisisfutile (9/10/2012)
I really learned from this article! I've been working with SQL Server for 7 years now and I had no idea the results were different if I took a filter from the WHERE clause and put it into the ON clause...I always assumed the results were going to be the same. I personally knew the right table results were being filtered with the WHERE clause (Dr Sprite and Dr 7-Up were being excluded) because I've been down that bumpy road many times. I've always dealt with this situation by adding "ISNULL" into the where clause.

AND ISNULL(class.classyear,2011)>=2011

Not a pretty solution but NOW I know there is a better way!

Thanks for the article AND it's simplicity. I get what Joe Celko was driving at, but I find simple examples to be refreshing...though it could probably be argued that is why I'm a 7 year veteran and still learning what many DBA's consider elementary.


Using a function on a column in a where clause will not allow your query to use an index. That is why instead of isnull you want to use "Where col1 = 'a' OR Col1 IS NULL..."
Post #1356964
Posted Monday, September 10, 2012 1:04 PM


SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Thursday, February 6, 2014 12:59 PM
Points: 801, Visits: 1,962
Ken,
It's a great article.
Yep as a simple teaching example I get your design. I also get what Joe is saying, although he has often said it better, in that it winds up validating bad design by being used as an example. The "relationship" table (Catalog) is used to connect [Professor] to [Class]. For your next eaxample you might choose something like [SalesOrder] and [SalesOrderActivity].

Joe,
Keep up the good fight! I had one VP take me to task about my penchant for normalization as being "abnormal". I pointed out that we have a mediumn sized database in production that is small, robust, comprehensive, and quick. I learend a lot from your books.

To everybody else,
Take this to heart in that the real answer here is not so much how the joins work but the fact that the way SQL Server processes your execution can surprise you if you are not careful. The key here is that the WHERE clause seems to be processed after all the joins have taken place.


ATB

Charles Kincaid

Post #1356966
Posted Monday, September 10, 2012 1:08 PM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Monday, April 14, 2014 11:00 AM
Points: 123, Visits: 190
CELKO (9/9/2012)
The correct terms are "preserved table " and "unpreserved table" and your design is wrong. You have no keys, we do not use BIT flags in SQL, you believe in a magical "id" that changes from table to table, becoming a professor, a squid, a class, an automobile, etc. That is Kabahlah magice and nto RDBMS.

You had only one professor, no key, a name that violates USPS standards and an assembly language bit flag. IDENTITY properties? And no DRI because of the bad non-schema. Yes, without keys and DRI, this is not RDBMS. You also do not understand temporal data.

CREATE TABLE Professors
(emp_id INTEGER NOT NULL PRIMARY KEY,
professor_name VARCHAR(35) NOT NULL,
tenure_date DATE);

Your life will be easier if you learn to use row constructors:

INSET INTO Professors
VALUES (1, 'Dr Coke', '2011-01-01'),
(2, 'Dr Sprite', '2011-01-01'),
(3, 'Dr 7-up', '2011-01-01'),
(4, 'Mr Pepper', NULL),
(5, 'Mr Fanta', NULL);

Notice the use of a NULL for non-tenured professors. This still stinks and we ought to have a status and date range. But let's skip that design flaw for now.

CREATE TABLE Classes
(course_nbr INTEGER NOT NULL
REFERENCES Catalog,
semester_name CHAR(7) NOT NULL
REFERENCES Semesters,
PRIMARY KEY (course_nbr, semester_name)
emp_id INTEGER
REFERENCES Professors);

The common temporal idiom in SQL is a look up table for reporting periods. I prefer a name that can be sorted with ISO-8601 data:

CREATE TABLE Semesters
(semester_name CHAR(7) NOT NULL PRIMARY KEY
CHECK (semester_name LIKEB '[12][0-9][0-9][0-9]-S[1-3]'),
semester_start_date DATE NOT NULL,
semester_end_date DATE NOT NULL,
CHECK(semester_start_date < semester_end_date);

You can get the class name from the catalog and not waste space and time on it. What you had was not normalized. Courses have numbers; look at a college catalog. The NULL is my favorite instructor “TBD” (I do adjunct teaching and that means work).

INSERT INTO Class_Schedule (class_nbr, semester_name, emp_id)
VALUES (101 '2010-S1', 4),
(101, '2010-S2', 4),
(101, '2011-S1', 4),
(101, '2012-S1', NULL),
(101, '2012-S2', NULL),
Etc.

Now try your queries with a proper schema.


wow... beating up the author because of the design correctness of the example in an article about joins? it was just an example by the way. He was giving some example code so people could try it themselves (and so the article made sense). Get off your high horse and understand the article was not about database design. It was about joins....


Post #1356969
Posted Monday, September 10, 2012 3:00 PM
SSC Journeyman

SSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC Journeyman

Group: General Forum Members
Last Login: Today @ 9:18 AM
Points: 80, Visits: 729
adamg 27214 (9/10/2012)


Using a function on a column in a where clause will not allow your query to use an index. That is why instead of isnull you want to use "Where col1 = 'a' OR Col1 IS NULL..."


Roger that. Most of the querying I do is ad-hoc on small datasets and the isnull function is a personal "at-a-glance" way for me to see that I'm including the null results on the RIGHT table (Bad habit? Yes. In production? No). For me it's easier when I'm troubleshooting because the less "OR" and "AND" structures I have to logic my way through during a troubleshooting session the better off I am. Thanks for pointing that out though. I constantly feel like I learn something on this site. I'm still amazed that I never realized two result sets are produced by just moving something from the WHERE clause up to the ON clause. I felt a "click" in my brain today and at 40 years of age I thought I was getting past most of those. It's good to know I can still learn.
Post #1357027
Posted Monday, September 10, 2012 3:02 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Today @ 3:47 PM
Points: 3,658, Visits: 7,979
CELKO (9/9/2012)
The correct terms are "preserved table " and "unpreserved table" ...

Can you explain us these terms? I can't find any reference on the web.



Luis C.
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

Forum Etiquette: How to post data/code on a forum to get the best help
Post #1357028
Posted Monday, September 10, 2012 3:05 PM


SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Today @ 12:50 PM
Points: 137, Visits: 871
Good article. I just want to present a style of coding that prevents this situation. I often work with queries involving a dozen tables. Notice that each table is basically a separate select statement. If I didn't code in my style, I'd quickly get confused.

select
p.*,
c.*
FROM
(-- professor data
SELECT
Professor.Id AS [Id],
Professor.ProfessorName,
CASE Professor.HasTenure WHEN 1 THEN 'True' WHEN 0 THEN 'False' ELSE NULL END AS [Has Tenure]
FROM Professor
WHERE Professor.HasTenure = 'True'
) as P
left outer join
(-- class data
SELECT
Class.ProfessorId AS [ProfessorId],
Class.ClassName,
Class.ClassYear,
Class.ClassSemester
FROM Class
WHERE Class.ClassYear >= 2011
) as c on p.Id = c.ProfessorId


Post #1357029
Posted Monday, September 10, 2012 4:44 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: 2 days ago @ 7:11 AM
Points: 3,464, Visits: 1,801
Bill Talada (9/10/2012)
Good article. I just want to present a style of coding that prevents this situation. I often work with queries involving a dozen tables. Notice that each table is basically a separate select statement. If I didn't code in my style, I'd quickly get confused.

select
p.*,
c.*
FROM
(-- professor data
SELECT
Professor.Id AS [Id],
Professor.ProfessorName,
CASE Professor.HasTenure WHEN 1 THEN 'True' WHEN 0 THEN 'False' ELSE NULL END AS [Has Tenure]
FROM Professor
WHERE Professor.HasTenure = 'True'
) as P
left outer join
(-- class data
SELECT
Class.ProfessorId AS [ProfessorId],
Class.ClassName,
Class.ClassYear,
Class.ClassSemester
FROM Class
WHERE Class.ClassYear >= 2011
) as c on p.Id = c.ProfessorId




Interestingly enough one of my coworkers suggested exactly the same thing. I tried it while I was writing the article but it came up significantly slower than doing it the other way. Not noticeable with small data sets but if you turn STATISTICS TIME & IO on you can see it.


Kenneth Fisher
I strive to live in a world where a chicken can cross the road without being questioned about its motives.
--------------------------------------------------------------------------------
For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Link to my Blog Post --> www.SQLStudies.com
Post #1357056
Posted Monday, September 10, 2012 7:32 PM


SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Today @ 12:50 PM
Points: 137, Visits: 871
Setting STATISTICS TIME & IO ON will cause severe slowdowns. I generally write each table as a separate SELECT in the FROM clause as above because it always self documents and simultaneously optimizes. Often the sub-selects can be reused in other procs. I'm maintaining 1000+ custom procs in a world class application. Everything is fast although the databases are all under 100 GB.

Give it another try!
Post #1357086
Posted Tuesday, September 11, 2012 6:21 AM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: 2 days ago @ 7:11 AM
Points: 3,464, Visits: 1,801
Bill Talada (9/10/2012)
Setting STATISTICS TIME & IO ON will cause severe slowdowns. I generally write each table as a separate SELECT in the FROM clause as above because it always self documents and simultaneously optimizes. Often the sub-selects can be reused in other procs. I'm maintaining 1000+ custom procs in a world class application. Everything is fast although the databases are all under 100 GB.

Give it another try!


While I agree setting STATISTICS TIME & IO on will cause slowdowns I would think they would slow both queries down equally wouldn't they? I'll try running it on one of our bigger test servers and see what happens.


Kenneth Fisher
I strive to live in a world where a chicken can cross the road without being questioned about its motives.
--------------------------------------------------------------------------------
For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Link to my Blog Post --> www.SQLStudies.com
Post #1357351
« Prev Topic | Next Topic »

Add to briefcase ««12345»»»

Permissions Expand / Collapse