Fun with Outer Joins

  • Good article. I just want to present a style of coding that prevents this situation. I often work with queries involving a dozen tables. Notice that each table is basically a separate select statement. If I didn't code in my style, I'd quickly get confused.

    select

    p.*,

    c.*

    FROM

    (-- professor data

    SELECT

    Professor.Id AS [Id],

    Professor.ProfessorName,

    CASE Professor.HasTenure WHEN 1 THEN 'True' WHEN 0 THEN 'False' ELSE NULL END AS [Has Tenure]

    FROM Professor

    WHERE Professor.HasTenure = 'True'

    ) as P

    left outer join

    (-- class data

    SELECT

    Class.ProfessorId AS [ProfessorId],

    Class.ClassName,

    Class.ClassYear,

    Class.ClassSemester

    FROM Class

    WHERE Class.ClassYear >= 2011

    ) as c on p.Id = c.ProfessorId

  • Bill Talada (9/10/2012)


    Good article. I just want to present a style of coding that prevents this situation. I often work with queries involving a dozen tables. Notice that each table is basically a separate select statement. If I didn't code in my style, I'd quickly get confused.

    select

    p.*,

    c.*

    FROM

    (-- professor data

    SELECT

    Professor.Id AS [Id],

    Professor.ProfessorName,

    CASE Professor.HasTenure WHEN 1 THEN 'True' WHEN 0 THEN 'False' ELSE NULL END AS [Has Tenure]

    FROM Professor

    WHERE Professor.HasTenure = 'True'

    ) as P

    left outer join

    (-- class data

    SELECT

    Class.ProfessorId AS [ProfessorId],

    Class.ClassName,

    Class.ClassYear,

    Class.ClassSemester

    FROM Class

    WHERE Class.ClassYear >= 2011

    ) as c on p.Id = c.ProfessorId

    Interestingly enough one of my coworkers suggested exactly the same thing. I tried it while I was writing the article but it came up significantly slower than doing it the other way. Not noticeable with small data sets but if you turn STATISTICS TIME & IO on you can see it.

    Kenneth FisherI was once offered a wizards hat but it got in the way of my dunce cap.--------------------------------------------------------------------------------For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/[/url]For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/[/url]Link to my Blog Post --> www.SQLStudies.com[/url]

  • Setting STATISTICS TIME & IO ON will cause severe slowdowns. I generally write each table as a separate SELECT in the FROM clause as above because it always self documents and simultaneously optimizes. Often the sub-selects can be reused in other procs. I'm maintaining 1000+ custom procs in a world class application. Everything is fast although the databases are all under 100 GB.

    Give it another try!

  • Bill Talada (9/10/2012)


    Setting STATISTICS TIME & IO ON will cause severe slowdowns. I generally write each table as a separate SELECT in the FROM clause as above because it always self documents and simultaneously optimizes. Often the sub-selects can be reused in other procs. I'm maintaining 1000+ custom procs in a world class application. Everything is fast although the databases are all under 100 GB.

    Give it another try!

    While I agree setting STATISTICS TIME & IO on will cause slowdowns I would think they would slow both queries down equally wouldn't they? I'll try running it on one of our bigger test servers and see what happens.

    Kenneth FisherI was once offered a wizards hat but it got in the way of my dunce cap.--------------------------------------------------------------------------------For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/[/url]For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/[/url]Link to my Blog Post --> www.SQLStudies.com[/url]

  • Luis Cazares (9/10/2012)


    CELKO (9/9/2012)


    The correct terms are "preserved table " and "unpreserved table" ...

    Can you explain us these terms? I can't find any reference on the web.

    While I'm sure Joe has a deeper, ANSI definition for these terms, I believe their simplest usage is as a reference point. In a LEFT join the LEFT table is the preserved table and the RIGHT is unpreserved. The reverse would be true in a RIGHT join. So in discussion it doesn't matter if we're talking about LEFT or RIGHT joins, we'll know which table is which using these two terms. Like in a car, we don't reference left and right sides because that's relevant to whether or not I'm looking at it from the back or from the front. Of course, referencing driver-side and passenger-side is different based on the country you live in. Now that I think about it, like boats, I'm all for using port and starboard for cars. ๐Ÿ˜›

  • thisisfutile (9/11/2012)


    Luis Cazares (9/10/2012)


    CELKO (9/9/2012)


    The correct terms are "preserved table " and "unpreserved table" ...

    Can you explain us these terms? I can't find any reference on the web.

    While I'm sure Joe has a deeper, ANSI definition for these terms, I believe their simplest usage is as a reference point. In a LEFT join the LEFT table is the preserved table and the RIGHT is unpreserved. The reverse would be true in a RIGHT join. So in discussion it doesn't matter if we're talking about LEFT or RIGHT joins, we'll know which table is which using these two terms. Like in a car, we don't reference left and right sides because that's relevant to whether or not I'm looking at it from the back or from the front. Of course, referencing driver-side and passenger-side is different based on the country you live in. Now that I think about it, like boats, I'm all for using port and starboard for cars. ๐Ÿ˜›

    There you go. In a LEFT OUTER JOIN from now on we will call the left table the "driver's side" or "port" table and the right one the "passenger side" or "starboard" table. Reverse of course for a RIGHT join. :hehe:

    Kenneth FisherI was once offered a wizards hat but it got in the way of my dunce cap.--------------------------------------------------------------------------------For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/[/url]For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/[/url]Link to my Blog Post --> www.SQLStudies.com[/url]

  • You have a couple of misspelled words there Mr. celko. This did not hinder my understanding of the point you were making. Nor did Kennethโ€™s use of words and schema hinder my understanding of his.

  • Joe,

    Thanks for the lesson.

    A question:

    Regarding "assembly language bits", we have a case where we need to track historic medical data for patients (problems), and track if it's resolved or not. We may not have a date (it could have been 20 years ago, and the patient does not remember). I am guessing that you are promoting a look-up table, with two values (Resolved, Unresolved).

    Also, "Your life will be easier if you learn to use row constructors: " - I guess it depends on the version of the DBMS the author is using. Some people are still on SQL Server 2005.

    David

    CELKO (9/9/2012)


    The correct terms are "preserved table " and "unpreserved table" and your design is wrong. You have no keys, we do not use BIT flags in SQL, you believe in a magical "id" that changes from table to table, becoming a professor, a squid, a class, an automobile, etc. That is Kabahlah magice and nto RDBMS.

    You had only one professor, no key, a name that violates USPS standards and an assembly language bit flag. IDENTITY properties? And no DRI because of the bad non-schema. Yes, without keys and DRI, this is not RDBMS. You also do not understand temporal data.

    CREATE TABLE Professors

    (emp_id INTEGER NOT NULL PRIMARY KEY,

    professor_name VARCHAR(35) NOT NULL,

    tenure_date DATE);

    Your life will be easier if you learn to use row constructors:

    INSET INTO Professors

    VALUES (1, 'Dr Coke', '2011-01-01'),

    (2, 'Dr Sprite', '2011-01-01'),

    (3, 'Dr 7-up', '2011-01-01'),

    (4, 'Mr Pepper', NULL),

    (5, 'Mr Fanta', NULL);

    Notice the use of a NULL for non-tenured professors. This still stinks and we ought to have a status and date range. But let's skip that design flaw for now.

    CREATE TABLE Classes

    (course_nbr INTEGER NOT NULL

    REFERENCES Catalog,

    semester_name CHAR(7) NOT NULL

    REFERENCES Semesters,

    PRIMARY KEY (course_nbr, semester_name)

    emp_id INTEGER

    REFERENCES Professors);

    The common temporal idiom in SQL is a look up table for reporting periods. I prefer a name that can be sorted with ISO-8601 data:

    CREATE TABLE Semesters

    (semester_name CHAR(7) NOT NULL PRIMARY KEY

    CHECK (semester_name LIKEB '[12][0-9][0-9][0-9]-S[1-3]'),

    semester_start_date DATE NOT NULL,

    semester_end_date DATE NOT NULL,

    CHECK(semester_start_date < semester_end_date);

    You can get the class name from the catalog and not waste space and time on it. What you had was not normalized. Courses have numbers; look at a college catalog. The NULL is my favorite instructor โ€œTBDโ€ (I do adjunct teaching and that means work).

    INSERT INTO Class_Schedule (class_nbr, semester_name, emp_id)

    VALUES (101 '2010-S1', 4),

    (101, '2010-S2', 4),

    (101, '2011-S1', 4),

    (101, '2012-S1', NULL),

    (101, '2012-S2', NULL),

    Etc.

    Now try your queries with a proper schema.

  • Kenneth

    Just to pick up on this point you made:

    "We" do use BITs quite often. They are a nice small data type where I can fit 8 of them into a byte. Unless you have a good reason not to use them that I've never heard of I think I'll continue to not waste disk space

    My understanding is that this is the behaviour of the Bit field type anyway - is there a reason why you would do your own packing into a Tinyint?

  • pg53 (9/12/2012)


    Kenneth

    Just to pick up on this point you made:

    "We" do use BITs quite often. They are a nice small data type where I can fit 8 of them into a byte. Unless you have a good reason not to use them that I've never heard of I think I'll continue to not waste disk space

    My understanding is that this is the behaviour of the Bit field type anyway - is there a reason why you would do your own packing into a Tinyint?

    Fair point. In this particular case I only ever put 1s and 2s in my classsemester column (fall/spring) so I could easily have put this into a Bit field. And to be honest since I only had 1 bit field in the Professor table I didn't save anything over a char(1). However I still feel that the appropriate data type for a flag (only ever going to be 2 options) is a bit. In the case of ClassSemester I could have included more semesters in the future (say summer) and was leaving room for it.

    All that being said my big complaint with what Joe had to say is that I used a very simple format. The minimum required to get my point across. While there were some minor points that could be argued (Id vs ProfessorId etc) they don't distract from the point of the article. Mr Celko's structure on the other hand was overly complex for what was needed. This is however just my opinion, and as my family knows my opinions should always be taken with a grain (tablespoon full) of salt.

    Kenneth FisherI was once offered a wizards hat but it got in the way of my dunce cap.--------------------------------------------------------------------------------For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/[/url]For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/[/url]Link to my Blog Post --> www.SQLStudies.com[/url]

  • thisisfutile (9/11/2012)


    Luis Cazares (9/10/2012)


    CELKO (9/9/2012)


    The correct terms are "preserved table " and "unpreserved table" ...

    Can you explain us these terms? I can't find any reference on the web.

    While I'm sure Joe has a deeper, ANSI definition for these terms, I believe their simplest usage is as a reference point. In a LEFT join the LEFT table is the preserved table and the RIGHT is unpreserved. The reverse would be true in a RIGHT join. So in discussion it doesn't matter if we're talking about LEFT or RIGHT joins, we'll know which table is which using these two terms. Like in a car, we don't reference left and right sides because that's relevant to whether or not I'm looking at it from the back or from the front. Of course, referencing driver-side and passenger-side is different based on the country you live in. Now that I think about it, like boats, I'm all for using port and starboard for cars. ๐Ÿ˜›

    That's a reasonable explanation, but there's no such thing as preserved or unpreserved table. As a matter of fact, it doesn't need to be a table to be in the statement, it could be a view, a TVF, a subquery, a CTE, etc. It could all be treater as a data set.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Very good article. I did get confused trying to follow through the FULL JOIN result set as it seemed to be missing 5 rows for the Selzer class. I eventually ran the code and confirmed that it IS missing 5 rows and the point of the ON clause was finally able to sink in.

  • CELKO (9/9/2012)


    ...we do not use BIT flags in SQL...

    You, sir, couldn't be more wrong. That is all.

  • When executing the query with adding the FULL JOIN with where clause like below -

    SELECT Professor.Id AS [Professor.Id], Professor.ProfessorName,

    CASE Professor.HasTenure WHEN 1 THEN 'True' WHEN 0 THEN 'False' ELSE NULL END AS [Has Tenure],

    Class.ProfessorId AS [Class.ProfessorId], Class.ClassName,

    Class.ClassYear, Class.ClassSemester

    FROM Professor

    FULL JOIN Class

    ON Professor.Id = Class.ProfessorId AND Class.ClassYear >= 2011

    WHERE Professor.HasTenure = 'True'

    Then it is giving same results as adding LEFT OUTER JOIN.

  • vsarade (9/18/2012)


    When executing the query with adding the FULL JOIN with where clause like below -

    SELECT Professor.Id AS [Professor.Id], Professor.ProfessorName,

    CASE Professor.HasTenure WHEN 1 THEN 'True' WHEN 0 THEN 'False' ELSE NULL END AS [Has Tenure],

    Class.ProfessorId AS [Class.ProfessorId], Class.ClassName,

    Class.ClassYear, Class.ClassSemester

    FROM Professor

    FULL JOIN Class

    ON Professor.Id = Class.ProfessorId AND Class.ClassYear >= 2011

    WHERE Professor.HasTenure = 'True'

    Then it is giving same results as adding LEFT OUTER JOIN.

    Not really surprising. This is for the same reason that putting the "Class.ClassYear = 2011" condition into the where clause eliminates the effect of the LEFT OUTER JOIN and makes it act like an INNER JOIN. Because the condition "Professor.HasTenure = 'True'" is in the WHERE clause it will eliminate all of the entries where HasTenure is NULL. In other words it is eliminating all of the entries that the difference between FULL and OUTER join caused.

    Kenneth FisherI was once offered a wizards hat but it got in the way of my dunce cap.--------------------------------------------------------------------------------For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/[/url]For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/[/url]Link to my Blog Post --> www.SQLStudies.com[/url]

Viewing 15 posts - 16 through 30 (of 63 total)

You must be logged in to reply to this topic. Login to reply