Fun with Outer Joins

  • DB.Duck (9/17/2012)


    CELKO (9/9/2012)


    ...we do not use BIT flags in SQL...

    You, sir, couldn't be more wrong. That is all.

    Actually it's almost a great point but poorly stated. Let me give this a shot. For the most part you should avoid bit flags.

    Often they are used to replicate data. Case in point is the [TerminationDate] of an employee. That column will be NULL for everyone who is still working and have a valid date for folks who have, or were, terminated. Therefore a [Terminated] bit column is not needed.

    Then there is the whole aspect of indexing on bits. SQL Server is a quite fine product (all software has bugs so don't start) but even the greatest software has to make compromises. Indexing on a bit suffers. I'll leave further reading to you. There are many good articles and books.

    Bits are tempting. Recent versions will cram 8 bits into one byte of actual storage. Sounds cool. You create the first one at design time. You are sucked into a false sense as adding the next seven are very fast (only the meta data is changed). Now you add the ninth and wait as every page in your table is rewritten.

    ATBCharles Kincaid

  • Charles Kincaid (9/18/2012)


    DB.Duck (9/17/2012)


    CELKO (9/9/2012)


    ...we do not use BIT flags in SQL...

    You, sir, couldn't be more wrong. That is all.

    Actually it's almost a great point but poorly stated. Let me give this a shot. For the most part you should avoid bit flags.

    Often they are used to replicate data. Case in point is the [TerminationDate] of an employee. That column will be NULL for everyone who is still working and have a valid date for folks who have, or were, terminated. Therefore a [Terminated] bit column is not needed.

    Then there is the whole aspect of indexing on bits. SQL Server is a quite fine product (all software has bugs so don't start) but even the greatest software has to make compromises. Indexing on a bit suffers. I'll leave further reading to you. There are many good articles and books.

    Bits are tempting. Recent versions will cram 8 bits into one byte of actual storage. Sounds cool. You create the first one at design time. You are sucked into a false sense as adding the next seven are very fast (only the meta data is changed). Now you add the ninth and wait as every page in your table is rewritten.

    Ok so I'm willing to accept that bit flags should be used when appropriate. For example when you want to know if a professor has tenure or not, but don't care when they got it. At that point putting in a date field is wasteful. Or possibly a better example would be if you are storing the results of a questionnaire and have a number of true/false or yes/no questions. It's really all about what your data requirements are.

    I haven't read about the indexing problem of bit's but I certainly will at this point. Either way though I don't imagine it being all that helpful as the cardinality of a bit is bound to be terrible.

    As far as that 9th bit column. I could be wrong (and that happens far more frequently than I would like) but wouldn't the 9th bit column (adding 1 byte to the row) be the same effect as adding a char(1), tinyint, or really any data type?

    Kenneth FisherI was once offered a wizards hat but it got in the way of my dunce cap.--------------------------------------------------------------------------------For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/[/url]For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/[/url]Link to my Blog Post --> www.SQLStudies.com[/url]

  • thisisfutile (9/11/2012)


    Like in a car, we don't reference left and right sides because that's relevant to whether or not I'm looking at it from the back or from the front. Of course, referencing driver-side and passenger-side is different based on the country you live in. Now that I think about it, like boats, I'm all for using port and starboard for cars. 😛

    We use 'near side' for the side nearest the kerb and 'off side' for the other side. Then it doesn't matter which country you are in or where the driver and front seat passenger are in the car. True, this doesn't work for off-road driving but that's something I have no experience of so I'm not bothered.

  • Often they are used to replicate data. Case in point is the [TerminationDate] of an employee. That column will be NULL for everyone who is still working and have a valid date for folks who have, or were, terminated. Therefore a [Terminated] bit column is not needed.

    I would regard this as overloading the [TerminationDate] column by giving it two meanings (terminated yes or no and date thereof). In addition to a column on the main table that states the current status, I would generally have a table with a foreign key of the main table and two columns of 'status' and 'date/time of status' (maybe others such as 'who set the status', etc.) so I could view all changes of status in chronological order, thus allowing for a status going to one value and then to another. For the employee, I could then see the dates of actions such as 'hired', 'suspended', 'terminated', 'rehired'. I can add new status by allowing new values in the 'status' column on the subsidiary table rather than having to add new columns for each new status to the main table.

  • Preserved and Unpreserved:

    I often use the terms "preserved" and "unpreserved" table. I think it's from some ORACLE documentation that I read about 15+ years ago, but I find it useful.

    Bit Columns:

    In designing a truly normalized schema, I would do my best to avoid having a bit column in a table. Joe's example of Termination_Date versus Terminated_Y_N is a good one. It's a more "correct" way of getting to a fully normalized solution.

    Also, you can create an index on a Bit column, but it would be pretty much useless, as its not at all selective - the optimizer will probably never use it.

    Having said all that, I still liked the article - you got across the main point, and I liked the way you used the FULL JOIN to show the pattern and illustrate the principles involved.

  • It seems to me that a lot of people is not getting really HOW a query is executed.

    But that should be querying 101.

    I would just point out the T-SQL Querying book by Itzik Ben-Gan, but let me recap.

    A T-SQL query is composed by clauses, those are:

    SELECT TOP(x) ....

    FROM ...

    WHERE ...

    GROUP BY ....

    ORDER BY ...

    But this is a description of the query, not the real execution order.

    If we where writing queries with clauses in the exact execution order, it would be in this way:

    FROM ...

    WHERE ...

    GROUP BY ...

    SELECT ...

    ORDER BY ...

    TOP ...

    This is why you can write "Order by 1" and have the data sorted by the first column: because when the order by gets executed, the selected data is already there. However, you cannot use the expressions in the SELECT clause in the WHERE, FROM or GROUP BY clauses, because when those are executed, the select has still to be materialized.

    What seems to trouble a lot of people is failing to grasp that the ON clause and the WHERE clause are executed in different moments. And, in between them, the OUTER kicks in.

    So, the ON is used to match data for the join, and AFTER the join is evaluated, 1 of 4 things happens:

    - if the join is INNER, nothing happens and no data is appended. This is why an INNER join will be always at least as fast as an outer

    - if the join is LEFT, RIGHT or FULL OUTER, the non-joining data from the related table(s) will be added to the result set.

    After this result set given by the FROM clause is built, the WHERE kicks in. No magic or difficult considerations.

    Just always, ALWAYS remember the order of execution.

    Everything else in the article is, IMHO, unnecessary clutter.

  • "Mr Pepper" ? Surely that's the only real Doctor present.

    I found this thread both amusing and enlightening on many levels.

  • I still get around 1 or 2 questions a week on this subject. Which is why I wrote the article in the first place.

    To my mind, that's an excellent reason for writing the article. Many people using SQL are not expert and an article like this helps them learn.

  • @kenneth Fisher: Thx for the article, it's most certainly something that gets a lot of people confused and the given examples are a great help to show what's going on.

    On top of that, the part

    “Why is that! “ I hear some of you cry. The rest are split between 75% who already know the answer, 15% who don’t care, and the remaining 10% are still trying to figure out what an OUTER JOIN is and why I’m going on about it.

    will have me smiling for the rest of the day =)

    @celko: Although I understand the need to 'preach against bad design', IMHO you're picking the wrong fight here and most certainly could have used more respectful wording.

  • Right on chiesa.alberto. I was going to make the same suggestion. Itzik Ben-Gan's books are exceptionally informative and I highly recommend using them.

    Here is a list of his books (or books to which he contributed or collaborated) for sale on Amazon:

    http://www.amazon.com/Itzik-Ben-Gan/e/B001IGQENW/ref=sr_ntt_srch_lnk_1?qid=1389971280&sr=8-1

  • I try to visualize any join as a Venn diagram. To me it is a great teaching tool--especially if your pupil knows statistics.

  • I always think this is a really useful thing to know with SQL because it can cause so many issues if someone isn't aware of it.

    I've had it explained to me as to do with the order in which SQL server processes queries. The very first thing that is processed is the 'from' clause (including all the joins), and then the 'where' clause next. this means that the where clause is applied to all rows in the 'from' clause, including those on the left table.

    I often update a table based on a left join using an isnull with a variable from the right table.

    e.g.

    (update a

    set a.val = isnull(b.val,0)

    from a left join b

    on a.ID = b.ID

    and b.val >3)

    this will set a.val to 0 where there is no val in b that's more than 3 or no corresponding entry in b, but a where clause with 'where b.val>3' would discount all those rows and leave me with a load of nulls. It effectively makes the left join pointless

  • Just always, ALWAYS remember the order of execution.

    I added the emphasis. Yes, understanding this execution order is why I have been able to write good performance filtered queries with no entries in the WHERE clause at all. I have shown these around our office and folks are calling "Where is your WHERE?" Further since the WHERE takes place after the JOINS (in the FROM) moving the predicates to the JOINS can reduce memory consumption.

    I have a rule that I keep in mind while designing queries: JOINS put stuff into the result set. WHERE takes stuff out of the result set.

    ATBCharles Kincaid

  • I am still having trouble understanding the Sql JOINS. I have used to the insert query that you provided and trying to figure it how it is done. Please anybody have simpler solution and suggestion on how to learn different joins. For some reason I am always confused what is the ultimate results by getting lost in the query.

  • This is a fresh look on the topic posted 2 years ago.

    In the provided example the filter "AND (Class.ClassYear >= 2011 OR Class.ClassYear IS NULL)" in WHERE clause works.

    But in general, it does not. Let's enhance the example with "Class" table and make column "ClassYear" nullable. It is wierd, but still possible.

    Professor could be given a Class but the year of that Class is not yet recorded.

    Let's insert couple of records into the "Class" table with NULLs for "ClassYear" field.

    INSERT INTO Class (ProfessorId, ClassName, ClassYear, ClassSemester)

    VALUES (1, 'Pitza Hut Delivery', null, 1)

    INSERT INTO Class (ProfessorId, ClassName, ClassYear, ClassSemester)

    VALUES (3, 'Pepsi for life', null, 2)

    Run the quiery with filter on the JOIN first and then run the quiery with filter in WHERE clause.

    You will be suprised to se wrong result for the second quiery run.

Viewing 15 posts - 31 through 45 (of 63 total)

You must be logged in to reply to this topic. Login to reply