SQL & the JOIN Operator

  • Yet another article that makes people think that table aliases (t1 and t2) are part of the required syntax. This first Join statement:

    SELECT t1.key1, t1.field1 as Name, t1.key2 as T1Key,

    t2.key2 as T2Key, t2.field1 as City

    FROM Table1 t1

    INNER JOIN Table2 t2 ON t1.key2 = t2.key2

    Should be written like this:

    SELECT Table1.key1, Table1.field1 as Name, Table1.key2 as T1Key,

    Table2.key2 as T2Key, Table2.field1 as City

    FROM Table1

    INNER JOIN Table2 ON Table1.key2 = Table2.key2

    Isn't that easier to read?

    In this simple example, why in the WORLD are you including table aliases for Table1 and Table2? They are COMPLETELY unneccessary. When new users are exposed to table aliases in this kind of setting, they naturally think the table aliases are required.

    It gets more important when the tables are named ClientStatus and ClientOrders and people start aliasing them as "cs" and "co", or even worse, as "a" and "b".

    PLEASE don't teach that table aliases are a required part of the syntax. When they are required, they are required, but not in this simple example, and here they add nothing -- and they DETRACT from human readability -- in such simple examples. The human mind has to keep track of which alias goes with which table, and it gets hard when there are 5 or 6 tables involved.

    It's also good to see a <= join here. That's often left out of examples. Don't forget to also teach that you can join tables on two conditions (using AND/OR), not just one.

    Thanks.

    David Walker

  • David Walker-278941 (10/7/2009)


    Yet another article that makes people think that table aliases (t1 and t2) are part of the required syntax. This first Join statement:

    SELECT t1.key1, t1.field1 as Name, t1.key2 as T1Key,

    t2.key2 as T2Key, t2.field1 as City

    FROM Table1 t1

    INNER JOIN Table2 t2 ON t1.key2 = t2.key2

    Should be written like this:

    SELECT Table1.key1, Table1.field1 as Name, Table1.key2 as T1Key,

    Table2.key2 as T2Key, Table2.field1 as City

    FROM Table1

    INNER JOIN Table2 ON Table1.key2 = Table2.key2

    Isn't that easier to read?

    In this simple example, why in the WORLD are you including table aliases for Table1 and Table2? They are COMPLETELY unneccessary. When new users are exposed to table aliases in this kind of setting, they naturally think the table aliases are required.

    It gets more important when the tables are named ClientStatus and ClientOrders and people start aliasing them as "cs" and "co", or even worse, as "a" and "b".

    PLEASE don't teach that table aliases are a required part of the syntax. When they are required, they are required, but not in this simple example, and here they add nothing -- and they DETRACT from human readability -- in such simple examples. The human mind has to keep track of which alias goes with which table, and it gets hard when there are 5 or 6 tables involved.

    It's also good to see a <= join here. That's often left out of examples. Don't forget to also teach that you can join tables on two conditions (using AND/OR), not just one.

    Thanks.

    David Walker

    i have seen the same thing

    the rule is you write code to look cool, hardware is cheap

  • I created this article/test page a number of years ago that describes Inner, Left, Right, Full, Cross, Triple, Self, Union and Union All queries. Joins[/url]

  • SQL Noob (10/7/2009)


    is there any performance difference doing a union all compared to a join when you need to get all the rows from two or more tables? i have a database where i do union all on some data in 20 tables or so when running a report and it seems to take a long time

    I'm not certain I understand your question.

    A UNION ALL combines the data from two similar sets into a single set with all members of each set.

    A JOIN matches the data in one set with the data in another set.

    Based on my current knowledge of SQL, I believe it might be theoretically possible to replicate a UNION ALL in a JOIN (although even in theory I think it would be dependent on your table structure), but it would be atypical and in all likelihood it would take longer to run than a straight "union all".

    I have not tested this theory.

    I'll admit, even as I attempt to consider an alternative that involves a JOIN, I still envision needing a UNION ALL in the query and a requirement for each record to have a unique id across all tables.

  • Excellent article Wagner! A good revision of good SQL and old but basic stuff for any developer to be called a good developer. Thanks again.

    SQL DBA.

  • wagner crivelini (10/7/2009)


    Hi, super, thanks for your comments ... (and thanks everybody for all other comments on this articles).

    I feel it's rather risky to try to find a golden rule when it comes to performance. It depends on so many variables that I'd better test things on each particular case.

    In both statements we are talking about here, we have operations that we are told to avoid... Either using comparions to NULL value when using the JOIN or using the NOT IN predicate when using the subquery.

    I guess the real advantage in using the syntax I suggested in the article is that the SELECT statement can list fields from both tables when we use the JOIN.

    Regarding performance, I was curious to check your suggestion and so I ran a "SET STATISTICS PROFILE ON;" to show performance on both statements.

    To my surprise, the Total Subtree Cost on those queries were almost exactly the same same (difference was less than 1%)

    0.0070812 for the JOIN

    0.00706536 for the subquery

    Sometimes RDBMs play a trick on us. 🙂

    Agreed, there are some other factors to consider - an index could be used (which would only really make a difference with alot of data) on the join vs. not being used on the where in...

  • David Walker-278941 (10/7/2009)


    Yet another article that makes people think that table aliases (t1 and t2) are part of the required syntax. This first Join statement:

    SELECT t1.key1, t1.field1 as Name, t1.key2 as T1Key,

    t2.key2 as T2Key, t2.field1 as City

    FROM Table1 t1

    INNER JOIN Table2 t2 ON t1.key2 = t2.key2

    Should be written like this:

    SELECT Table1.key1, Table1.field1 as Name, Table1.key2 as T1Key,

    Table2.key2 as T2Key, Table2.field1 as City

    FROM Table1

    INNER JOIN Table2 ON Table1.key2 = Table2.key2

    Isn't that easier to read?

    In this simple example, why in the WORLD are you including table aliases for Table1 and Table2? They are COMPLETELY unneccessary. When new users are exposed to table aliases in this kind of setting, they naturally think the table aliases are required.

    It gets more important when the tables are named ClientStatus and ClientOrders and people start aliasing them as "cs" and "co", or even worse, as "a" and "b".

    PLEASE don't teach that table aliases are a required part of the syntax. When they are required, they are required, but not in this simple example, and here they add nothing -- and they DETRACT from human readability -- in such simple examples. The human mind has to keep track of which alias goes with which table, and it gets hard when there are 5 or 6 tables involved.

    It's also good to see a <= join here. That's often left out of examples. Don't forget to also teach that you can join tables on two conditions (using AND/OR), not just one.

    Thanks.

    David Walker

    Actually, they ARE becoming required. The use of table names instead of aliases has been deprecated.

    Even if they weren't, short table names may work fine but not longer ones... they really clutter up the works so far as readability.

    What really needs to be taught is the correct type of table aliasing. Table aliases should mean something that a simple table name may not be able to infer as well as being shorter.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Yet another article that makes people think that table aliases (t1 and t2) are part of the required syntax. This first Join statement:

    SELECT t1.key1, t1.field1 as Name, t1.key2 as T1Key,

    t2.key2 as T2Key, t2.field1 as City

    FROM Table1 t1

    INNER JOIN Table2 t2 ON t1.key2 = t2.key2

    Should be written like this:

    SELECT Table1.key1, Table1.field1 as Name, Table1.key2 as T1Key,

    Table2.key2 as T2Key, Table2.field1 as City

    FROM Table1

    INNER JOIN Table2 ON Table1.key2 = Table2.key2

    Isn't that easier to read?

    I think that's a matter of personal preference. I actually find the version with aliases to be easier to read because it's less cluttered. Obviously longer more descriptive table names will make queries even more cluttered. Also, if you work with the same tables long enough and are consistent with aliases, it becomes natural to think in terms of aliases. Kind of like how we all use nick-names.

    Finally, somebody please correct me if I'm wrong, but I'm pretty sure you have to use aliases when writing queries with correlated subqueries.

  • Andy DBA (10/7/2009)


    Finally, somebody please correct me if I'm wrong, but I'm pretty sure you have to use aliases when writing queries with correlated subqueries.

    You don't HAVE to, but it is strongly recommended. If someone doesn't alias a correlated subquery it is highly possible that incorrect results may come of it (like columns from the outer query being realized in the subquery incorrectly).

  • Andy DBA (10/7/2009)


    Yet another article that makes people think that table aliases (t1 and t2) are part of the required syntax. This first Join statement:

    SELECT t1.key1, t1.field1 as Name, t1.key2 as T1Key,

    t2.key2 as T2Key, t2.field1 as City

    FROM Table1 t1

    INNER JOIN Table2 t2 ON t1.key2 = t2.key2

    Should be written like this:

    SELECT Table1.key1, Table1.field1 as Name, Table1.key2 as T1Key,

    Table2.key2 as T2Key, Table2.field1 as City

    FROM Table1

    INNER JOIN Table2 ON Table1.key2 = Table2.key2

    Isn't that easier to read?

    I think that's a matter of personal preference. I actually find the version with aliases to be easier to read because it's less cluttered. Obviously longer more descriptive table names will make queries even more cluttered. Also, if you work with the same tables long enough and are consistent with aliases, it becomes natural to think in terms of aliases. Kind of like how we all use nick-names.

    Finally, somebody please correct me if I'm wrong, but I'm pretty sure you have to use aliases when writing queries with correlated subqueries.

    Oh yes, with correlated subqueries table alises are required. I thought I covered that in my original message -- I said that sometimes they are required, but not in this simple case, and you certainly shouldn't start out thinking that table aliases are always part of the syntax. Use them when you need them, not all of the time.

    And cluttered? T1 is easier for you top read than "Table1"? What about "xv" versus ExceptionVisits? And when you see a Join clause that includes "sg", "rr", "rp", and "rpv", what tables do they refer to? You have to refer back to the start of the query to see what they stand for, and then you have to remember that when you are reading the Select list and the Join list with the aliases. Can you juggle all of that in your head?

    The one without aliases is less "cluttered" since there is only one name that refers to each table, not two! 🙂

  • Jeff Moden (10/7/2009)


    Actually, they ARE becoming required. The use of table names instead of aliases has been deprecated.

    Even if they weren't, short table names may work fine but not longer ones... they really clutter up the works so far as readability.

    What really needs to be taught is the correct type of table aliasing. Table aliases should mean something that a simple table name may not be able to infer as well as being shorter.

    I call BS on your "deprecated" comment. Deprecated by whom? And table aliases do not "clutter" anything up. Words are easier for the human brain to read than abbreviations; I think that's well known from reading studies. We read words more often than abbreviations of those words.

    Still, BS on table names becoming "deprecated".

  • "What really needs to be taught is the correct type of table aliasing. Table aliases should mean something that a simple table name may not be able to infer as well as being shorter."

    Hey Jeff - thanks for the informative post on aliases.:-)

    As for Mr.Walker's post - as always, I take exception to the tone..you can say most anything you want without making it sound like a mini-rant.

    The article was meant to focus on join-basics -Wagner took a lot of trouble to put together an excellent overview. Jeff - seeing as how you are now an MVP and are on your way to becoming a famous author as well...could you not teach people the art of keeping an even tone..?!;-)







    **ASCII stupid question, get a stupid ANSI !!!**

  • SuperDBA-207096 (10/7/2009)


    Wagner,

    You might want to mention in the:

    "Excluding the Intersection of the Sets" section this is similar to

    select... where not in (select... from table2) but it performs alot better w/ the join vs. where not in?

    Just my 02c. Again, well done!

    Mark

    As someone mentions later, performance doesn't always go by rules of thumb. Sometimes an "In" or "Not In" query will give you the same execution plan as a Join. You have to try it with your own data and the frequency distribution of your values will affect the results. SQL 2005 and 2008 will often behave differently than SQL 2000 and earlier also.

    If you learned some performance "rules of thumb" during the heyday of SQL 7, don't apply those rules unthinkingly to SQL 2005 or 2008. SQL 2010 is in development, too.

    And if you're not yet using SQL 2008, remind your bosses that the year 2010 will be here in 2.7 months; SQL 2008 is no longer brand new. SQL 2008's built-in compressed backups usually take less space AND less time to back up and restore. But that's getting off-topic.

  • "1 Table, no Join"

    That's just false, of course. But possibly you're only concerned with the simplest possible situations.

  • Chris,

    Not exactly. The UNION / UNION ALL is kind of tough one to get wrapped around. UNION ALL combines two or more identically structured result sets into one. All rows from each result set are returned in the output. By contrast UNION (without the ALL) returns only distinct rows.

    Example: Say you were doing income tax preparation and had a table of clients for each year. CLIENT2007, CLIENT2008, CLIENT2009. Each table contains just name and taxpayer ID (SSN). Then your old buddy 'Joe Smith' '999-00-1234' came in all three years. Doing the UNION he would show up just the once but UNION ALL he wold appear three times.

    Terrible design that but I'm just trying to get over the one point.

    ATBCharles Kincaid

Viewing 15 posts - 16 through 30 (of 98 total)

You must be logged in to reply to this topic. Login to reply