Blog Post

T-SQL Tuesday #23 – Joins

,

There is something wrong in the SQL force today… here it is the first Tuesday of a month, and we’re doing a T-SQL Tuesday. There was another recent disturbance in the SQL force when we had a T-SQL Wednesday. Oh my, the SQL force is having a lot of recent disturbances. The good news is that this disturbance is for a good reason… many of the SQL bloggers are going to be at PASS Summit 2011, so T-SQL Tuesday is being moved up a week.

I’ve been reading the T-SQL Tuesday blog posts for quite some time. Until recently, I hadn’t jumped onto the blogging train – I’ve been content to just sit back and lurk. I’ve just started blogging, so this is my first ever T-SQL Tuesday blog. I’m hoping that I do it some justice.

This month T-SQL Tuesday is being hosted by Stuart Ainsworth (Blog | @codegumbo), and he has selected the topic of JOINS.

JOINS

When considering what to write about JOINS, the thing that strikes me is the different ways with which you can write a JOIN statement. The most conventional method that everyone is familiar with is:

USE AdventureWorks
GO
SELECT p.FirstName, p.LastName, d.Name
  FROM HumanResources.Employee e
       JOIN HumanResources.EmployeeDepartmentHistory edh
         ON e.BusinessEntityID = edh.BusinessEntityID
       JOIN HumanResources.Department d
         ON edh.DepartmentID = d.DepartmentID
       JOIN Person.Person p
         ON e.BusinessEntityID = p.BusinessEntityID;

Followed by just listing the tables in the from clause, and using the where clause to join them together:

USE AdventureWorks
GO
SELECT p.FirstName, p.LastName, d.Name
  FROM HumanResources.Employee e,
       HumanResources.EmployeeDepartmentHistory edh,
       HumanResources.Department d ,
       Person.Person p
 WHERE e.BusinessEntityID = edh.BusinessEntityID
   AND edh.DepartmentID = d.DepartmentID
   AND e.BusinessEntityID = p.BusinessEntityID;

(Note that this method used to be how all joins were performed. LEFT and RIGHT joins used a *= and =* syntax, which is now obsolete.)

A third method that you may see is to list all the tables with the JOIN clauses first, then to specify all of the ON conditions:

USE AdventureWorks
GO
SELECT p.FirstName, p.LastName, d.Name
  FROM HumanResources.Employee e
       JOIN Person.Person p
       JOIN HumanResources.EmployeeDepartmentHistory edh
       JOIN HumanResources.Department d
         ON edh.DepartmentID = d.DepartmentID
         ON p.BusinessEntityID = edh.BusinessEntityID
         ON e.BusinessEntityID = p.BusinessEntityID;

The tricky part about this method is that you have to list your ON conditions in the reverse order of the JOIN conditions – the first ON condition is for the last two tables, the second ON condition is for the prior table, etc., with the last ON condition joining the first two tables.

Would you believe that there actually is a time when you might want to use this method? It requires three tables and the situation is that you want to always retrieve all records from the first table. You want to retrieve the records from the second table that link to a record to the first table, but if and only if there is a linked record from the second table to the third table.

Right about now, I’m hearing a bunch of people going “Huh???”, so let me give you an example. You have a products table, from which you want to return all rows. Products might be included in an invoice table. And the invoice might be in a shippers table (indicating that the products in the invoice are being shipped). We want to display all products, and if it is in an invoice that is being shipped, show the invoice and shipper. (If the product is in an invoice that isn’t being shipped yet, do not show either the invoice or the shipper.)

DECLARE @Products TABLE (
  ProductId INTEGER,
  Name VARCHAR(20));
 
DECLARE @Invoices TABLE (
  InvoiceId INTEGER,
  ProductId INTEGER);
 
DECLARE @Shippers TABLE (
  ShipperId INTEGER,
  InvoiceId INTEGER);
 
INSERT INTO @Products (ProductId, Name)
SELECT 1, 'Widget, Small' UNION ALL
SELECT 2, 'Widget, Medium' UNION ALL
SELECT 3, 'Widget, Large';
 
INSERT INTO @Invoices (InvoiceId, ProductId)
SELECT 1, 1 UNION ALL
SELECT 2, 2 UNION ALL
SELECT 3, 3;
 
INSERT INTO @Shippers ( ShipperId, InvoiceId )
SELECT 1, 1 UNION ALL
SELECT 3, 3;

To get the results we want, we would use:

SELECT p.ProductId,
       p.Name,
       i.InvoiceId,
       s.ShipperId
  FROM @Products p
       LEFT JOIN @Invoices i
          JOIN @Shippers s            ON i.InvoiceId = s.InvoiceId         ON p.ProductId = i.ProductId;

 

Messing with you

For all of those hard-working DBAs with never enough time to get things done, we even have the ability to completely mess up your head (and waste your time) by allowing these methods to be intermixed:

USE AdventureWorks
GO
SELECT p.FirstName, p.LastName, d.Name
  FROM HumanResources.Employee e
       JOIN Person.Person p
         ON e.BusinessEntityID = p.BusinessEntityID,
       HumanResources.EmployeeDepartmentHistory edh,
       HumanResources.Department d
 WHERE e.BusinessEntityID = edh.BusinessEntityID
   AND edh.DepartmentID = d.DepartmentID;

and

USE AdventureWorks
GO
SELECT p.FirstName, p.LastName, d.Name
  FROM HumanResources.Employee e
       JOIN HumanResources.EmployeeDepartmentHistory edh
       JOIN HumanResources.Department d
         ON edh.DepartmentID = d.DepartmentID
         ON e.BusinessEntityID = edh.BusinessEntityID
       JOIN Person.Person p
         ON e.BusinessEntityID = p.BusinessEntityID;

 

If you write code like this, you deserve to be the one to have to go back later and modify it.

Conclusion:

The conventional method is probably the most easily read and understood by people, and thus why it is so widely used.

The biggest problem for me with performing the joining in the where clause is the ease at which you can accidentally omit a table’s join condition… leading to a CROSS JOIN situation. When using the conventional method, this would be easy to spot… heck, if you forget the ON condition, SQL Server will just throw an error at you.

The third method is just plain confusing to me… and it is harder to deal with. Since you are using the JOIN syntax, you do prevent the accidental CROSS JOIN… but having to put all of those ON conditions in reverse order makes it confusing to figure out later on when you have to troubleshoot or enhance this query. However, there actually is a time when you might need to use this format (or it’s “Intermixed #2″ cousin). If you do end up using this format, please do me a favor… make it obvious that something different is going on. Show those joins with the code being in parenthesis, or indented. Or both. Something to show that it is different, like I did in the highlighted lines above.

The SQL Force strengthens!

Next month T-SQL Tuesday will be returning to the second Tuesday of the month. I think I’m starting to feel the SQL force strengthening…

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating