In any non-trivial task, developers need to join tables together. Joins are important when we want to:
There are certainly variations and extensions of why joins might be needed, but the above 3 cover most of the circumstances.
There is an alternative to using joins in some scenarios – subqueries. In instances where the goal is to filter on a restriction, a subquery can be used. Consider the following 2 SQL statements [using the Adventureworks database]:
-- Use a Subquery SELECT * FROM AdventureWorks.Person.Address WHERE StateProvinceID IN ( SELECT StateProvinceID FROM AdventureWorks.Person.StateProvince WHERE StateProvinceCode = 'CA' ) -- Use a Join SELECT addr.* FROM AdventureWorks.Person.Address addr INNER JOIN AdventureWorks.Person.StateProvince state ON addr.StateProvinceID = state.StateProvinceID WHERE state.StateProvinceCode = 'CA'
These are both functionally equivalent. Which of these is better? In this trivial example, both are fine. However, when query speed becomes an important factor, joins are generally going to outperform subqueries. I’ll explore this issue more in my next post.
Also, the queries, as-is, use select *. This is generally not a good practice – instead, a column list should be provided in the select statement. However, having the select * raises one important difference between the 2 queries above – the columns from the joined table will be available in the result set whereas the columns from the table in the subquery will not. This is why I specified the table alias in the 2nd query [for the select *] to make the results display equivalently.
With the join, notice there are two conditions – one that the StateProvinceID must be equal and one that the state code needs to be ‘CA’. Why did I put the StateProvinceID check in the ‘on’ clause and the state code check in the ‘where’ clause? I think doing it this way is cleaner. When I join tables, I consider there to be three types of join conditions:
It is also possible to join the tables without any 'on' condition. For example, the following is functionally equivalent to the above two statements:
SELECT addr.* FROM AdventureWorks.Person.Address addr, AdventureWorks.Person.StateProvince state WHERE addr.StateProvinceID = state.StateProvinceID AND state.StateProvinceCode = 'CA'
This is an implicit join with a comma separating the joined tables and all conditions in the where clause. The original join [with an 'on' condition] is an explicit join. Is the implicit join syntax recommended? Functionally and speed-wise, these two are identical. However, for queries with a number of tables being joined, the implicit syntax can become unmaintainable and difficult to debug/extend. I always prefer the explicit join syntax.