Optimizer Join Methods

SQL Server uses three join operators to process joins within a query. For smaller result sets, the optimizer might use a nested loop join but for larger results, a hash join or merge join may be used. The Query Optimizer always tries to identify the fastest way to join tables and will sometimes rearrange or sort tables in order to utilize a faster join method.

Nested loops

The nested loop join, also called a nested iteration, is the preferred join method for simple queries. The nested loop join operator has two inputs: an outer table and an inner table. When viewed in an execution plan, the top input for the nested loop join operator is the outer table while the bottom input to the nested loop join operator is the inner table. The outer table is located just below the nested loop operator in the execution plan.

The nested loop can be thought of as a set of loops, with the outer table being the outer loop and the inner table being the inner loop. The looping mechanism that a nested loop uses takes a row from the outer table and uses that row to scan the inner table for matching rows. If there is a matching row, the output is produced and if there is no matching rows; the mechanism will just go back to the outer table again and take the next row, repeating the scan of the inner table for matches. This process continues until all the rows in the outer table have been processed.

You can see how many times the inner table has been scanned by viewing the Executes column for the inner table in the execution plan. This will tell you how many rows of data exist in the outer table. Generally, the inner table or outer table selection for the nested loop is not important, but the optimizer tries to choose a smaller table for the inner loop if there is no index on both of the tables.

In cases where the inner table is very large, there is a possibility that the data pages which are accessed for the inner loop scan will be discarded between scans due to memory constraints on the server. The optimizer tries to see if the smaller table will be useful for the inner loop to keep the number of data pages small which may result in the pages not being flushed from the cache. If this will not work, it will choose another inner table, if there is no index on it. If there is an index, then it might choose a bigger table as an inner table just to speed up the scan process.

The optimizer might choose to perform a sort on some occasions to improve the seeking in the inner input. If we are joining the Customer table with the Orders table in the Northwind database, the customerIDs in the Orders table are random. If the Orders table is chosen as an outer table, then the outer loop takes each row of the outer table and searches the inner table, and then tries to find the match for the outer table. Because the customer IDs are not ordered in the outer table, the optimizer goes to the inner table may have to bring back data page that were previously placed into the cache for an earlier outer loop row if that data has been discarded. In this situation, it would be very useful for the optimizer to perform a sort on the Order table so that all the CustomerIDs will be group together allowing the data pages to be used multiple times in a row. So if you see a sort on the outer table in the execution plan, it's not bad. It's there to improve seeking capabilities or to adjust the performance of the query.

Nested loop joins are good in the fact that they consume very little memory and are usually superior to merge and hash joins on smaller inputs. One rule of thumb is to try to make sure that the joining columns always have an index on them, which may allow the optimizer to utilize a nested loop.

Example of Nested Loop Join

USE pubs

SET STATISTICS PROFILE ON

SELECT a.au_id, a.au_fname + ' ' + a.au_lname, ta.title_id

FROM dbo.authors a

INNER JOIN dbo.titleauthor ta

ON a.au_id = ta.au_id

Execution Plan (Abridged)

SELECT a.au_id, a.au_fname + ' ' + a.au_lname, ta.title_id FROM dbo.authors a

INNER JOIN dbo.titleauthor ta ON a.au_id = ta.au_id

|--Compute Scalar(DEFINE:([Expr1002]=[a].[au_fname]+' '+[a].[au_lname]))

|--Nested Loops(Inner Join, OUTER REFERENCES:([a].[au_id]))

|--Index Scan(OBJECT:([pubs].[dbo].[authors].[ncl_authors_phone] AS [a]))

|--Index Seek(OBJECT:([pubs].[dbo].[titleauthor].[auidind] AS [ta]), SEEK:([ta].[au_id]=[a].[au_id])

Merge Join:

Like the nested loop join, the merge join has two inputs with indexes being required on both inputs. The merge mechanism gets data rows from both inputs and compares each row of data. A matching row is created if both compared rows are equal. If the rows from both inputs are not equal, the row of data with the lower value is discarded and another row is obtained from the same input as the discarded row. The merge mechanism continues until all rows of data from both inputs are processed and all data has been merged into one result set.

With merge joins you might see a MANY-TO-MANY:() merge join in predicate along with the merge join operator. This means that the merge mechanism is performing a MANY-TO-MANY join within the tables. Often during the MANY-TO-MANY join operation, temporary tables are created for the MANY-TO-MANY join. This table creation will usually display a Table Spool operator just above the outer table as well as just above the inner table. The reason it generates this Table Spool is, in the case of many to many, it has to revisit the duplicate rows again; it has to rewind to the first duplicate value again. So in case of rewinding or revisiting the duplicates, it will be much more optimized if a temporary table is created and it visits those keys in the temporary table, rather than going back to the data page and reading the data row there.

Merge joins will be considered for all join clauses, except for CROSS JOIN and FULL JOIN clauses, with at least one of the joined columns having unique data. The presence of an ORDER BY clause will increase the chance that a merge join will be used. Merge joins are very fast if the columns are pre-indexed or pre-sorted. The optimizer might perform sorting on the fly, even if the columns are not in a sorted order, based on the cost if it decides if this sort is less costly than a corresponding nested loop join or hash join.

Example of Merge Join

SET STATISTICS PROFILE ON

USE northwind

SELECT * FROM dbo.orders o

INNER JOIN dbo.[order details] od

ON o.orderid = od.orderid

Execution Plan (abridged)

SELECT * FROM dbo.orders o INNER JOIN dbo.[order details] od ON o.orderid = od.orderid

|--Merge Join(Inner Join, MERGE:([o].[OrderID])=([od].[OrderID]), RESIDUAL:([od].[OrderID]=[o].[OrderID]))

|--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Orders].[PK_Orders] AS [o]), ORDERED FORWARD)

|--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Order Details].[PK_Order_Details] AS [od]), ORDERED FORWARD)

Hash Join

As with the other join types, hash joins takes two inputs, a build input and a probe input with the smaller input being the probe input. When viewing an execution plan, the build input will be shown as the upper input and the probe input is shown as the bottom input. Hash joins utilizes two phases: the build phase and the probe phase.

The build phase consists of the build input being used to build the hash buckets using a hashing function. Hash buckets are like cells in memory, with the buckets being used to group the data into different groups according to the hash keys.

This grouping takes place when each row of data is taken from the build input on the tables and a hash function is performed to come up with the hash key. The hash key is an expression which consists of the set of columns in the equality predicate. In grouping operations, the columns of the group by list are the hash key. In set operations such as intersection, as well as in the removal of duplicates, the hash key consists of all columns in GROUP BY clause are used as the hash key.

If the hash key has a unique contiguous key, the key is know as minimal perfect hashing with every hash having its own slot with no gaps between the slots in the hash index. If the hash key is unique but noncontiguous, it is known as a perfect hashing with every hash having its own slot but there will be gaps between the slots in the hash index.

Whichever type of hash key is created, the hash key is used to place the rows of data into different hash buckets. With the possibility of having multiple hash buckets whose values are stored as a linked list and these linked lists are scanned when the probe input comes.

The probe phase consists of the optimizer taking one record at a time from the probe input and evaluating the hash key from the probe input to the hash buckets. The optimizer finds the appropriate hash bucket and scans all the values in that bucket by scanning the linked list values. During the scanning, the optimizer also tries to find out if there is a match to the hash key. If there is a match, an output is produced. If there is no match, then the optimizer continues with the next full value.

Because the optimizer has to build the hash tables, the build input is completely read before actually reading the probe input. Then the optimizer goes to the probe table and reads all the rows in the probe input. The optimizer generally chooses a small table as a build table because all of the hash tables will be stored in memory. When the optimizer can place all the hash buckets into memory, the hash is called a “In-Memory Hash Join”. This fact makes hash joins generally memory-intensive.

Ideally, all these hash tables will be sitting in the memory itself, if they're small enough but if they're not small enough, SQL Optimizer will partition the build input and the probe input into multiple files and store them on a disk with each of these partitions being brought into the memory as they're needed. Using the hash function on the hash keys guarantees that any two joining records must be in the same pair of files; therefore, the problem of joining two large inputs has been reduced to multiple instances of the same problem, but of smaller size - a prototypical “divide-and-conquer” algorithm. In other words, we simply apply the same algorithm again to each pair of partition files -with a modified hash function, of course. This type of hash join using multiple files is called a “Grace Hash Join” and since the disk is involved, additional I/O are needed to get the partitions from the list to the memory. This process will cause performance degradation.

If the build input is so large that inputs for a standard external merge sorts would require multiple merge levels, multiple partitioning steps and multiple partitioning levels, a Recursive Hash Join will be used. If only some of the partitions are large, additional partitioning steps are used for only those specific partitions and in order to make all partitioning steps as fast as possible, large, asynchronous I/O operations are used so that a single thread can keep multiple disk drives busy.

If the build input is larger but not a lot larger than the available memory, elements of in-memory hash join and grace hash join are combined in a single step, producing a hybrid hash join.

Sometimes a residual predicate may be involved. If so, it will also use additional columns for when it is trying to compare the probe input value with a build probe input value. It generally uses a hash key value of the probe input versus the hash key value of the build input. There are some situations where it has to do a comparison of the actual column values of the probe input versus the actual column values of the build input. So in those situations you might see a residual predicate in the execution plan.

During the phases of the hash join the SQL Server optimizer might determine that the wrong table was used for the build input. In this case the optimizer will reverse the build and probe inputs in a process called “Role Reversal” in order to assure that the smaller of the two inputs is being used for the probe input.

Hash joins are very useful for ad hoc types of queries. Ad hoc queries are the queries for which we don't know what type of columns are used in the WHERE clause, so we cannot have indexes on all the columns. There is no index on a column, and those columns are mentioned in the WHERE clause, then hash join is the best one, because it scans the whole table and then performs the join on those tables.

When both join inputs are large and of similar size, performs as good as merge join, but when join inputs are large but differ significantly in size, a hash join will usually outperform a merge join by a fair margin. Remember, hash joins are both memory and CPU intensive and in the case of grace and recursive hash joins they can also require additional I/O costs.

Example of Hash Join

USE pubs

SELECT * INTO #ttable1 FROM dbo.authors

SELECT * INTO #ttable2 FROM dbo.titleauthor

SELECT a.au_id, b.title_id

FROM #ttable1 a

INNER JOIN #ttable2 b

ON a.au_id = b.au_id

drop table #ttable1

drop table #ttable2

Execution Plan (Abridged)

SELECT a.au_id, b.title_id FROM #ttable1 a INNER JOIN #ttable2 b ON a.au_id = b.au_id

|--Hash Match(Inner Join, HASH:([a].[au_id])=(.[au_id]), RESIDUAL:(.[au_id]=[a].[au_id]))

|--Table Scan(OBJECT:([tempdb].[dbo].[#ttable1_____________________________00000000001A] AS [a]))

|--Table Scan(OBJECT:([tempdb].[dbo].[#ttable2_____________________________00000000001A] AS ))

In this case, a simple index on the join columns will change the join method to a different type of join instead of a hash join.

CREATE NONCLUSTERED INDEX ncl_ttable1 ON #ttable1(au_id)

Execution Plan (Abridged)

SELECT a.au_id, b.title_id FROM #ttable1 a INNER JOIN #ttable2 b ON a.au_id = b.au_id

|--Nested Loops(Inner Join, OUTER REFERENCES:(.[au_id]))

|--Table Scan(OBJECT:([tempdb].[dbo].[#ttable2_____________________________00000000001A] AS ))

|--Index Seek(OBJECT:([tempdb].[dbo].[#ttable1_____________________________00000000001A].[ncl_ttable1]

Summary

Understanding the different types of joins used by the optimizer will help developers and DBAs understand how the optimizer is routing their queries. Developers often create queries without knowing that it would only take a few “tweaks” to produce an execution plan that utilizes one optimizer join method over another. These small “tweaks” can have dramatic effects on the optimization of the query and the ultimate satisfaction of the query by the end-users.

Knowing how to read and understand execution plans and the joins used by the optimizer can be one of those “tools” that a DBA can have in their tool chest that separates them from all the other DBAs in their company.

About the author

I started working with SQL Server in the mid 1990's after spending time as both a Visual Basic developer and Microsoft Access developer. Numerous projects upsizing Access to SQL Server lead me to become a full-time SQL Server DBA and I have remained one ever since.

I have a large variety of experiences dealing with SQL Server, Oracle, Sybase, DB2, Access, and other database platforms over the years and have worked with environments that had over 30 Terabytes of data and environments that had 100's of database with only a few megabytes of data. During this time I have worked as a production DBA, a development DBA, a mixture of both, and as a database architect and have enjoyed all of those positions.

You can find more about me and works on one of my web sites:

www.TransactSQL.Com

www.Database-Security.Info

blog.TransactSQL.Com

blog.Database-Security.Info

Optimizer Join Methods