Click here to monitor SSC
SQLServerCentral is supported by Redgate
Log in  ::  Register  ::  Not logged in

Get your favorite SSC scripts directly in SSMS with the free SQL Scripts addin. Search for scripts directly from SSMS, and instantly access any saved scripts in your SSC briefcase from the favorites tab.
Download now (direct download link)

Issue with Current Merge Join algorithm and ways to Improve it

By Gulli Meel,


I assume that you know how merge join works. There is however an issue with the existing merge join algorithm in SQL server 2008. I am sure that it could be improved further to improve the perfroamcne by reducing the number of rows processed and reducing the number of IO's.

When the inner table is accessed the scan starts at the first row and the table is scanned till it find the value of join key which is more than the value of join key from outer table. As soon as the join key value is more than the outer table join key row the scan is not required. This is quite efficient algorithm to end the scan. However, the scan always starts at the first value or first row( I am assuming that there are no other sarg's which are causing the scan to start somewhere else or cause other index to be used). This part could be veru inefficient if say the last value of join key is at almost at the end of the inner table. Thus the whole clustered index or table is scanned where it might have possible that we just got say 1000 rows and thus scanning the pages for those 1000 rows.

Solution: For me the scan should start at the minimum value of join key from outer table. e.g. say we have clustered index on id and say outer table has minimu value 100000 and max value 105000.  Thus the scan should start at 100000 and end at 105000.Currently it is not happening.Currently it is starting at 1 and finsihing at 105000. Thus the rows from 1 to 99999 are uselessly processed. This is waste of CPU as well as it causes more logical reads.

Below script will show you the issue in  more detail and will show you the alternate method.

This is not to replace the nested loop join or any other costing method. This is just to show that
optimizer should start the scan from appropriate point for inner table rather than the first row while doing a merge join specially on already sorted data.
This suggestion is just to show that merge join alogorithm could be handled better to reduce the logical reads and cpu processing by determining
the start point of the scan for the inner table for merge join.

The workaround I have shown should not be used unless you do the following.
1. Make sure that other performance technique are used before going for this approach.
2. This technique requires code change as you have to find the minimum value of the clustered index key..
3. Before applying this technique benchmark performance of this method against the best plan and method for your queries.based on the result.If clustering is good between joined columns it could improve performance a lot.In other cases it will perform better than or equal to the existing merge alogorithm
   you can use this technique.
4. However, microsoft might provide an alternate for this in future then the workaround code introduced will be redudant.

For similar technique to Improve the bookmark lookup performance see the below posts.

Total article views: 1080 | Views in the last 30 days: 2
Related Articles

A Technique for Determining the I/O Hog in your Database

Performance Tuning can be as much an art as a science when working with SQL Server, but there are ma...


Using Different Techniques for SQL Server Automation

Automating SQL Server tasks is the sign of an experienced DBA. One who doesn't waste time on repetit...





Non-Clustered Indexes Query Performance

Separate Indexes or Composite or Include Columns for non-clustered index improves performance?


Clustered instance will not start after removing builtin admins

After removing the builtin administrators group the clustered services do not start. Perry Whittle w...