Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

The Smiling DBA

Thomas LeBlanc ( MCITP 2005/2008 & MCDBA 2000) is a Senior SQL Server DBA at Turner Industries, LLC in Baton Rouge, LA. He has been in the IT field for 21 years starting as a COBOL programmer, graduating to dBase, FoxPro, Visual FoxPro, upgrading to Visual Basic versions 3-6 and even some .Net(C#). Designing and developing normalized database has become his passion. Full-time DBA work started about 9 years ago for Thomas while working at a Paper Mill in St. Francisville, LA continuing with IEM, then Amedisys. Performance tuning and reviewing database design and code was an everyday occurrence for DBAs at Amedisys. Thomas’ free time is spent helping those less fortunate and improving his relationship with his family and God.

Execution Plans – Merge Join

The Merge Join is a Physical Operation when joining 2 sets of data that are in the same order.

There is an interesting Clustered Index in the Adventure Works database. The SaleOrderDetail table’s primary key is a combination of the SalesOrderID and SalesOrderDetailID, not just the identity field of the table (SalesOrderDetailID). Most developers I work with usually create the primary key (by default is the clustered index) on the Identity fields with a unique constraint on the business key of the table.

The compound primary key might take more space, but it looks to help query plans have more options when optimizing.

The following query will use a Merge Join after doing a clustered index scan on SalesOrderHeader and SalesOrderDetail.

SELECT *
    FROM Sales.SalesOrderHeader soh
        INNER JOIN Sales.SalesOrderDetail sod
            ON sod.SalesOrderID = soh.SalesOrderID

image

Now, of course, no one does a SELECT *, right? Maybe not, but you can see that since the first column in the SalesOrderHeader and SalesOrderDetail clustered indexes is the SalesOrderID, the query optimizer can Merge the 2 data sets together because they are in the same order. The optimizer knows this is the best join.

A benefit to the Merge Join operation is once the first piece of data (SalesOrderID) is merged with Detail data, the data can be passed to the next operator, which can improve performance if further processing is needed.

Say I want a list of SalesOrderIDs by ProductID and retrieve the Product Name from the product tables

SELECT sod.SalesOrderID,
        p.ProductID, p.Name
    FROM Production.Product p
        INNER JOIN Sales.SalesOrderDetail sod
            ON sod.ProductID = p.ProductID
   

image

The SalesOrderDetail Scan is on a non-clustered index on ProductID. This is how the Merge is able to be used with the Clustered Index scan of the Product table

Below is the info on the Scan Operators which shows the Object (index) used in the operator.

imageimage

What happens when we add a column to this query – Order Qty.

SELECT sod.SalesOrderID, sod.OrderQty,
        p.ProductID, p.Name
    FROM Production.Product p
        INNER JOIN Sales.SalesOrderDetail sod
            ON sod.ProductID = p.ProductID
   

image

The optimizer determines the Hash Match physical join is more efficient. The Cost is 1.7226.

An option here is to create a covering index by adding the OrderQty column to a non-cluster index using the INCLUDE statement of the Create Index statement.

CREATE NONCLUSTERED INDEX [IX_SalesOrderDetail_ProductIDIncludeOrderQty] ON [Sales].[SalesOrderDetail]
    ([ProductID] ASC )
    INCLUDE (OrderQty)

image

You can force a Merge Join with a query hint to see the difference without the Covering Index.

-- Force Merge join
SELECT sod.SalesOrderID, sod.OrderQty,
        p.ProductID, p.Name
    FROM Production.Product p
        INNER MERGE JOIN Sales.SalesOrderDetail sod
            ON sod.ProductID = p.ProductID
  

The Cost is now  6.3887 and the query runs with parallelism. And, a Sort operator (74% of query cost) is needed if you do not have the covering index above.

image

Query Hints should be left to the experts.

If the data sets are not in the same order based on the Index (clustered or non-clustered), the optimizer can still use a Sort operator to order the data sets in the same logical order to use a Merge Operation.

The AdventureWorks database is available from CodePlex at http://msftdbprodsamples.codeplex.com/

I first learned Execution Plans from a free PDF provided by Red Gate with author Grant Fritchey. You can get this PDF at http://www.simple-talk.com/sql/performance/execution-plan-basics/

Happy Query Tuning from Louisiana!!!

Comments

Leave a comment on the original post [thesmilingdba.blogspot.com, opens in a new window]

Loading comments...