Introduction
The purpose of this article is to give you a working
knowledge of how to view and understand query execution plans for SQL
Server. This is part 1 in a series of
articles that will walk you through understanding execution plans to help you
improve your queries. Throughout these
articles I will use SQL hints to force SQL Server query optimizer to choose the
data retrieval methods being explained; the use of hints without careful
planning and testing will degrade performance.
I will not mention much about hints nor explain their use as this is
outside the scope of this article. I
plan to explain how to analyze execution plans to help you optimize your
queries.
To view the graphical representation for data retrieval
methods the optimizer can choose from please go to BOL (Books On Line) and
search for execution plan and double click it.
For SQL Server 7.0 choose the item titled Graphically Displaying the
Execution Plan Using SQL Server Query Analyzer. For SQL Server 2000 choose Execution Plan
Options. I recommend you have this
available as you read this article. It
will give you additional information about the topics I cover in this article.
All sample code used in this article will work when executed
in the pubs database. To view execution
plans in Query Analyzer you must click on Query and select Show Execution
Plan. Now for any query you execute in
this window you will get an additional tab in the lower part of Query
Analyzer. It will be labeled execution
plan. This is where you will find the
execution plan for each query you executed in the upper window of Query
Analyzer. For convenience I will include
diagrams that will show you exactly what I saw while writing this article. Feel free to use them and/or view them in
your own Query Analyzer by executing the code given in the examples.
General Information
Execute this query in Query Analyzer:
USE pubs
SELECT *
FROM titles
Now take a look at the execution plan in diagram
1. Above each graphical
representation of the execution plan you will find what I’ll call the query
header. In the query header you will
always find the cost of that query relative to all other queries executed in
the batch. This cost will be represented
as a percentage. In all the examples in
this article one batch will be all the queries executed in a Query Analyzer
window. The query header will also show
you the query that was executed to generate the execution plan below it.
The graphic portion of the execution plan is always read
from right to left and from top to bottom.
Thus the beginning of the execution plan is the upper right most
icon. The number of branches leading
into the top line of the execution plan increase based on the number of tables
you include in your query. All plans end
with one icon, which is the end result of the query. This will always be found in the upper left
corner, just under the query header.
You get an idea of what the query optimizer did to get the
result set just by looking at the execution plan. The plan generated for the above query should
give you two icons. The one on the right
will be the one that fetches all the rows from the table titles. On my screen I see the icon for an index
scan. Under this icon I see the table
name and the first few characters for the name of the index used. Under this I see how much cost this step took
compared with all the other steps in the query.
It should show 100% because the above query is so simple and has no
conditions that would require a filter step.
The left step has an icon that indicates it displayed results in the
result window. Under it you can see the
word SELECT and that it cost 0% of the query to display the results.
To see exactly how SQL Server got the results it displayed
you must move the mouse pointer across each icon and or line. Here is where you will get detailed
information about what the optimizer did to give you the result set it
displayed. The usefulness of such
details increases with the complexity of the query. In diagram
2 you will find the contents of the yellow boxes described below. In this, and future diagrams, I will show all
the yellow boxes to be discussed and connected them with a line to the icon or
line where you will find them in your Query Analyzer. In your Query Analyzer you will only see one
yellow box at a time.
If you just want to know how many rows and the estimated
size per row as the query moves from step to step then just move your mouse
pointer across the lines; you’ll then see a small yellow box with this info in
it. Also, the lines are thicker when
more rows are passed from one step to the next.
So a thin line indicates 0 or few rows being passed. The thicker the line the more rows that are
being passed to the next step and the slower the query will run. So one goal when developing a query is to get
the number of rows it handles down to as few as possible as early as possible,
preferably within the first or second step of the query. On my screen I see that 18 rows with an
estimated row size (total length per row) of 207 was passed from the first step
to the second step.
To get more details on what happened and why so many or so
few rows were pulled in a particular step you must move the mouse pointer on
top of one of the icons.
You will get a lot of information in a large yellow
box. This gives you details as to what
the query optimizer did to determine how many rows continue to the next step.
Inside The Large Yellow Box
You can refer to the yellow box in diagram
3 while I explain the information found inside the large yellow boxes. At the top of each yellow box you will see
the name of the step/operation. When the
operation has a different physical and logical operator you will see the
physical operator first followed by a slash (/) and then the logical
operator. Directly below this name you
will see a brief description of the operation.
Below that description you will see a list that gives you more
information about what was performed in this step.
Physical operation:
This is the operator used by the query optimizer for this step. One example would be a nested loop.
Logical operation: Is
usually the same as the physical operator.
When different, this represents a keyword in the query. One example would be INNER JOIN.
Row count: This is
the number of rows fetched by this step and passed to the next step.
Estimated row size:
This is the estimated row size passed on to the next step.
I/O cost: This is the
estimated cost for all I/O activity for a step.
The lower this number is the better.
CPU cost: This is the
estimated cost for all CPU activity.
Number of executes:
Is the number of times the step was executed.
Cost: Is the cost for
the step being viewed. The lower this
number is the better.
Subtree cost: Is the
cost of the step being viewed and all previous steps.
Arguments: Here you
will see portions of the WHERE clause, parameters, and names of objects (such
as tables and indexes) used in the step or operation.
Now that you have a basic understanding of how to read
execution plans we will look at a few execution plans. Please note that there is no guarantee you
will get the same execution plan as I am describing as the plan is influenced
by many factors including, hardware, number of processors, current SQL Server
load, number of rows in tables, indexes, etc.
I have kept the examples as simple as possible and used hints in an
attempt to force the execution plan to be the same as what I describe in my
examples. I am providing diagrams that
will allow you to see exactly what I saw when I wrote this article.
Example 1: Nested Loop Join
Execute the following query in Query Analyzer:
USE pubs
SELECT *
FROM titleauthor ta
INNER LOOP JOIN titles t ON ta.title_id = t.title_id
Now look at the execution plan in your Query Analyzer or see
diagram
4. For convenience I have placed all
four yellow boxes in the diagram so they can all be viewed at once.
Start with the upper right icon labeled
titleauthor.UPK. You will notice that
SQL Server used a clustered index to scan the table. This is similar to a table scan, since the
index is the table. You can find the
name of the index it used in the arguments section, in this case [UPKCL_taind].
If you look at the table titleauthors (in Enterprise Manager) you can see it
has 25 rows so this step pulled all the rows in that table. Looking at the number of executes you can see
that it did this with one scan of the table.
If you look at the cost section you will notice that this scan cost 88%
of the total cost to execute this query.
Next lets look at the lower left icon labeled
titles.UPKCL_ti. For this table SQL Server used a clustered index seek to seek
specified rows from that table. Again,
you can find the name of the index in the arguments section of the yellow box;
it is called [UPKCL_titleidind]. Examining the number of executes will show you
that SQL Server ran this step 25 times, once for each record it pulled from the
table titleauthors. This step took 12%
of the total cost to execute this query.
Now lets look at the center icon labeled Nested Loops. You will notice that there is almost no cost
associated with this step. This is due to the fact that this step merely told
SQL Server how to join the rows and indicated how SQL Server should scan for
the rows it needed. The work was done in
the previous two scans and then simply merged together in this step. Here is where I will explain the subtree
cost. You will notice that for the
Nested Loop step the subtree cost is .0430.
This is the sum of the cost section in the previous two steps, which are
.037685 and .005272. The subtree cost
will show you the cost of the current step and all previous steps. In this step you will notice that the
physical operation and logical operation are different. This is because the join type is an INNER
JOIN (the logical operation) and the join method used is a nested loop. If you were to change the keyword INNER to
LEFT in this example and view that execution plan you would see that the
physical operation is still a nested loop, however the logical operation has
been changed to reflect the use of the keyword LEFT instead of INNER.
The last icon in this execution plan is labeled SELECT. This step cost even less than the previous
step. This is because all the data is
ready for display and all that is required to complete the query execution is
output the results to the bottom screen of Query Analyzer.
Example 2: Hash Match Join
Now we are going to change one word in the query executed in
example 1. We will replace LOOP with
HASH. The new query will look like this:
USE pubs
SELECT *
FROM titleauthor ta
INNER HASH JOIN titles t ON ta.title_id = t.title_id
You can execute this query in your Query Analyzer or simply
refer to diagram
5. We will not go into each icon in
as much detail as before since two of them are nearly identical to the ones we
examined in example 1.
For the upper right icon you will notice that it is nearly
identical to the upper right icon in example 1 except for the cost percentage
compared with the other 3 steps in the execution plan. The percentage has dropped to 40% even though
the cost is the same. This is because
other steps cost more this time due to the change in the query.
For the lower right icon you will notice that it only
executes once where as in example 1 it executed 25 times. This time it is a clustered index scan and
not a clustered index seek. This is because we changed the physical operation
from a nested loop to a hash match loop.
You’ll also notice that this step only pulled 18 rows where as in the
previous example 25 rows were pulled.
Again, specifying the hash match is what has caused the change. This hash match has grabbed the rows it needs
from both tables.
Now in the middle icon you will notice that it creates what
is called a hash table (comparable with a temp table) and fills it with the
results obtained in the upper right step.
After doing this, it matches each row in the hash table with a row from
the second table to get its result set.
This costs much more than the nested loop join used in example 1. This makes it clearer that there are multiple
authors for the same title and thus even though only 18 rows were pulled from
table titles they matched 25 records in the table titleauthors.
The last step simply displays the result and so has almost
no cost associated with it.
Conclusions
In this article I have shown how you can see and read
execution plans. Reading these can give
you a better understanding of how SQL Server executes the queries you design
and how it makes use of indexes. We have
examined two simple execution plans to learn how to read them and understand
them better. Examining execution plans
will help you see areas where a query is doing things you don’t expect. You can then make changes and rerun the query
to examine the new execution plan to determine if your changes have helped or
not.