One thing
that often amazes me is that many SQL Server developers do not actually measure
the performance of their queries. When you are working with a small site or home
project you might not see a big difference, but when implementing systems with
large amounts of users and high levels of traffic you can not just settle with
the fact that your query returns the expected result. You must also make sure
that your queries use the least amount of resources and execute as quickly as
possible. Sure, you can read articles and literature that describe how to write
queries that perform well, but you can still not be sure that they work in an
optimal way for your specific situation. After all, different schema designs,
amounts of data, hardware resources etc all affect how a query performs. And one
of the problems with SQL is that you can write the same query (i.e. that return
the same results) in many different ways, and the performance of these different
formulations will often differ as well. When I started investigating why some
developers did not compare the performance of their queries it became clear to
me that the main reason is that they do not know how to do this in an easy way.
Many of them thought that you needed external tools, more or less complicated,
to run against your server, and they did not have the time or inclination to
learn and try these. This article will describe a couple of much easier methods
of measuring performance of queries.
Time
The most simple way to measure performance is of course
to measure the time it takes to execute a query. If you have not already noticed
it, take a look in the status bar at the bottom-right corner of Query Analyzer.
There you will find a timer that shows how many hours:minutes:seconds it took
for a query (or rather the entire script) to execute. This is of course not a
very exact measurement. Most queries you want to measure will probably not take
more than a second to run, in a high-traffic environment they should probably
execute in milliseconds if they are correctly optimized. So you need a better
instrument to measure the execution time for a query.
Another,
and better way to measure the amount of time it takes for a query to execute is
to use the built-in function GETDATE(). Example 1 show how you can do this. The
example uses the command WAITFOR to make the query execution 'stand still' for
as long as we specify with DELAY. By first storing the present date and time
when the execution begins and then comparing this to what it is when the
execution is finished we can get a more exact measurement with milliseconds
specified. Note however that the time is only specified down to 1/300 of a
second (i.e. 3.33 ms). So, if a query takes 40 ms to execute that means
somewhere between 40-43 ms.
-- Example 1
DECLARE @start datetime, @stop datetime
SET @start = GETDATE()
WAITFOR DELAY '00:00:00.080' -- do not do anything for 80 ms
SET @stop = GETDATE()
SELECT 'The execution took ' + CONVERT(varchar(10), DATEDIFF(ms, @start, @stop)) + ' ms to finish'
STATISTICS TIME
The best way to measure time however is to use the
configuration setting SET STATISTICS TIME. The syntax for this is as shown
below:
SET STATISTICS TIME {ON | OFF}
When this
parameter is set to on the results pane of Query Analyzer will show statistics
for the time it took to execute a query. Note that if you are running QA in grid
mode you will need to switch to the Messages tab to see this. Example 2
demonstrates this:
-- Example 2
USE Northwind
GO
SET STATISTICS TIME ON
SELECT * FROM orders
In my
results pane I get the following text:
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
(830 row(s) affected)
SQL Server Execution Times:
CPU time = 30 ms, elapsed time = 500 ms.
At first
glance this might seem complicated to understand, but more or less the only
thing you need to do is to look for the row with SQL Server Execution Times that
is printed right after the text that specifies the number of affected rows.
Above this you can see the time it took to parse and compile the query, but that
time is not what we are interested in here. Most of the times this will be 0 ms
if you run the same query several times in a row since the execution plan will
already be cached. As said earlier, what we are looking for is the time it took
to execute the query. In the example above it needed 30 ms of CPU time, but the
total amount of time needed was 500 ms (try replacing the WAITFOR statement in
example 1 with the select statement in example 2 and see if GETDATE gives you
the same measurement). But if CPU time was only 30 ms, then where are the
remaining 470 ms? The answer for this is I/O.
STATISTICS IO
As you probably know I/O is short for Input/Output. You
could say that it means reading/writing resources, and normally you mean
reading/writing from/to disk or memory. Very simply described, SQL Server needs
to have the data pages containing the data to return to the client stored in
memory (RAM). If they are not already cached there they must first be read from
disk where they are physically stored and then placed in memory, from where they
can then be returned to the client. The data pages will then be cached in memory
for an unspecified time, which depending on several factors can range from 0 - ~
(indefinitely). Therefore a query might need more time to execute the first time
you execute it, and because of this you should always execute the query a couple
of times when measuring performance for it.
It is not
only the time it takes for a query to execute that is interesting when measuring
performance. Equally important (and often even more) is the amount of system
resources that is needed to execute it. Since I/O is normally the slowest part
of a query, especially if physical disk access is needed, it is very important
to know the amount of I/O resources needed to execute it. The way to measure
this is to use another configuration setting called SET STATISTICS IO. The
syntax for this is similar to that of SET STATISTICS TIME:
SET STATISTICS IO {ON | OFF}
The result however is different. Again, look in the text
of the results pane in QA. I executed example 2 a couple of times and the result
is shown below:
Table 'Orders'. Scan count 1, logical reads 22, physical reads 0, read-ahead reads 0.
First we
have the table name. Then comes the number of time this table was scanned, or
rather accessed, to fetch the result of the query. The next parts tells us how
many pages (data and/or index) that were read from the cache in memory to fetch
the results, how many pages that were read from disk and the final number called
read-ahead reads shows how many pages were placed into the cache for the query.
The numbers you should normally look at is logical and physical reads plus scan
count, and they should all of course be as low as possible. It might be better
to have 100 logical reads than 10 physical reads since it is faster to read from
memory, but generally speaking they should both be as low as possible. If you
execute a query a couple of times physical reads will often be 0 since the data
pages will already be cached after the first execution. Use these numbers to
compare the resources needed when executing the same query formulated in
different ways.
Other
tools
With the above mentioned tools you have a good way of
deciding which of several different versions of a query you should use to get
the results you want in an optimal way. There are lots of other tools available
as well, but I will not discuss them in this article. If you want to experiment
with them yourself I would recommend you take a look at the following tools:
-
Show execution
plan: by pressing Ctrl-K you get an extra tab when executing queries in QA.
This tab shows a graphical representation of the execution plan used by SQL
Server to execute your query. There is lots of information in Books Online
about how to use the information shown there, as well as articles online.
-
SET STATISTICS
PROFILE: This configuration option gives you a textbased variant of the
execution plan.
-
SET SHOWPLAN_ALL
and SET SHOWPLAN_TEXT: These options both present information regarding the
resources and execution plan that would be used to execute the query, without
actually executing it.
-
Profiler, Sysmon
(Performance Monitor) and other external applications: Finally there are
several external applications that can be used to measure and show different
events and measurements in SQL Server and the system under execution.
Profiler, a tool in the SQL Server client tools pack, connects to SQL Server
and log all kind of diffenent events that occur, and Sysmon can of course be
used to log measurements for a huge amount of performance counters both for
SQL Server and the system as a whole.