SQLServerCentral Article

Sneak Preview: Start to Finish Guide to SQL Server Performance Monitoring

One of the nice things about a bookstore is the ability to go, grab a book, and look through it. Often before I purchase a book, I'll do just that. If I want to order a book online because of a deal and I can subdue that "must have it NOW" impulse, I'll still make a drive down to my favorite bookstores and see if I can find it on the shelf. This is much more difficult with an e-Book.

To try and give more of that "browse the bookshelf" experience, I've prepared some excerpts of the e-Book Start to Finish Guide to SQL Server Performance Monitoring. I've made selections of several key sections of the book. Here are the excerpts I've chosen:

If you are interested in purchasing the e-Book through SQL Server Central, please visit the NetImpress page for SQLServerCentral.com customers.

Excerpt 1: From Performance Counters

SQL Server Counters

SQL Server Counters Quick Reference

  • SQLServer:Buffer Manager\Buffer Cache Hit Ratio - % of data pages found in memory
  • SQLServer:Cache Manager\Cache Hit Ratio - Ratio of execution cache hits versus total number of lookups
  • SQLServer:General Statistics\User Connections - Number of currently connected users
  • SQLServer:Latches\Average Latch Wait Time (ms) - Average time to obtain a latch
  • SQLServer:Latches\Latch Waits/sec - Number of latches established per second
  • SQLServer:Locks\Average Wait Time (ms) - Average time to obtain a lock
  • SQLServer:Locks\Lock Timeouts/sec - Average # of lock attempts expiring without a lock being established
  • SQLServer:Locks\Number of Deadlocks/sec - Average # of deadlocks occurring per second
  • SQLServer:Memory Manager\Total Server Memory - Amount of memory being utilized by SQL Server
  • SQLServer:SQL Statistics\SQL Re-Compilations/sec - Average # of execution plans being recompiled per second

Just as the hardware has counters, so does SQL Server. There are specific counters we use in conjunction with the hardware counters to determine how well SQL Server is performing on a given system. We also have to realize that the way SQL Server operates is fundamentally different from a file server, web server, or domain controller. So there may be times when we’ll see processor utilization near 100% for a given time period (for instance, a particularly complex DTS package which does our data warehousing), and we have to take those things into account.

SQLServer:Buffer Manager\Buffer Cache Hit Ratio

Threshold: < 90%

The Buffer Cache Hit Ratio represents the percentage of data pages found in memory. When SQL Server finds the data in memory, it doesn’t have to go to the physical disk to retrieve it. Since retrieving data from a hard disk is significantly slower than retrieving it from physical memory, we want to pull data from memory as often as possible. As a result, we’re looking for a Buffer Cache Hit Ratio as close to 100% as possible. The threshold is therefore understandably high. If we start dipping below 90%, it’s time to consider extra RAM. Others feel in an OLTP environment this number should be greater than 95%. Still others are looking for 98%. I like to consider 90% a minimum.
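The same counter can also be spot-checked from T-SQL rather than System Monitor. The sketch below assumes SQL Server 2000, where counter values are exposed through master.dbo.sysperfinfo; ratio counters are stored as a raw value plus a separate "base" counter, so the ratio has to be computed by dividing the two. For a named instance, the object name would begin with MSSQL$InstanceName rather than SQLServer.

```sql
-- Compute the Buffer Cache Hit Ratio from sysperfinfo (SQL Server 2000).
-- Ratio counters come as two rows: the hit count and its base.
SELECT CAST(hits.cntr_value AS float) /
       CAST(base.cntr_value AS float) * 100.0 AS [Buffer Cache Hit Ratio %]
FROM   master.dbo.sysperfinfo hits
JOIN   master.dbo.sysperfinfo base
  ON   base.object_name = hits.object_name
WHERE  hits.object_name  = 'SQLServer:Buffer Manager'
  AND  hits.counter_name = 'Buffer cache hit ratio'
  AND  base.counter_name = 'Buffer cache hit ratio base'
```

Because the value reflects activity since the instance started, a single snapshot on a long-running server can mask a recent dip; trending the number over time, as the book describes, is more informative.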

SQLServer:Cache Manager\Cache Hit Ratio

Threshold: Varies (<90% for in-house developed applications)

This corresponds to the ratio of execution plan cache hits to total lookups. We want this number as close to 100% as possible regardless of the instance, but I’ve not seen any hard and fast numbers for a threshold. For applications where we have built the data access layer, we look for numbers around 90%. Often, for third-party applications there is little we can do, as the data access layer is hard-coded and not subject to modification.

There are two reasons why the number could be low:

  • An execution plan doesn’t exist
  • The execution plan couldn’t be located on the first pass

The first possibility can occur if we’re short on memory or if our stored procedures, triggers, and ad hoc queries require recompiling, making an execution plan of little use. The second tends to occur because either we’ve not specified an owner and the user isn’t dbo, or we’ve used the sp_ prefix and the stored procedure is in a user database. I’ll discuss more on these issues later.
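Both causes of that second kind of miss can be avoided with small coding habits. The procedure names below are illustrative only:

```sql
-- No owner specified: if the caller isn't dbo, SQL Server first looks
-- for a version of the procedure owned by the caller, costing a
-- first-pass cache miss. Owner-qualify the call instead:
EXEC GetCustomerOrders          -- may miss the cache on the first pass
EXEC dbo.GetCustomerOrders      -- resolves directly to one cached plan

-- The sp_ prefix: SQL Server checks the master database first for any
-- procedure starting with sp_, so use a different prefix (or none)
-- for procedures living in user databases:
EXEC sp_GetCustomerOrders       -- looked up in master first
EXEC dbo.usp_GetCustomerOrders  -- no detour through master
```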

Excerpt 2: A Real World Example

In The Real World

A real-world example I can cite occurred towards the end of a business day (as always) and we were all looking to get home. An application group was working late on changes to a system written within the organization. A development server hosting SQL Server suddenly became very slow. Queries normally running in less than a second were taking a minute or more. Immediately the DBA team popped up System Monitor and took a look at the processor and memory counters as well as the ones in SQL Server for locks and user connections. What we saw was the number of user connections was average for the development server (about 15-17) but lock waits were higher than normal. We also saw the memory counters showed no problems but processor utilization was near 100%.

Since the development server hosted a couple of other third-party applications that provided additional functionality to the application, we wondered if one of them was the culprit. Keep in mind that during this time the application team was screaming for the heads of the DBAs for allowing SQL Server to get in such bad shape. We took a look at the cache hit ratio and the buffer cache hit ratio. Both were normal.

We tried to get to SQL Server using Query Analyzer, and making the connection took considerably longer than normal. Executing the sp_who2 system stored procedure to see what activity was going on in SQL Server, we saw there was very little… though it took nearly two minutes for the system stored procedure to return this information. At this point, we had enough evidence to say it most likely was not SQL Server. But we needed one more counter to confirm what we believed.

A counter object we haven’t talked about is Process. This object has an instance for every process running on the system. We added the instance for sqlservr (the name of the process for SQL Server) to the % processor utilization counter and saw it was low (less than 10%). This vindicated the DBA team.

We were off the hook, but the problem still existed, so we continued to investigate by checking out all the processes. Sure enough, one of the third-party applications was continually registering over 95%. We used a tool to kill the process remotely and watched the processor utilization for the entire system drop immediately. Suddenly the queries started resolving in the expected times. Problem solved! After a quick write-up, we headed home.

Excerpt 3: Using Profiler to Find Deadlocks and Lock Timeouts

Locks

Acquiring and releasing locks is a typical part of any multi-user database system, and SQL Server is no exception. SQL Server allows us to track locks being acquired and released as events, but in a SQL Server installation with any kind of concurrent usage, these events will occur frequently enough to be of relatively little value. There are other events, however, that are of great use:

  • Lock:Deadlock
  • Lock:Deadlock Chain
  • Lock:Timeout
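These three events can also be captured with a server-side trace instead of the Profiler GUI, which is handy for unattended monitoring. This is a sketch only, assuming SQL Server 2000's sp_trace_* procedures; the event IDs (25 for Lock:Deadlock, 27 for Lock:Timeout, 59 for Lock:Deadlock Chain) and column IDs (1 for TextData, 12 for SPID) come from the sp_trace_setevent documentation, and the trace file path is a placeholder.

```sql
-- Server-side trace capturing the deadlock and lock timeout events.
DECLARE @traceid int, @maxfilesize bigint, @on bit
SET @maxfilesize = 5    -- megabytes; sp_trace_create requires a bigint
SET @on = 1             -- sp_trace_setevent requires a bit

-- Create the trace; SQL Server appends .trc to the file name
EXEC sp_trace_create @traceid OUTPUT, 0, N'C:\Traces\LockEvents', @maxfilesize

-- Turn on the columns we want for each event
EXEC sp_trace_setevent @traceid, 25, 1, @on    -- Lock:Deadlock, TextData
EXEC sp_trace_setevent @traceid, 25, 12, @on   -- Lock:Deadlock, SPID
EXEC sp_trace_setevent @traceid, 59, 12, @on   -- Lock:Deadlock Chain, SPID
EXEC sp_trace_setevent @traceid, 27, 1, @on    -- Lock:Timeout, TextData
EXEC sp_trace_setevent @traceid, 27, 12, @on   -- Lock:Timeout, SPID

-- Start the trace (0 = stop, 1 = start, 2 = close and delete)
EXEC sp_trace_setstatus @traceid, 1
```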

The first two, Lock:Deadlock and Lock:Deadlock Chain, are very useful if we’re trying to track down what statements are causing deadlocking issues. While it is well understood that database tables and views should always be accessed in the same order, in larger environments this rule can be violated quite easily when multiple developers are working to create a usable system. By turning on these events, we can report these conflicts and get the information back to our development teams.

To create a deadlock event on your own, open two connections in Query Analyzer to a server. For our example, we’ll use the Northwind database. In one connection, we’ll enter the following code:

BEGIN TRAN

UPDATE Customers
SET CompanyName = 'Doe, Inc'

SELECT * FROM Employees

ROLLBACK TRAN

In the other connection, we’ll place the following code:

BEGIN TRAN

UPDATE Employees
SET LastName = 'Doe'

SELECT * FROM Customers

ROLLBACK TRAN

The idea here is we’re going to have one connection create an exclusive table lock on the Customers table. The other connection will create an exclusive table lock on the Employees table. Then the first connection will try and select from the Employees table. Since the second connection has an exclusive table lock and the transaction isn’t committed, it’ll be blocked. Then we’ll have the second connection try and select from the Customers table. It too will get blocked, and SQL Server will realize that we’ve run into a deadlock situation.

Let’s walk through each step. First we highlight and execute just BEGIN TRAN for both connections, as in Figure 54.

Figure 54: Executing BEGIN TRAN on Both Connections

Next we’ll execute both UPDATE statements to create the exclusive table locks (Figure 55).

Figure 55: Update Statements Forcing Table Locks

The final step is to execute the SELECT statements. This will force the deadlock; one of the connections will have its operation terminated and its transaction will automatically be rolled back (Figure 56). The other connection, since the table lock is removed, will be able to complete execution of the SELECT statement. As a matter of cleanup, the ROLLBACK TRAN statement will need to be executed on the connection (the first connection in Figure 56) whose SELECT statement completed.

Figure 56: SELECT Statements Forcing the Deadlock

We’ll also see the Lock:Deadlock and Lock:Deadlock Chain events occur in our Profiler trace in Figure 57. In this particular example, I’ve left the Exception event in the events to track, so we see it showing up as well.

Figure 57: Deadlocking Recorded in Profiler

Notice the Lock:Deadlock event corresponds to the connection (SPID) chosen as the deadlock victim. In the case of this example, the SPID is 51, and we can scroll back through our Profiler trace to see the T-SQL statements leading to the deadlock, along with the statements executing around the same time as our deadlock culprit. In this particular case, SPID 51 was trying to do a SELECT * FROM Customers when the deadlock occurred. We would take a closer look at SPID 51’s commands and compare the two connections.

The last event, Lock:Timeout, isn’t as useful because by default, SQL Server doesn’t register a lock timeout unless the client application sets one. A program can specify a lock timeout using SET LOCK_TIMEOUT. Otherwise, SQL Server and the application will wait indefinitely for the locks to clear. If we do set a lock timeout and the threshold is reached, we’ll see the event in the Profiler trace shown in Figure 58.

Figure 58: Lock Timeout
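To see the Lock:Timeout event fire for yourself, set the timeout and query a table another connection holds locked, such as the Customers table from the deadlock example. A minimal sketch:

```sql
-- Wait at most 5 seconds (5000 ms) for any lock before giving up;
-- the default of -1 means wait indefinitely
SET LOCK_TIMEOUT 5000

SELECT * FROM Customers

-- If the wait exceeds 5 seconds, the statement is aborted with error
-- 1222 ("Lock request time out period exceeded") and the Lock:Timeout
-- event appears in the Profiler trace
IF @@ERROR = 1222
    PRINT 'Query gave up waiting on a lock'

-- @@LOCK_TIMEOUT returns the session's current setting in milliseconds
SELECT @@LOCK_TIMEOUT AS [Current lock timeout (ms)]
```

Note the setting is per-connection, so each session that wants timeouts must issue it.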

Concluding Remarks

Hopefully these three excerpts have whetted your appetite for the Start to Finish Guide to SQL Server Performance Monitoring. There's a lot more in the book, including how to capture the data, how to transform it into a readable format, and how to use the data to spot potential issues ahead of time. I also cover a step-by-step methodology on performance monitoring, my attempt at putting into a "science" what some consider an "art." The last thing I'll leave you with is a condensed Table of Contents.

Condensed Table of Contents:

  • Performance Monitoring Tools
  • A Straight Forward Methodology
  • Performance Logs and System Monitor
    • Monitoring Locally vs. Remotely
    • Real-Time versus "Unattended" Monitoring
    • Setting Up a Performance Log
    • Choosing the Right Counters
    • Spot Checking in an Emergency
    • Baselines for the Long Haul
    • Responding to Trends
  • SQL Profiler
    • Creating a Profiler Trace
    • Typical Events
    • Stored Procedures, Caching, and Recompiles
    • Converting Trace Files
  • Putting it All Together
