Monitoring Performance


DBAs are in charge of performance on production SQL servers and, sure, they hate to hear complains from end-users that the system is slow. But the truth is that often the complains are not groundless. As long as for developers performance is a last thing to care about after delivering a required functionality, providing wonderful interface, debugging and loud celebrating a new release shipped to production, then we have what we have. As a result, managers use to ask “What’s going on on the server?” and DBAs really need to have a clear and accurate answer.

Getting active processes

For myself I created a stored procedure sp_ActiveProcesses (Listing 1) showing what's running on the SQL Server. The sp has one parameter - time interval - to retrieve snapshot of running processes. Default is 5 sec. It can be decreased to make the snapshot more instantaneous or increased to catch consumers for a longer term. A process to hit into the snapshot must to be running at the beginning and at the end of the period. It's why there is no sense to make the interval too long (though maximum allowed is 59 sec).

I run the sp so often that I had to create a shortcut in my Query Analyzer (Tools/Customize/Custom ...). Install the sp, set the shortcut assigning the sp, say, to Ctrl-4, press these two buttons and you've got the picture (the list of processes wrapped to fit on the page):

ProcessId TotalCPU CPU_ConsumedInTheTimeFragment TotalPhysical_IO IO_InTheTimeFragment ...
---------- --------- ------------------------------ ----------------- --------------------- ...
55 239 109 21 10 ...
85 31328 31 7521 536 ...
88 5678 1001 795 164 ...

... Hostname ApplicationName NT_LoginName DatabaseName SPIDBuffer
--------- ---------------- -------------- ------------ -----------------------
BillS MSP Company\Bill MSP GetContacts
KirkA MSP Company\Kirk MSP ReassignTimeApprover
KimN MSP Company\Kim MSP InsertExpense

TheFragmentDuration NumberOfCPUs SUM_CPU_Consumed SUM_Physical_IO_Committed
------------------------ --------------- ---------------- ---------------------------------
5123 2 1141 710

Maximum SUM_CPU_Consumed can be TheFragmentDuration times NumberOfCPUs. In the example above it is 10246 milliseconds.

Just one note about accuracy of the numbers being showed by the sp. Microsoft Knowledge Base Article - 309377 says: In Microsoft SQL Server 2000 (all editions) CPU time for a particular server process ID (SPID) may not accumulate correctly.

It is explained as: SQL Server maintains a pool of worker threads that execute the tasks given to it. SQL Server may assign a different thread from this pool to a given SPID to execute a new query batch. When SQL Server assigns a different thread to a SPID, SQL Server does not properly calculate the CPU time to indicate the accumulated CPU time up to that point for the SPID.

Microsoft has confirmed this to be a problem in SQL Server 2000. This problem was first corrected in Microsoft SQL Server 2000 Service Pack 2.

The wrong calculation almost never happens to user processes, which are most important for us, even there is no SP2 for SQL Server 2000 installed. In contrary, I saw lots of such cases with system processes, for instance, replication agents.

Getting blocked processes

There are two kinds of SQL Server processes in the scope of DBA's concern:

1. Consumers

2. Long-runners

Consumers eat up server resources. By identifying and optimizing the consuming codes you can free some room for other processes. Long-runners are codes having outstanding duration. Exactly they make end-users uncomfortable. However for developers to provide optimization it is more important "what consumes more" than "what works longer". Because sometimes innocent processes take minutes being blocked by other processes (or maybe not blocked, but just waiting for resources - CPU, I/O to be released). And developers can do nothing about the cases. Not these codes but rather those blocking/consuming codes must be optimized. In other words: “consumption” indicates how healthy is the code, “duration” indicates how healthy is the system (hardware, surrounding processes etc).

To identify processes blocked by others I created sp_BlockedProcesses (Listing 2). Assign it to the shortcut, say, Ctrl-5, press the buttons and here we go:

BlockedSPID BlockedBuffer BlockingSPID BlockingBuffer waitresource BlockedHostname ...




















I bet you can recall cases when some simple code was started and a quick reply was expected, but it seemed was hanging. It is quite probable that by pressing Ctrl-5 you'd see what is the matter. 

Sending emails with top consumers

Both sp_ActiveProcesses and sp_BlockedProcesses are instantaneous. Of course a DBA needs overall reports showing top consumers and long-runners. I can share how I organized it in the company I work for.

1. A job was scheduled to run every morning to start a trace on production servers. One of the trace parameters specifies time to stop the trace. The job runs at 8:30 AM and the trace stops itself at 5:30 PM, when the peak of user activity on production servers is over. The name of the executed stored procedure: spTraceBuild (Listing 3).

spBuildTrace is based on a very useful stored procedure build_trace from the Microsoft Knowledge Base Article Q283790. I did the following minor modification:

a. Added server name and current date to the trace file name

b. Added error handling. If a mistake was done (for example, an option value is incorrect or a date to stop the trace has been already expired) and the trace wasn't created, it's nice to get a message.

c. Changed the code to expect only time to stop the trace (instead of date/time) - 5:30 PM in my case. I.e. you don't need ever to modify ActivityTrace.ini. The trace will be always stopped at the specified time on the same day it was started.

spTraceBuild gets configuration data from a text file named ActivityTrace.ini. It contents could be like:

@tracefile = \\Process02\D$\Program Files\Microsoft SQL Server\MSSQL\LOG\Trace
@maxfilesize = 15000
@stoptime = 5:30
@options = 2
@events = 10,12
@columns = 1,3,12,13,14,16,17,18
@filter1 = 10, 0, 7, N'SQL Profiler'

Apparently you need to modify at least the first parameter, @tracefile, to make it appropriate.

Two types of events give us consumption numbers:

10 - RPC Completed

12 - Batch Completed

2. Another job was scheduled to run every night to absorb the trace file and process it, i.e. to insert the trace data into a SQL Server table and aggregate the information. Why to collect the data into a file to bulkcopy them afterward into a table instead of collecting them directly into the table? Firstly, collecting trace data into a file works faster, secondly, you cannot run a trace programmatically into a table as you do when starting a trace from Profiler. I created the following aggregations:

- top CPU consumers

- top long-runners

3. The processing job sends an e-mail containing the reports to managers of development. Every morning development managers can find the "Top consumers" report in their Inbox. That is important as long as performance is a serious issue in my company. You can schedule the trace and processing/reporting jobs to run once a week, for example, on Tuesday, if user activity and workload do not differ from day to day.

The name of the processing/reporting stored procedure is spProcessTrace (Listing 4). An important part of the code is a UDF fnExtractSPNameFromTextData (Listing 5). Definitely, you can aggregate data from a trace only if codes running against the server are named codes as stored procedures are. Ad hoc queries will be out of the scope. However, I do not think any interactive, frequently executed codes can be running as ad hoc queries, which would need compilation on the fly. Therefore, all real top consumers should be represented in the report.

Run the sp from a scheduled job as:

EXEC spProcessTraceFile

@ServerName = 'MyProductionServerName',

@ReportDate = null, -- date the trace file was created on (default - current)

@TraceFilePath = '\\MyProcessingServerName\C$\Program Files\Microsoft SQL Server\MSSQL\LOG\',

@recipients = ';'

The value specified in @TraceFilePath should match to @tracefile in ActivityTrace.ini.

Resulting report for top CPU consumers looks like:

Top Process02 CPU consumers for Feb 22, 2003:

SP TimesExecuted TotalCPU AverageExecTime MinCPU MaxCPU
------------------------ --------------- ------------ ------------------ ------------- --------
RemoveCoManager 7 615531 87933 0 595062
UpdateProspect 110 474517 4313 2328 29062
AddStandardTaskTime 673 457651 680 594 829
TaskbyResource 2480 130684 52 0 6656
GetAssetTypes 5318 88720 16 0 78
SubmitExpenseById 1583 63696 40 0 719
BillingRatesByBillingOffices 110 63164 574 32 1312
SessionCleanUp 1231 56099 45 0 19406
CheckSummaryTask 230 16443 71 46 110
RollupSummaryTask 207 15844 76 0 281
CreateBatchNumber 2720 14146 5 0 32
RejectTime 1345 13396 9 0 79
DeleteBillingRole 12 12108 1009 578 1390
ProjectSummary 143 10003 69 15 172
GetApprovedInvoices 12 9767 813 718 1032
ProgressProject 228 8322 36 0 94
AddSubProject 280 7875 28 0 265
InsertExpense 7 7422 1060 0 5906
LoadCustomer 16 6953 434 312 688
PercentOfContractIncurred 164 5790 35 15 47
GetTaxes 8 5469 683 640 828
RolesFeatures 6 5330 888 750 1016
GetWorkflowTypes 246 4519 18 0 78
GetDraftInvoices 250 4439 17 0 63

 Consider locations of the codes and files as the following:


msdb on a production server


DB Traces on a processing server


DB Traces on a processing server


C:\Program Files\Microsoft SQL Server\MSSQL\Binn\

on the production server

Activity Graph

The Results Pane of Query Analyzer does not allow us to represent results graphically, but spActivityGraph (Listing 6) challenges this limitation. The stored procedure uses the trace table created by spProcessTrace. spActivityGraph shows how processes running on the SQL Server interfere with each other. Looking at the graph you can see peak-times, concurrency and how this or that process is taking longer than usually being surrounded by a tough company of other processes:

StartTime Duration Text 11:3011:3911:4911:59
11:38:42 12033 InsertExpense


11:39:10 6220 GetAssetTypes


11:40:00 122810 GetRequestsToApprove


11:40:06 52826 GetMonthlyRevenue


11:40:11 30516 GetApproverTimes


11:40:16 30527 GetApproverTimes


11:40:17 30533 GetApproverTimes


11:40:25 30530 PopulatePlanFact


11:40:28 30543 ProjectSummary


11:40:28 30516 LoadLeadByResource


11:40:30 30513 ProjectSummary


11:40:36 11736 SetLockAllTask


11:40:38 21623 InvoiceByClient


11:40:42 103116 PopulatePlanFact


11:40:44 15780 GetDraftInvoices


11:40:49 10310 InsertAd


11:40:50 9513 ModifyCodeUpdatedExpense


11:40:51 8280 DeleteBillingRole


11:40:59 60966 ProjectSummary


11:41:04 30516 AutoEscalate


11:41:07 30446 GetLicenceUpdate


11:41:21 5046 GetImageBatch


 spActivityGraph has 6 input parameters:

@TraceTable - name of the trace table created in spProcessTrace
@TraceStart - start time of the period
@TraceEnd - end time of the period
@SPNameLayoutLen - width of the column with SP names in the report. Default - 20.
@DurationLayoutLen - width of the column with duration values in the report. Default - 6
@LayoutWidth - width of the report. Default - 115 symbols (wrapping would make the picture totally messed up).

I would suggest to build graphs for 30 minutes intervals. If you increase the interval trying to cover the entire day or at least a half of the day in one graph, duration of one unit (shown by dash '-') will be also increased and even processes with duration more than 10 sec will be missed in the graph. What if nonetheless you would like to see the activity for the entire day? No problem: the stored procedure spShowActivityGraphByChunks (Listing 7) will give you the full day picture divided into 0.5-hour pieces. The only 2 mandatory input parameters for the stored procedure (@ServerName, @ReportDate) serve to identify a trace table to work with.


Scripts referenced in the article are available here.

Stored procedures showing instantaneous and overall performance reports give us a clear picture of user activity on production SQL Servers and help us to find ways to make the performance better.


5 (1)




5 (1)