Monitoring Performance

DBAs are in charge of performance on production SQL servers and, sure, they hate to hear complains from end-users that the system is slow. But the truth is that often the complains are not groundless. As long as for developers performance is a last thing to care about after delivering a required functionality, providing wonderful interface, debugging and loud celebrating a new release shipped to production, then we have what we have. As a result, managers use to ask “What’s going on on the server?” and DBAs really need to have a clear and accurate answer.

Getting active processes

For myself I created a stored procedure sp_ActiveProcesses (Listing 1) showing what's running on the SQL Server. The sp has one parameter - time interval - to retrieve snapshot of running processes. Default is 5 sec. It can be decreased to make the snapshot more instantaneous or increased to catch consumers for a longer term. A process to hit into the snapshot must to be running at the beginning and at the end of the period. It's why there is no sense to make the interval too long (though maximum allowed is 59 sec).

I run the sp so often that I had to create a shortcut in my Query Analyzer (Tools/Customize/Custom ...). Install the sp, set the shortcut assigning the sp, say, to Ctrl-4, press these two buttons and you've got the picture (the list of processes wrapped to fit on the page):

ProcessId	TotalCPU	CPU_ConsumedInTheTimeFragment	TotalPhysical_IO	IO_InTheTimeFragment	...
----------	---------	------------------------------	-----------------	---------------------	...
55	239	109	21	10	...
85	31328	31	7521	536	...
88	5678	1001	795	164	...

Hostname	ApplicationName	NT_LoginName	DatabaseName	SPIDBuffer
---------	----------------	--------------	------------	-----------------------
BillS	MSP	Company\Bill	MSP	GetContacts
KirkA	MSP	Company\Kirk	MSP	ReassignTimeApprover
KimN	MSP	Company\Kim	MSP	InsertExpense

TheFragmentDuration	NumberOfCPUs	SUM_CPU_Consumed	SUM_Physical_IO_Committed
------------------------	---------------	----------------	---------------------------------
5123	2	1141	710

Maximum SUM_CPU_Consumed can be TheFragmentDuration times NumberOfCPUs. In the example above it is 10246 milliseconds.

Just one note about accuracy of the numbers being showed by the sp. Microsoft Knowledge Base Article - 309377 says: In Microsoft SQL Server 2000 (all editions) CPU time for a particular server process ID (SPID) may not accumulate correctly.

It is explained as: SQL Server maintains a pool of worker threads that execute the tasks given to it. SQL Server may assign a different thread from this pool to a given SPID to execute a new query batch. When SQL Server assigns a different thread to a SPID, SQL Server does not properly calculate the CPU time to indicate the accumulated CPU time up to that point for the SPID.

Microsoft has confirmed this to be a problem in SQL Server 2000. This problem was first corrected in Microsoft SQL Server 2000 Service Pack 2.

The wrong calculation almost never happens to user processes, which are most important for us, even there is no SP2 for SQL Server 2000 installed. In contrary, I saw lots of such cases with system processes, for instance, replication agents.

Getting blocked processes

There are two kinds of SQL Server processes in the scope of DBA's concern:

1. Consumers

2. Long-runners

Consumers eat up server resources. By identifying and optimizing the consuming codes you can free some room for other processes. Long-runners are codes having outstanding duration. Exactly they make end-users uncomfortable. However for developers to provide optimization it is more important "what consumes more" than "what works longer". Because sometimes innocent processes take minutes being blocked by other processes (or maybe not blocked, but just waiting for resources - CPU, I/O to be released). And developers can do nothing about the cases. Not these codes but rather those blocking/consuming codes must be optimized. In other words: “consumption” indicates how healthy is the code, “duration” indicates how healthy is the system (hardware, surrounding processes etc).

To identify processes blocked by others I created sp_BlockedProcesses (Listing 2). Assign it to the shortcut, say, Ctrl-5, press the buttons and here we go:

BlockedSPID	BlockedBuffer	BlockingSPID	BlockingBuffer	waitresource	BlockedHostname	...
-----------	-------------	------------	--------------	------------	---------------	...
21	GetLateTasks	65	GetImage	21	KimN	...
5	SetStatus	65	GetImage	21	JasonC	...

I bet you can recall cases when some simple code was started and a quick reply was expected, but it seemed was hanging. It is quite probable that by pressing Ctrl-5 you'd see what is the matter.

Sending emails with top consumers

Both sp_ActiveProcesses and sp_BlockedProcesses are instantaneous. Of course a DBA needs overall reports showing top consumers and long-runners. I can share how I organized it in the company I work for.

1. A job was scheduled to run every morning to start a trace on production servers. One of the trace parameters specifies time to stop the trace. The job runs at 8:30 AM and the trace stops itself at 5:30 PM, when the peak of user activity on production servers is over. The name of the executed stored procedure: spTraceBuild (Listing 3).

spBuildTrace is based on a very useful stored procedure build_trace from the Microsoft Knowledge Base Article Q283790. I did the following minor modification:

a. Added server name and current date to the trace file name

b. Added error handling. If a mistake was done (for example, an option value is incorrect or a date to stop the trace has been already expired) and the trace wasn't created, it's nice to get a message.

c. Changed the code to expect only time to stop the trace (instead of date/time) - 5:30 PM in my case. I.e. you don't need ever to modify ActivityTrace.ini. The trace will be always stopped at the specified time on the same day it was started.

spTraceBuild gets configuration data from a text file named ActivityTrace.ini. It contents could be like:


@tracefile	=	\\Process02\D$\Program Files\Microsoft SQL Server\MSSQL\LOG\Trace
@maxfilesize	=	15000
@stoptime	=	5:30
@options	=	2
@events	=	10,12
@columns	=	1,3,12,13,14,16,17,18
@filter1	=	10, 0, 7, N'SQL Profiler'

Apparently you need to modify at least the first parameter, @tracefile, to make it appropriate.

Two types of events give us consumption numbers:

10 - RPC Completed

12 - Batch Completed

2. Another job was scheduled to run every night to absorb the trace file and process it, i.e. to insert the trace data into a SQL Server table and aggregate the information. Why to collect the data into a file to bulkcopy them afterward into a table instead of collecting them directly into the table? Firstly, collecting trace data into a file works faster, secondly, you cannot run a trace programmatically into a table as you do when starting a trace from Profiler. I created the following aggregations:

- top CPU consumers

- top long-runners

3. The processing job sends an e-mail containing the reports to managers of development. Every morning development managers can find the "Top consumers" report in their Inbox. That is important as long as performance is a serious issue in my company. You can schedule the trace and processing/reporting jobs to run once a week, for example, on Tuesday, if user activity and workload do not differ from day to day.

The name of the processing/reporting stored procedure is spProcessTrace (Listing 4). An important part of the code is a UDF fnExtractSPNameFromTextData (Listing 5). Definitely, you can aggregate data from a trace only if codes running against the server are named codes as stored procedures are. Ad hoc queries will be out of the scope. However, I do not think any interactive, frequently executed codes can be running as ad hoc queries, which would need compilation on the fly. Therefore, all real top consumers should be represented in the report.

Run the sp from a scheduled job as:

EXEC spProcessTraceFile

@ServerName = 'MyProductionServerName',

@ReportDate = null, -- date the trace file was created on (default - current)

@TraceFilePath = '\\MyProcessingServerName\C$\Program Files\Microsoft SQL Server\MSSQL\LOG\',

@recipients = 'Manager1@MyCompany.com;Manager2@MyCompany.com'

The value specified in @TraceFilePath should match to @tracefile in ActivityTrace.ini.

Resulting report for top CPU consumers looks like:

Top Process02 CPU consumers for Feb 22, 2003:

SP	TimesExecuted	TotalCPU	AverageExecTime	MinCPU	MaxCPU
------------------------	---------------	------------	------------------	-------------	--------
RemoveCoManager	7	615531	87933	0	595062
UpdateProspect	110	474517	4313	2328	29062
AddStandardTaskTime	673	457651	680	594	829
TaskbyResource	2480	130684	52	0	6656
GetAssetTypes	5318	88720	16	0	78
SubmitExpenseById	1583	63696	40	0	719
BillingRatesByBillingOffices	110	63164	574	32	1312
SessionCleanUp	1231	56099	45	0	19406
CheckSummaryTask	230	16443	71	46	110
RollupSummaryTask	207	15844	76	0	281
CreateBatchNumber	2720	14146	5	0	32
RejectTime	1345	13396	9	0	79
DeleteBillingRole	12	12108	1009	578	1390
ProjectSummary	143	10003	69	15	172
GetApprovedInvoices	12	9767	813	718	1032
ProgressProject	228	8322	36	0	94
AddSubProject	280	7875	28	0	265
InsertExpense	7	7422	1060	0	5906
LoadCustomer	16	6953	434	312	688
PercentOfContractIncurred	164	5790	35	15	47
GetTaxes	8	5469	683	640	828
RolesFeatures	6	5330	888	750	1016
GetWorkflowTypes	246	4519	18	0	78
GetDraftInvoices	250	4439	17	0	63

Consider locations of the codes and files as the following:

spBuildTrace	msdb on a production server
spProcessTrace	DB Traces on a processing server
fnExtractSPNameFromTextData	DB Traces on a processing server
ActivityTrace.ini	C:\Program Files\Microsoft SQL Server\MSSQL\Binn\
	on the production server

Activity Graph

The Results Pane of Query Analyzer does not allow us to represent results graphically, but spActivityGraph (Listing 6) challenges this limitation. The stored procedure uses the trace table created by spProcessTrace. spActivityGraph shows how processes running on the SQL Server interfere with each other. Looking at the graph you can see peak-times, concurrency and how this or that process is taking longer than usually being surrounded by a tough company of other processes:

StartTime	Duration	Text	11:3011:3911:4911:59
11:38:42	12033	InsertExpense	--
11:39:10	6220	GetAssetTypes	--
11:40:00	122810	GetRequestsToApprove	--------------
11:40:06	52826	GetMonthlyRevenue	------
11:40:11	30516	GetApproverTimes	----
11:40:16	30527	GetApproverTimes	----
11:40:17	30533	GetApproverTimes	----
11:40:25	30530	PopulatePlanFact	----
11:40:28	30543	ProjectSummary	----
11:40:28	30516	LoadLeadByResource	----
11:40:30	30513	ProjectSummary	----
11:40:36	11736	SetLockAllTask	--
11:40:38	21623	InvoiceByClient	--
11:40:42	103116	PopulatePlanFact	------------
11:40:44	15780	GetDraftInvoices	--
11:40:49	10310	InsertAd	--
11:40:50	9513	ModifyCodeUpdatedExpense	--
11:40:51	8280	DeleteBillingRole	--
11:40:59	60966	ProjectSummary	--------
11:41:04	30516	AutoEscalate	----
11:41:07	30446	GetLicenceUpdate	----
11:41:21	5046	GetImageBatch	-

spActivityGraph has 6 input parameters:


@TraceTable	-	name of the trace table created in spProcessTrace
@TraceStart	-	start time of the period
@TraceEnd	-	end time of the period
@SPNameLayoutLen	-	width of the column with SP names in the report. Default - 20.
@DurationLayoutLen	-	width of the column with duration values in the report. Default - 6
@LayoutWidth	-	width of the report. Default - 115 symbols (wrapping would make the picture totally messed up).

I would suggest to build graphs for 30 minutes intervals. If you increase the interval trying to cover the entire day or at least a half of the day in one graph, duration of one unit (shown by dash '-') will be also increased and even processes with duration more than 10 sec will be missed in the graph. What if nonetheless you would like to see the activity for the entire day? No problem: the stored procedure spShowActivityGraphByChunks (Listing 7) will give you the full day picture divided into 0.5-hour pieces. The only 2 mandatory input parameters for the stored procedure (@ServerName, @ReportDate) serve to identify a trace table to work with.

Conclusion

Scripts referenced in the article are available here.

Stored procedures showing instantaneous and overall performance reports give us a clear picture of user activity on production SQL Servers and help us to find ways to make the performance better.

Getting active processes

Getting blocked processes

Sending emails with top consumers

Activity Graph

Conclusion

Rate

Share

Categories

Share

Rate

Monitoring Performance

Getting active processes

Getting blocked processes

Sending emails with top consumers

Activity Graph

Conclusion

Rate

Share

Categories

Share

Rate

Related content

What is the search order for Procedures prefixed sp_?

Understanding Execution Plans Part 1

Using Interface-Based Programming Techniques in SQL Server

Making Dynamic Queries Static

SQL Server 2000 Table Hints

Cookies on SQLServerCentral