SQLServerCentral Article

Using Server Side Traces for Dynamic Performance Evaluation


Introduction:

The query execution plan is probably the first tool that DBAs and T-SQL programmers reach for when solving performance issues in their SQL Server applications. In my opinion, the biggest problem with this approach is the isolated context of the performance information gathered: it looks at only one query at a time. A working SQL Server is a dynamic system bombarded by hundreds or thousands of requests at the same time, and the execution time of one query may impact another’s, especially if they access the same tables. In some cases the only way to further improve one query’s performance is to modify a different query.

The technique described in this article may not be a viable solution for every SQL Server deployment but, because it was successful in solving several technical issues in our implementation, I would like to share it with you. Before I begin, I would like to describe the environment we are operating within.

Our System:

We use SQL Server 2005 Express Edition on a medical device. This simplifies things on one end: there is a very limited and predictable number of clients calling the database, and the database is very small by any standard. On the other hand, because the database is part of a physical system, the response time of a query must in some cases stay within a specific number of milliseconds regardless of the database size.

All clients are C# applications, and all database access is through stored procedure calls.

In our architecture the SQL database is one of the focal points and most of the business logic is handled inside the stored procedures. Because of this approach, our stored procedures are quite complex and they often call other stored procedures and views internally.

Our Need to Analyze

In order to analyze the overall system performance and the impact of any changes we introduced during development, we needed a way to gather physical evidence, and we needed hard numbers to compare the “before” and “after” with. Server-side traces gave us a way to collect that data easily and consistently. By analyzing the collected data, we were able to determine the dynamic relationship between different queries.

Because of the way we access the database, the trace was limited to capturing RPC calls only. Of course, you could capture different events according to your own needs; you just need to find the ones that match your environment best.

Creating the Trace

The code below is what I used to create the trace. In this case, the trace output is captured in a file with an automatically dated file name on the local server drive. If the file reaches the maximum defined size, the captured data rolls over to a new file, and so on. Other details of the trace code are explained in the embedded comments.

 -- Declare variables
 DECLARE @rc INT
 DECLARE @TraceID INT
 DECLARE @maxFileSize bigint
 DECLARE @fileName NVARCHAR(128)
 DECLARE @on bit
 DECLARE @off bit
 -- Set values
 DECLARE @CurrentDate NVARCHAR(20)
 SET @CurrentDate = CONVERT(VARCHAR(20),getdate(), 112)
 SET @maxFileSize = 10
 SET @fileName = N'C:\Trace' + @CurrentDate
 SET @on = 1
 SET @off = 0
 PRINT @fileName
 -- Create trace
 EXEC @rc = sp_trace_create @TraceID OUTPUT, 2, @fileName, @maxFileSize, NULL  -- option 2 = TRACE_FILE_ROLLOVER
 -- If an error occurred, end the process
 IF (@rc != 0) GOTO error
 -- Set the events and data to collect
 EXEC sp_trace_setevent @TraceID, 10,  2, @off   --BinaryData
 EXEC sp_trace_setevent @TraceID, 10, 12, @on    --SPID
 EXEC sp_trace_setevent @TraceID, 10, 13, @on    --Duration
 EXEC sp_trace_setevent @TraceID, 10, 14, @on    --StartTime
 EXEC sp_trace_setevent @TraceID, 10, 15, @on    --EndTime
 EXEC sp_trace_setevent @TraceID, 10, 16, @on    --Reads
 EXEC sp_trace_setevent @TraceID, 10, 17, @on    --Writes
 EXEC sp_trace_setevent @TraceID, 10, 18, @on    --CPU
 EXEC sp_trace_setevent @TraceID, 10, 24, @on    --IndexID
 EXEC sp_trace_setevent @TraceID, 10, 34, @on    --ObjectName
 EXEC sp_trace_setevent @TraceID, 10, 48, @on    --RowCounts
 EXEC sp_trace_setevent @TraceID, 10, 4, @on     --TransactionID
 -- Do not collect ADO.NET connection reset calls (TextData <> 'exec sp_reset_connection')
 EXEC sp_trace_setfilter @traceid = @TraceID
         , @columnid = 1             -- TextData
         , @logical_operator = 0     -- AND
         , @comparison_operator = 1  -- not equal
         , @value = N'exec sp_reset_connection'
 -- Start the trace
 EXEC sp_trace_setstatus @TraceID, 1
 -- display the trace id for future reference
 SELECT TraceID=@TraceID 
 GOTO finish 
 -- error trap
 error: 
 SELECT ErrorCode=@rc 
 -- exit
 finish: 
 GO
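
One housekeeping note: a trace created this way keeps running (and holding its files) until it is explicitly stopped and closed. The snippet below is a minimal sketch of that cleanup; it assumes you still have the trace ID displayed by the creation script (shown here as a placeholder value of 2).

 -- Stop the trace and remove its definition when data collection is finished
 DECLARE @TraceID INT
 SET @TraceID = 2 -- placeholder: use the trace ID displayed by the creation script
 EXEC sp_trace_setstatus @TraceID, 0 -- status 0 = stop the trace
 EXEC sp_trace_setstatus @TraceID, 2 -- status 2 = close the trace and delete its definition
 GO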

Importing the Trace Data

For performance and system-dynamics testing, I created a set of simulators (C# apps) that execute database calls in a fashion similar to the live system. Using simulators instead of the live system allows us to run database tests in the same environment regardless of the performance of the physical equipment. I ran the simulators for about 24 hours and collected the trace data. Once that completed, we needed to put the trace data somewhere we could analyze it, and what better place for such an analysis than a SQL table?

Using the code below, the collected trace data was loaded into a table called “Trace1”. Since this uses SELECT...INTO, it’s quick and easy because you don’t even need to create the trace table ahead of time. The reason I used the TOP clause is to capture the same number of calls after each test run, so the statistics could be compared across a comparable number of calls.

SELECT TOP 500000
 ObjectName,
 TransactionID,
 SPID,
 Duration,
 StartTime,
 EndTime,
 Reads,
 Writes,
 CPU,
 RowCounts
 INTO Trace1
 FROM ::fn_trace_gettable('C:\Trace20090918.trc', DEFAULT)
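
As a quick sanity check (this query is my own addition, not part of the original test scripts), you can confirm how many calls were imported and what time window they cover before doing any comparisons:

 -- Confirm the size and time span of the imported trace data
 SELECT
  COUNT(*)       AS CapturedCalls,
  MIN(StartTime) AS FirstCall,
  MAX(EndTime)   AS LastCall
 FROM Trace1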

After that, a simple call like this…

 SELECT
  ObjectName,
  COUNT(*)      AS CallCount,
  AVG(Duration) AS AvgDuration,
  MAX(Duration) AS MaxDuration,
  MIN(Duration) AS MinDuration
  FROM Trace1
  GROUP BY ObjectName

…can provide us with some basic performance numbers.
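
If the data from a second test run is imported into its own table (I will call it Trace2 here purely for illustration), the two runs can be compared and the percentage change calculated in a single query, along these lines:

 -- Compare average durations between two test runs (Trace2 is a hypothetical second import)
 SELECT
  t1.ObjectName,
  t1.AvgDuration AS Run1AvgDuration,
  t2.AvgDuration AS Run2AvgDuration,
  1.0 - (t2.AvgDuration * 1.0 / t1.AvgDuration) AS PctChange
 FROM (SELECT ObjectName, AVG(Duration) AS AvgDuration
         FROM Trace1 GROUP BY ObjectName) AS t1
 JOIN (SELECT ObjectName, AVG(Duration) AS AvgDuration
         FROM Trace2 GROUP BY ObjectName) AS t2
   ON t2.ObjectName = t1.ObjectName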

Analyzing the Trace

The tables below show some of the numbers collected from two independent test runs. They list five of the stored procedures we analyzed, the number of calls made, and the average duration (in microseconds) of those calls.

Test run 1:

Procedure                  Count   Avg Duration
DrawerSamples              37384   50755
SequencerWorldView          9341   74328
LISOrder                   30355    1529
ResultAdd                  77856    1650
MechanismPositionsStatus  117637     550

Test run 2:

Procedure                  Count   Avg Duration   % Change
DrawerSamples              41848   35088          0.31
SequencerWorldView         10474   46930          0.37
LISOrder                   27188    1477          0.03
ResultAdd                  73034    1612          0.02
MechanismPositionsStatus  131843     532          0.03

In both test runs, all the stored procedures were identical and the tests started with a clean database. The only difference was in the way the DrawerSamples stored procedure accessed the data: I changed it to use the WITH (NOLOCK) hint on all tables accessed. As you can see from the tables above, that provided a 31% performance gain for this stored procedure. Because of the way this stored procedure is used in our application, it makes no difference from the business perspective whether the data is read with or without the locks. The performance-critical stored procedure, in our case, is SequencerWorldView. That query had already been tuned as far as we could take it, the required performance was still not there, and additional tuning attempts proved fruitless. But thanks to the change made to the DrawerSamples stored procedure, we got a 37% performance gain on the calls to the critical SequencerWorldView stored procedure.

Now I have to point out that the two stored procedures are totally independent. Both are primarily SELECT statements accessing the same tables, but they retrieve different data. The performance gain in SequencerWorldView came from the underlying tables being more available: after the change, DrawerSamples no longer took the locks it had been taking during the first test run.

The other procedures listed here showed no performance gain (a 2-3% change is statistically negligible in this case).

Without the ability to trace all the database calls and collect hard data to analyze, it would be almost impossible to see the impact of such changes on the overall system performance. Of course, this is just a small sample of the performance data that can be collected this way. The average duration is a critical piece of information in measuring overall query performance, but it is not the only one.

In our case, we also need to look at the duration distribution because we are controlling a physical device and frequent long queries could potentially create hardware lockups.

Analyzing the Distribution of Code Duration

A query like the following can provide some distribution data from the same trace. It uses classic Cross-Tab methodology to count how many calls fall into each duration range:

SELECT
 ObjectName,
 COUNT(*)      AS CallCount,
 AVG(Duration) AS AvgDuration,
 MAX(Duration) AS MaxDuration,
 MIN(Duration) AS MinDuration,
 -- Duration is reported in microseconds, so each bucket below is a range in milliseconds
 SUM(CASE WHEN Duration < 10000 THEN 1 ELSE 0 END)                   AS [0-10 ms],
 SUM(CASE WHEN Duration BETWEEN 10000  AND 50000  THEN 1 ELSE 0 END) AS [10-50 ms],
 SUM(CASE WHEN Duration BETWEEN 50000  AND 100000 THEN 1 ELSE 0 END) AS [50-100 ms],
 SUM(CASE WHEN Duration BETWEEN 100000 AND 200000 THEN 1 ELSE 0 END) AS [100-200 ms],
 SUM(CASE WHEN Duration BETWEEN 200000 AND 300000 THEN 1 ELSE 0 END) AS [200-300 ms],
 SUM(CASE WHEN Duration BETWEEN 300000 AND 400000 THEN 1 ELSE 0 END) AS [300-400 ms],
 SUM(CASE WHEN Duration BETWEEN 400000 AND 500000 THEN 1 ELSE 0 END) AS [400-500 ms],
 SUM(CASE WHEN Duration > 500000 THEN 1 ELSE 0 END)                  AS [>500 ms]
 FROM Trace1
 GROUP BY ObjectName

Test run 1 (original code; durations in microseconds, buckets in milliseconds):

Procedure                  Count   Avg Duration  Max Duration  Min Duration    0-10   10-50  50-100  100-200  200-300  300-400  400-500  >500
DrawerSamples              37384   50755         1206951       4274             422   22454   13932      365       65      128       13     6
SequencerWorldView          9341   74328          731776       4558              19    1445    7001      837       34        3        1     1
LISOrder                   30355    1529          185064        927           30294      30      28        3        0        0        0     0
ResultAdd                  77856    1650          133453        657           74993    2834      28        1        0        0        0     0
MechanismPositionsStatus  117637     550          352133        367          117478     126      31        1        0        1        0     0

Test run 2 (modified DrawerSamples code):

Procedure                  Count   Avg Duration  Max Duration  Min Duration    0-10   10-50  50-100  100-200  200-300  300-400  400-500  >500
DrawerSamples              41848   35088          603626       4078             454   39781    1368      151       16       10        4    82
SequencerWorldView         10474   46930          440528       4467              57    6128    4049      223       16        0        1     0
LISOrder                   27188    1477          196738        916           27124      39      22        3        0        0        0     0
ResultAdd                  73034    1612          158325        661           71041    1885      23       85        0        0        0     0
MechanismPositionsStatus  131843     532          139235        356          131690     150       2        1        0        0        0     0

The tables above show the duration distribution for the two test runs. The first test run used the original code. The second used the modified DrawerSamples code (with the WITH (NOLOCK) hint). As you can see, before the modification there were 5 calls to the SequencerWorldView stored procedure that took longer than 300 ms, including one taking more than 0.5 seconds. After the modification that dropped to 1 call longer than 300 ms and no calls longer than 0.5 seconds. Our system guys were pleased with such results.

In the past (using a different software architecture on the same device) we noticed that if the decision-making process took too long, it had to be reset and started again to accommodate the changes in the physical world. Occasional long queries could be absorbed by the system, but when the ratio of long-running queries reached a certain level, the system would simply freeze. The distribution of query durations gave us the critical piece of information we needed to confirm that database performance would not introduce any system stability issues.
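
When the distribution shows a long tail for a particular procedure, the same trace table lets you pull out the individual offending calls and see when they happened and what they cost. A simple drill-down along these lines (the 300 ms threshold is just the limit that mattered for our device) is usually the next step:

 -- List the individual long-running calls; Duration is in microseconds
 SELECT
  ObjectName,
  StartTime,
  EndTime,
  Duration,
  Reads,
  Writes,
  CPU
 FROM Trace1
 WHERE Duration > 300000 -- longer than 300 ms
 ORDER BY Duration DESC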

Conclusion

Although the analysis of the execution plan for a given query will help you tune that one query, there are times when you need to look at an entire system of queries and the impact they can have on each other. This article demonstrated how analyzing the data collected by the SQL Trace could help solve certain performance issues.

Depending on your needs, you can collect different pieces of information, which can be used to produce additional valuable performance reports. Look up the sp_trace_setevent system stored procedure in Books Online to get an idea of what can be collected this way. If you experiment with traces, you can probably come up with other creative ways they can help you address technical and business needs.
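
Besides Books Online, the server itself can list what is available: the catalog views sys.trace_events, sys.trace_columns, and sys.trace_event_bindings describe every trace event, every data column, and which columns each event supports. For example, a query like the one below lists everything that can be captured for the RPC:Completed event (event ID 10) used in this article:

 -- List the data columns available for RPC:Completed (trace event 10)
 SELECT
  c.trace_column_id,
  c.name,
  c.type_name
 FROM sys.trace_event_bindings AS b
 JOIN sys.trace_columns AS c
   ON c.trace_column_id = b.trace_column_id
 WHERE b.trace_event_id = 10
 ORDER BY c.trace_column_id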

Thank you for reading.
