Using Server Side Traces for Dynamic Performance Evaluation

By Jacek Osuchowski

Introduction

The query execution plan is probably the first tool that DBAs and T-SQL programmers reach for when solving performance issues in their SQL Server applications. In my opinion, the biggest problem with this approach is the isolated context of the performance information it provides: it looks at only one query at a time. A working SQL Server is a dynamic system bombarded by hundreds or thousands of requests at the same time, and the execution time of one query may impact another’s, especially if they access the same tables. In some cases, the only way to further improve one query’s performance is to modify a different query.

The technique described in this article may not be a viable solution for every SQL Server deployment but, because it was successful in solving several technical issues in our implementation, I would like to share it with you. Before I begin, I would like to describe the environment we are operating within.

Our System

We use SQL Server 2005 Express Edition on a medical device. This simplifies things on one end: there is a very limited and predictable number of clients calling the database, and the database is very small by any standard. On the other hand, because the database is part of a physical system, the response time of a query must in some cases stay within a specific number of milliseconds regardless of the database size.

All clients are C# applications, and all database access is through stored procedure calls.

In our architecture the SQL database is one of the focal points and most of the business logic is handled inside the stored procedures. Because of this approach, our stored procedures are quite complex and they often call other stored procedures and views internally.

Our Need to Analyze

In order to analyze the overall system performance and the impact of any changes we introduced during development, we needed a way to gather physical evidence and hard numbers for before-and-after comparisons. Server side traces provided us with a way to collect data easily and in a consistent manner. By analyzing the collected data, we were able to determine the dynamic relationships between different queries.

Because of the way we access the database, the trace used was limited to capturing the RPC calls only. Of course, you could capture different events according to your own needs. You just need to find out the one that matches your environment the best.
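If you are not sure which event or column IDs apply to your environment, SQL Server 2005 and later expose catalog views that list them. A quick sketch (run against any database):

```sql
-- List all trace events and their numeric IDs
-- (RPC:Completed, used in the trace below, is event 10).
SELECT trace_event_id, name
FROM sys.trace_events
ORDER BY trace_event_id

-- List the columns that can be captured for a given event.
SELECT c.trace_column_id, c.name
FROM sys.trace_event_bindings AS b
JOIN sys.trace_columns AS c
    ON c.trace_column_id = b.trace_column_id
WHERE b.trace_event_id = 10   -- RPC:Completed
ORDER BY c.trace_column_id
```

The same IDs are documented under sp_trace_setevent in Books Online, but querying the catalog views keeps the list in sync with the server you are actually running.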

Creating the Trace

The code below is what I used to create the trace. In this case, the trace is captured to a file with an automatically dated file name on the local server drive. If the file reaches the maximum defined size, the captured data rolls over to a new file, and so forth. Other details of the trace code are explained in the embedded comments:

 -- Declare variables
 DECLARE @rc INT
 DECLARE @TraceID INT
 DECLARE @maxFileSize bigint
 DECLARE @fileName NVARCHAR(128)
 DECLARE @on bit
 DECLARE @off bit
 -- Set values
 DECLARE @CurrentDate NVARCHAR(20)
 SET @CurrentDate = CONVERT(VARCHAR(20),getdate(), 112)
 SET @maxFileSize = 10
 SET @fileName = N'C:\Trace' + @CurrentDate
 SET @on = 1
 SET @off = 0
 PRINT @fileName
 -- Create trace
 EXEC @rc = sp_trace_create @TraceID output, 2, @fileName, @maxFileSize, NULL 
 -- If error end process
 IF (@rc != 0) GOTO error
 -- Set the events and data to collect
 EXEC sp_trace_setevent @TraceID, 10,  2, @off   --BinaryData
 EXEC sp_trace_setevent @TraceID, 10, 12, @on    --SPID
 EXEC sp_trace_setevent @TraceID, 10, 13, @on    --Duration
 EXEC sp_trace_setevent @TraceID, 10, 14, @on    --StartTime
 EXEC sp_trace_setevent @TraceID, 10, 15, @on    --EndTime
 EXEC sp_trace_setevent @TraceID, 10, 16, @on    --Reads
 EXEC sp_trace_setevent @TraceID, 10, 17, @on    --Writes
 EXEC sp_trace_setevent @TraceID, 10, 18, @on    --CPU
 EXEC sp_trace_setevent @TraceID, 10, 24, @on    --IndexID
 EXEC sp_trace_setevent @TraceID, 10, 34, @on    --ObjectName
 EXEC sp_trace_setevent @TraceID, 10, 48, @on    --RowCounts
 EXEC sp_trace_setevent @TraceID, 10, 4, @on     --TransactionID
 -- Do not collect ADO.NET connection reset calls.
 EXEC sp_trace_setfilter @traceid = @TraceID
         , @columnid = 1              -- TextData
         , @logical_operator = 0      -- AND
         , @comparison_operator = 1   -- Not equal
         , @value = N'exec sp_reset_connection'
 -- Start the trace
 EXEC sp_trace_setstatus @TraceID, 1
 -- display trace id for future references 
 SELECT TraceID=@TraceID 
 GOTO finish 
 -- error trap
 error: 
 SELECT ErrorCode=@rc 
 -- exit
 finish: 
 GO
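Once enough data has been collected, the trace should be stopped and closed before its files are imported. A minimal sketch; the literal TraceID value here is only a placeholder for the one returned when the trace was created:

```sql
-- Find currently running traces and their properties
-- (property 2 in the output is the target file name).
SELECT * FROM ::fn_trace_getinfo(DEFAULT)

-- Stop the trace (status 0), then close it and delete its
-- definition from the server (status 2). The .trc files stay
-- on disk and can now be imported.
DECLARE @TraceID INT
SET @TraceID = 2   -- placeholder: use the TraceID returned at creation
EXEC sp_trace_setstatus @TraceID, 0
EXEC sp_trace_setstatus @TraceID, 2
```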

Importing the Trace Data

For performance and system dynamics testing, I created a set of simulators (C# apps) that execute database calls in a similar fashion the live system would. The usage of simulators vs the live system allows us to run database tests in the same environment regardless of the performance of the physical equipment. I ran the simulators for about 24 hours and collected the trace data. Once completed, we needed to put the trace data in a spot where we could do some analysis. What better place to do such an analysis than an SQL Table?

Using the code below, the collected trace data was loaded into a table called “Trace1”. Since this uses SELECT/INTO, it’s quick and easy because you don’t even need to create the trace table ahead of time. The reason I used the TOP clause is to capture the same number of calls after each test run, so the statistical data could be analyzed on a comparable number of calls.

SELECT TOP 500000
 ObjectName,
 TransactionID,
 SPID,
 Duration,
 StartTime,
 EndTime,
 Reads,
 Writes,
 CPU,
 RowCounts
 INTO Trace1
 FROM ::fn_trace_gettable('C:\Trace20090918.trc', DEFAULT)

After that, a simple call like this…

 SELECT
  ObjectName,
  COUNT(*)      AS Calls,
  AVG(Duration) AS AvgDuration,
  MAX(Duration) AS MaxDuration,
  MIN(Duration) AS MinDuration
 FROM Trace1
 GROUP BY ObjectName

…can provide us with some basic performance numbers.

Analyzing the Trace

The table below illustrates some collected numbers from two independent test runs. It lists just 5 of the stored procedures we analyzed, the number of calls made, and the average duration (in microseconds) of those calls.

Run 1 (original code):

Procedure                   Count     Avg Duration (µs)
DrawerSamples               37384     50755
SequencerWorldView          9341      74328
LISOrder                    30355     1529
ResultAdd                   77856     1650
MechanismPositionsStatus    117637    550

Run 2 (DrawerSamples modified to use WITH (NOLOCK)):

Procedure                   Count     Avg Duration (µs)   % Change
DrawerSamples               41848     35088               31%
SequencerWorldView          10474     46930               37%
LISOrder                    27188     1477                3%
ResultAdd                   73034     1612                2%
MechanismPositionsStatus    131843    532                 3%

In both test runs, all the stored procedures were identical and the tests started with a clean database. The only difference was in the way the DrawerSamples stored procedure accessed the data: I changed it to use the WITH (NOLOCK) hint on all tables accessed. As you can see from the table above, this provided a 31% performance gain for that stored procedure. Because of the way this stored procedure is used in our application, it makes no difference (from the business perspective) whether the data is read with or without locks. The performance-critical stored procedure in our case is SequencerWorldView. That query had already been tuned as far as possible, yet the required performance was just not there, and additional tuning attempts proved fruitless. But thanks to the change made to the DrawerSamples stored procedure, we got a 37% performance gain on calls to the critical SequencerWorldView stored procedure.

I have to point out that the two stored procedures are totally independent. Both are primarily SELECT statements accessing the same tables, but they retrieve different data. The performance gain in SequencerWorldView was achieved by providing more access time to the underlying tables, thanks to the locks no longer being taken by DrawerSamples after the change.
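The DrawerSamples code itself is not shown here, so the following is only an illustrative sketch of the kind of change described; the table and column names are hypothetical, not our actual schema. The WITH (NOLOCK) table hint reads at READ UNCOMMITTED isolation, so the query takes no shared locks, at the cost of possible dirty reads (acceptable here, per the business requirements):

```sql
-- Illustrative only: table and column names are hypothetical.
-- WITH (NOLOCK) reads without taking shared locks, so other
-- queries against the same tables wait less, but the results
-- can include uncommitted (dirty) data.
DECLARE @DrawerID INT
SET @DrawerID = 1

SELECT s.SampleID, s.Position, s.Status
FROM dbo.Samples AS s WITH (NOLOCK)
JOIN dbo.Drawers AS d WITH (NOLOCK)
    ON d.DrawerID = s.DrawerID
WHERE d.DrawerID = @DrawerID
```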

The other procedures listed here showed no real performance gain (a 2-3% change is statistically negligible in this case).

Without the ability to trace all the database calls and collect hard data to analyze, it would be almost impossible to see the impact of such changes on the overall system performance. Of course, this is just a small sample of the performance data that can be collected this way. The average duration is a critical piece of information in measuring overall query performance, but it is not the only one.

In our case, we also need to look at the duration distribution because we are controlling a physical device and frequent long queries could potentially create hardware lockups.

Analyzing the Distribution of Code Duration

A query like the following can provide some distribution data from the same trace. It uses the classic cross-tab technique to count how many calls fell into each duration range:

SELECT
 ObjectName,
 COUNT(*)      AS Calls,
 AVG(Duration) AS AvgDuration,
 MAX(Duration) AS MaxDuration,
 MIN(Duration) AS MinDuration,
 -- Duration is in microseconds, so 10000 = 10 ms.
 -- The ranges are half-open so no call is counted twice.
 SUM(CASE WHEN Duration < 10000                          THEN 1 ELSE 0 END) AS [0-10 ms],
 SUM(CASE WHEN Duration >= 10000  AND Duration < 50000   THEN 1 ELSE 0 END) AS [10-50 ms],
 SUM(CASE WHEN Duration >= 50000  AND Duration < 100000  THEN 1 ELSE 0 END) AS [50-100 ms],
 SUM(CASE WHEN Duration >= 100000 AND Duration < 200000  THEN 1 ELSE 0 END) AS [100-200 ms],
 SUM(CASE WHEN Duration >= 200000 AND Duration < 300000  THEN 1 ELSE 0 END) AS [200-300 ms],
 SUM(CASE WHEN Duration >= 300000 AND Duration < 400000  THEN 1 ELSE 0 END) AS [300-400 ms],
 SUM(CASE WHEN Duration >= 400000 AND Duration < 500000  THEN 1 ELSE 0 END) AS [400-500 ms],
 SUM(CASE WHEN Duration >= 500000                        THEN 1 ELSE 0 END) AS [>500 ms]
 FROM Trace1
 GROUP BY ObjectName
(Avg/Max/Min durations are in microseconds; the distribution buckets are in milliseconds.)

Run 1 (original code):

Procedure                 Count   Avg    Max      Min   0-10   10-50  50-100 100-200 200-300 300-400 400-500 >500
DrawerSamples             37384   50755  1206951  4274  422    22454  13932  365     65      128     13      6
SequencerWorldView        9341    74328  731776   4558  19     1445   7001   837     34      3       1       1
LISOrder                  30355   1529   185064   927   30294  30     28     3       0       0       0       0
ResultAdd                 77856   1650   133453   657   74993  2834   28     1       0       0       0       0
MechanismPositionsStatus  117637  550    352133   367   117478 126    31     1       0       1       0       0

Run 2 (DrawerSamples modified to use WITH (NOLOCK)):

Procedure                 Count   Avg    Max      Min   0-10   10-50  50-100 100-200 200-300 300-400 400-500 >500
DrawerSamples             41848   35088  603626   4078  454    39781  1368   15      116     104     8       2
SequencerWorldView        10474   46930  440528   4467  57     6128   4049   223     16      0       1       0
LISOrder                  27188   1477   196738   916   27124  39     22     3       0       0       0       0
ResultAdd                 73034   1612   158325   661   71041  1885   23     85      0       0       0       0
MechanismPositionsStatus  131843  532    139235   356   131690 150    2      1       0       0       0       0

The above table shows the duration distribution for the two test runs. The first test run used the original code; the second used the modified DrawerSamples code (with the WITH (NOLOCK) hint). As you can see, before the modification there were 5 calls to the SequencerWorldView stored procedure that took longer than 300 ms, including one taking more than 0.5 seconds. After the modification, that number dropped to 1 call taking longer than 300 ms and no calls longer than 0.5 seconds. Our system guys were pleased with such results.

In the past (using a different software architecture on the same device), we noticed that if the decision-making process took too long, it had to be reset and started again to accommodate changes in the physical world. Occasional long queries could be absorbed by the system, but when the ratio of long-running queries reached a certain level, the system would simply freeze. The distribution of query durations provided the critical piece of information we needed to confirm that database performance would not introduce any system stability issues.
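Since it is that ratio of long-running calls that matters, the same Trace1 table can report it directly. A minimal sketch; the 300 ms (300,000 µs) threshold is illustrative, so substitute whatever response-time budget your own system imposes:

```sql
-- Percentage of calls per procedure exceeding a response-time budget.
-- 300000 µs = 300 ms; the threshold is illustrative.
SELECT
    ObjectName,
    COUNT(*) AS Calls,
    SUM(CASE WHEN Duration > 300000 THEN 1 ELSE 0 END) AS LongCalls,
    CAST(100.0 * SUM(CASE WHEN Duration > 300000 THEN 1 ELSE 0 END)
         / COUNT(*) AS DECIMAL(6, 3)) AS PctLong
FROM Trace1
GROUP BY ObjectName
ORDER BY PctLong DESC
```

Tracking this figure across test runs gives an early warning long before average durations show a problem.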

Conclusion

Although the analysis of the execution plan for a given query will help you tune that one query, there are times when you need to look at an entire system of queries and the impact they can have on each other. This article demonstrated how analyzing the data collected by a SQL Trace can help solve certain performance issues.

Depending on your needs, you can collect different pieces of information, which can provide additional valuable performance reports. Look up the sp_trace_setevent system stored procedure in Books Online to get an idea of what can be collected this way. If you experiment with traces, you can probably come up with other creative ways they can help you address technical and business needs.

Thank you for reading.
