The query execution plan is probably the first tool that DBAs and T-SQL programmers reach for when solving performance issues in their SQL Server applications. In my opinion, the biggest problem with this approach is the isolated context of the performance information it gathers: it looks at only one query at a time. A working SQL Server is a dynamic system bombarded by hundreds or thousands of requests at the same time, and the execution time of one query may impact another's, especially if they access the same tables. In some cases, the only way to further improve one query's performance is to modify a different query.
The technique described in this article may not be a viable solution for every SQL Server deployment but, because it was successful in solving several technical issues in our implementation, I would like to share it with you. Before I begin, I would like to describe the environment we are operating within.
We use SQL Server 2005 Express Edition on a medical device. This simplifies things on one end: there is a very limited and predictable number of clients calling the database, and the database is very small by any standard. On the other hand, because the database is part of a physical system, the response time of a query must in some cases stay within a specific number of milliseconds regardless of the database size.
All clients are C# applications, and all database access is through stored procedure calls.
In our architecture the SQL database is one of the focal points and most of the business logic is handled inside the stored procedures. Because of this approach, our stored procedures are quite complex and they often call other stored procedures and views internally.
Our Need to Analyze
In order to analyze the overall system performance and the impact of any changes we introduced during the development of the system, we needed a way to gather some physical evidence, and we needed some hard numbers to compare the "before" and "after" with. Server-side traces provided us with a way to collect data easily and in a consistent manner. By analyzing the collected data we were able to determine the dynamic relationship between different queries.
Because of the way we access the database, the trace was limited to capturing RPC calls only. Of course, you could capture different events according to your own needs. You just need to find the events that best match your environment.
Creating the Trace
The code below is what I used to create the trace. In this case, the trace is captured to a file with an automatically dated name on the local server drive. If the file reaches the maximum defined size, the captured data rolls over to a new file, and so on. Other details of the trace code are explained in the embedded comments.
```sql
-- Declare variables
DECLARE @rc INT
DECLARE @TraceID INT
DECLARE @maxFileSize BIGINT
DECLARE @fileName NVARCHAR(128)
DECLARE @on BIT
DECLARE @off BIT

-- Set values
DECLARE @CurrentDate NVARCHAR(20)
SET @CurrentDate = CONVERT(VARCHAR(20), GETDATE(), 112)
SET @maxFileSize = 10
SET @fileName = N'C:\Trace' + @CurrentDate
SET @on  = 1
SET @off = 0

PRINT @fileName

-- Create trace (option 2 = TRACE_FILE_ROLLOVER)
EXEC @rc = sp_trace_create @TraceID OUTPUT, 2, @fileName, @maxFileSize, NULL

-- If error, end process
IF (@rc != 0) GOTO error

-- Set the events and data to collect (event 10 = RPC:Completed)
EXEC sp_trace_setevent @TraceID, 10,  2, @off -- BinaryData
EXEC sp_trace_setevent @TraceID, 10, 12, @on  -- SPID
EXEC sp_trace_setevent @TraceID, 10, 13, @on  -- Duration
EXEC sp_trace_setevent @TraceID, 10, 14, @on  -- StartTime
EXEC sp_trace_setevent @TraceID, 10, 15, @on  -- EndTime
EXEC sp_trace_setevent @TraceID, 10, 16, @on  -- Reads
EXEC sp_trace_setevent @TraceID, 10, 17, @on  -- Writes
EXEC sp_trace_setevent @TraceID, 10, 18, @on  -- CPU
EXEC sp_trace_setevent @TraceID, 10, 24, @on  -- IndexID
EXEC sp_trace_setevent @TraceID, 10, 34, @on  -- ObjectName
EXEC sp_trace_setevent @TraceID, 10, 48, @on  -- RowCounts
EXEC sp_trace_setevent @TraceID, 10,  4, @on  -- TransactionID

-- Do not collect ADO.NET connection reset calls
-- (column 1 = TextData, comparison_operator 1 = "not equal")
EXEC sp_trace_setfilter
      @traceid             = @TraceID
    , @columnid            = 1
    , @logical_operator    = 0
    , @comparison_operator = 1
    , @value               = N'exec sp_reset_connection'

-- Start the trace
EXEC sp_trace_setstatus @TraceID, 1

-- Display trace id for future reference
SELECT TraceID = @TraceID
GOTO finish

-- Error trap
error:
SELECT ErrorCode = @rc

-- Exit
finish:
GO
```
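When a test run is finished, the trace should be stopped and its definition removed before you read the file. This housekeeping step is not part of the original script, but it uses the standard `sp_trace_setstatus` calls; substitute the TraceID that was returned when the trace was created:

```sql
-- Stop the trace (status 0), then close it and delete its definition (status 2).
-- Replace @TraceID's value with the TraceID returned by sp_trace_create above.
DECLARE @TraceID INT
SET @TraceID = 2

EXEC sp_trace_setstatus @TraceID, 0
EXEC sp_trace_setstatus @TraceID, 2

-- Optionally, list the traces currently defined on the server
-- to confirm yours is gone:
SELECT * FROM ::fn_trace_getinfo(DEFAULT)
```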
Importing the Trace Data
For performance and system-dynamics testing, I created a set of simulators (C# apps) that execute database calls in a fashion similar to the live system. Using simulators instead of the live system allows us to run database tests in the same environment regardless of the performance of the physical equipment. I ran the simulators for about 24 hours and collected the trace data. Once that completed, we needed to put the trace data somewhere we could analyze it. What better place for such an analysis than a SQL table?
Using the code below, the collected trace data was loaded into a table called "Trace1". Since this uses SELECT...INTO, it's quick and easy because you don't even need to create the trace table ahead of time. The reason I used the TOP clause is to capture the same number of calls after each test run, so the statistics could be compared across an equal number of calls.
```sql
SELECT TOP 500000
    ObjectName, TransactionID, SPID, Duration,
    StartTime, EndTime, Reads, Writes, CPU, RowCounts
INTO Trace1
FROM ::fn_trace_gettable('C:\Trace20090918.trc', DEFAULT)
```
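Since every analysis query that follows groups by ObjectName, it can be worth indexing the imported table before running them. This is an optional step I am sketching here, not part of the original workflow, and the index name is hypothetical:

```sql
-- Optional helper index on the imported trace table.
-- Clustering on ObjectName speeds up the GROUP BY ObjectName
-- aggregations used in the analysis queries below.
CREATE CLUSTERED INDEX IX_Trace1_ObjectName ON Trace1 (ObjectName)
```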
After that, a simple call like this…
```sql
SELECT  ObjectName,
        COUNT(*)      AS CallCount,
        AVG(Duration) AS AvgDuration,
        MAX(Duration) AS MaxDuration,
        MIN(Duration) AS MinDuration
FROM Trace1
GROUP BY ObjectName
```
…can provide us with some basic performance numbers.
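To produce a side-by-side comparison of two test runs like the table in the next section, the per-procedure summaries of both runs can be joined. This is a sketch under the assumption that the second run was imported into a table named Trace2 (a hypothetical name following the Trace1 convention):

```sql
-- Compare average durations between two test runs.
-- Trace2 is assumed to hold the second run's imported trace data.
SELECT  r1.ObjectName,
        r1.AvgDuration AS AvgDuration_Run1,
        r2.AvgDuration AS AvgDuration_Run2,
        100.0 * (r2.AvgDuration - r1.AvgDuration) / r1.AvgDuration AS PctChange
FROM (SELECT ObjectName, AVG(Duration) AS AvgDuration
      FROM Trace1 GROUP BY ObjectName) r1
JOIN (SELECT ObjectName, AVG(Duration) AS AvgDuration
      FROM Trace2 GROUP BY ObjectName) r2
    ON r1.ObjectName = r2.ObjectName
```

A negative PctChange means the second run was faster for that procedure.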
Analyzing the Trace
The table below illustrates some numbers collected from two independent test runs. It lists just 5 of the stored procedures we analyzed, the number of calls made, and the average duration (in microseconds) of those calls.
| Procedure | Count | Avg Duration | % Change |
In both test runs, all the stored procedures were identical and the tests started with a clean database. The only difference was in the way the DrawerSamples stored procedure accessed the data: I changed it to use the WITH (NOLOCK) hint on all tables accessed. As you can see from the table above, this provided a 31% performance gain for that stored procedure. Because of the way it is used in our application, it makes no difference from the business perspective whether the data is read with or without locks. The performance-critical stored procedure in our case is SequencerWorldView. That query had already been tuned as far as possible, but the required performance was just not there, and additional tuning attempts proved fruitless. Yet thanks to the change made to the DrawerSamples stored procedure, we got a 37% performance gain on the calls to the critical SequencerWorldView stored procedure.
Now I have to point out that both stored procedures are totally independent. Both are primarily SELECT statements accessing the same tables, but they retrieve different data. The performance gain of SequencerWorldView was achieved by providing more access time to the underlying tables, thanks to the absence of the locks that DrawerSamples had set during the first test run.
The other procedures listed here showed no performance gain (a 2-3% change is statistically negligible in this case).
Without the ability to trace all the database calls and collect hard data to analyze, it would be almost impossible to see the impact of such changes on the overall system performance. Of course, this is just a small sample of the performance data that can be collected this way. The average duration is a critical piece of information in measuring overall query performance, but it is not the only one.
In our case, we also need to look at the duration distribution, because we are controlling a physical device and frequent long queries could potentially cause hardware lockups.
Analyzing the Distribution of Code Duration
A query like the following can provide some distribution data from the same trace. It uses the classic cross-tab technique to count how many calls fall into each duration range:
```sql
-- Half-open ranges (>= lower, < upper) so a boundary duration
-- (e.g. exactly 50,000 microseconds) is counted in exactly one bucket.
SELECT  ObjectName,
        COUNT(*)      AS CallCount,
        AVG(Duration) AS AvgDuration,
        MAX(Duration) AS MaxDuration,
        MIN(Duration) AS MinDuration,
        SUM(CASE WHEN Duration < 10000                         THEN 1 ELSE 0 END) AS [0-10],
        SUM(CASE WHEN Duration >= 10000  AND Duration < 50000  THEN 1 ELSE 0 END) AS [10-50],
        SUM(CASE WHEN Duration >= 50000  AND Duration < 100000 THEN 1 ELSE 0 END) AS [50-100],
        SUM(CASE WHEN Duration >= 100000 AND Duration < 200000 THEN 1 ELSE 0 END) AS [100-200],
        SUM(CASE WHEN Duration >= 200000 AND Duration < 300000 THEN 1 ELSE 0 END) AS [200-300],
        SUM(CASE WHEN Duration >= 300000 AND Duration < 400000 THEN 1 ELSE 0 END) AS [300-400],
        SUM(CASE WHEN Duration >= 400000 AND Duration < 500000 THEN 1 ELSE 0 END) AS [400-500],
        SUM(CASE WHEN Duration >= 500000                       THEN 1 ELSE 0 END) AS [>500]
FROM Trace1
GROUP BY ObjectName
```
| Procedure | Count | Avg Duration | Max Duration | Min Duration | 0-10 | 10-50 | 50-100 | 100-200 | 200-300 | 300-400 | 400-500 | >500 |
The above table shows the duration distribution for the two test runs. The first test run used the original code; the second used the modified DrawerSamples code (with the WITH (NOLOCK) hint). As you can see, before the modification there were 5 calls to the SequencerWorldView stored procedure that took longer than 300 ms, including one taking more than 0.5 seconds. After the modification, that number dropped to 1 call taking longer than 300 ms, with no calls longer than 0.5 seconds. Our system guys were pleased with such results.
In the past (using a different software architecture on the same device), we noticed that if the decision-making process took too long, it needed to be reset and started again to accommodate the changes to the physical world. Occasional long queries could be absorbed by the system, but when the ratio of long-running queries reached a certain level, the system would just freeze. The distribution of query durations provided us with a critical piece of information we needed to confirm that database performance would not introduce any system stability issues.
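That long-running ratio can be computed directly from the imported trace table. The sketch below uses a 300 ms cutoff purely as an illustration (remember that Duration is reported in microseconds):

```sql
-- Per-procedure fraction of calls exceeding a 300 ms threshold
-- (300,000 microseconds). The threshold value is illustrative;
-- pick whatever limit your system can tolerate.
SELECT  ObjectName,
        COUNT(*) AS CallCount,
        SUM(CASE WHEN Duration > 300000 THEN 1 ELSE 0 END) AS LongCalls,
        100.0 * SUM(CASE WHEN Duration > 300000 THEN 1 ELSE 0 END)
              / COUNT(*) AS PctLong
FROM Trace1
GROUP BY ObjectName
ORDER BY PctLong DESC
```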
Although the analysis of the execution plan for a given query will help you tune that one query, there are times when you need to look at an entire system of queries and the impact they can have on each other. This article demonstrated how analyzing the data collected by SQL Trace can help solve certain performance issues.
Depending on your needs, you can collect different pieces of information, which can be used to produce additional valuable performance reports. Look up the sp_trace_setevent system stored procedure in Books Online to get an idea of what can be collected this way. If you experiment with traces, you can probably come up with other creative ways they can help you address technical and business needs.
Thank you for reading.