SQLServerCentral Article

Performance Tuning Using Extended Events: Part 3

,

Performance tuning is the process of analyzing query plans, execution plans, indexes, partitioning, etc. and finding ways to reduce the time a query takes or reduce resource usage by that query. There are four events that I found can help you to identify queries that can be performance tuned. They are deadlocks, application aborts, blocking, and long running queries. However, for these aggregation reports deadlocks will not be included.

In Part 1 of this series we looked at building these Extended Event sessions and in Part 2 we looked at shredding the EE sessions XML. In this, Part 3, article we’ll build aggregate reports using the stored procedure defined in Part 2 and analyze the results.

Running the Reports

Import questions to answer before running these reports are: when should I run them and what time range should I use? I have chosen to run these monthly in between each code release. My range of time is from the first day after the release to the current day I run the report on. This timing and range ensures that each report I run does not overlap with a previous run and that I am evaluating the most recent set of code available. By running the report in the middle of each release window I ensure that I have sufficient data to do analysis with and also that I have enough time to analyze and generate recommendations that can be acted on before the next release window. Of course most issues that are identified during this iteration end up being resolved one release after the current one.

Application Aborts (A.K.A Time Outs)

For time outs we will look at 2 aggregate reports: one that looks for patterns in calls and one that looks for patterns in time.

Populating the #TO_RPT Temporary Table

Because we are getting our data from the event_GetEventData stored procedure defined in Part 2 of this series we will need to populate a temporary table to hold the results. You’ll notice that I am filtering out SqlQueryNotificationService. This is because this application will time out occasionally and may also cause blocking (to itself) and long running queries. However, this has not been something I’m concerned with as it recovers on its own with no adverse effect.

DECLARE @Start datetime2,@Stop datetime2
SELECT @Start = DATEADD(hh,-240,GETDATE()),@Stop = GETDATE()
-- TIME OUTS ---------------------------------------
IF OBJECT_ID('tempdb..#TO_RPT') IS NOT NULL
       DROP TABLE #TO_RPT;
CREATE TABLE #TO_RPT([timestamp] datetime2,client_app_name varchar(255),database_name varchar(255),[statement] varchar(MAX),[object_name] varchar(255),
                                         output_parameters varchar(MAX),duration decimal(10,2),cpu_time decimal(10,2),physical_reads bigint,logical_reads bigint,writes bigint,
                                         row_count bigint,client_hostname varchar(255),nt_username varchar(255),session_id int,plan_handle varchar(255),connection_reset_option bigint,
                                         connection_reset_option_text varchar(255),data_stream varchar(MAX),is_system varchar(20),result bigint,event_name varchar(50))
INSERT INTO #TO_RPT([timestamp],client_app_name,database_name,[statement],[object_name],output_parameters,duration,cpu_time,physical_reads,logical_reads,writes,
                                  row_count,client_hostname,nt_username,session_id,plan_handle,connection_reset_option,connection_reset_option_text,data_stream,
                                  is_system,result,event_name)
EXEC dbo.event_GetEventData @eventName='TimeOuts',@beginDateTime=@Start,@endDateTime=@Stop,@client_app_NOT_LIKE='SqlQueryNotificationService'
----------------------------------------------------

Time Outs by Call

The query below returns the count of the number of time a query or stored procedure has been called along with the database name and sums of the CPU time, physical and logical reads and writes. The importance of performance tuning these calls can be measured by the number of times a call has timed out. Calls that have a high count are most likely more important to tune than ones with small or one time out. However, knowledge of the database and supporting applications may lead you to different conclusions.

SELECT COUNT(*) AS [Call Count],[object],MAX(database_name) AS [database_name],
              SUM(cpu_time) AS cpu_time,SUM(physical_reads) AS physical_reads,SUM(logical_reads) AS logical_reads,SUM(writes) AS writes
FROM (SELECT [timestamp],client_app_name,database_name,
                     CASE WHEN [object_name] = 'sp_executesql' THEN LEFT(REPLACE(REPLACE(REPLACE([statement],CHAR(10),' '),CHAR(9),' '),CHAR(13),' '),8000) ELSE [object_name] END AS [object],
                     LEN([statement]) AS [statement length],
                     [object_name],output_parameters,duration,cpu_time,physical_reads,logical_reads,writes,
                     row_count,client_hostname,nt_username,session_id,plan_handle,connection_reset_option,connection_reset_option_text,data_stream,is_system,result,event_name
          FROM #TO_RPT) x
GROUP BY [object]

Time Outs by Time

The query below returns the count of calls broken out by the hour of the day in which they occurred and includes CPU time, physical and logical reads and writes. The value of this report is that you may find a pattern where a large number of your calls are timing out. For example, I have found on many servers on 24/7 applications that a large number of time outs will occur during nightly maintenance periods.

SELECT COUNT(*) AS [Call Count],[hour],SUM(cpu_time) AS cpu_time,SUM(physical_reads) AS physical_reads,SUM(logical_reads) AS logical_reads,SUM(writes) AS writes
FROM (SELECT DATEPART(hour,[timestamp]) AS [hour],client_app_name,database_name,
                     CASE WHEN [object_name] = 'sp_executesql' THEN LEFT(REPLACE(REPLACE(REPLACE([statement],CHAR(10),' '),CHAR(9),' '),CHAR(13),' '),8000) ELSE [object_name] END AS [object],
                     LEN([statement]) AS [statement length],
                     [object_name],output_parameters,duration,cpu_time,physical_reads,logical_reads,writes,
                     row_count,client_hostname,nt_username,session_id,plan_handle,connection_reset_option,connection_reset_option_text,data_stream,is_system,result,event_name
          FROM #TO_RPT) x
GROUP BY [hour]
ORDER BY [hour]

Blocking

For blocking we’ll look at 3 reports. These reports look to identify the most common causes of blocking, the most commonly blocked calls, and the most commonly blocked objects.

Populating the #BLK_RPT Temporary Table

Because we are getting our data from the event_GetEventData stored procedure defined in Part 2 of this series we will need to populate a temporary table to hold the results. You’ll notice that I am filtering out SqlQueryNotificationService. This is because this application will time out occasionally and may also cause blocking (to itself) and long running queries. However, this has not been something I’m concerned with as it recovers on its own with no adverse effect.

DECLARE @Start datetime2,@Stop datetime2
SELECT @Start = DATEADD(hh,-240,GETDATE()),@Stop = GETDATE()
-- BLOCKING ----------------------------------------
IF OBJECT_ID('tempdb..#BLK_RPT') IS NOT NULL
       DROP TABLE #BLK_RPT;
CREATE TABLE #BLK_RPT([BlockGrouping] int,[TimeStamp] datetime2,[BlockDuration] decimal(10,2),[IsBlocked] bit,SPID int,[Blocking_SPID] int,[Status] varchar(255),
                                         [ProcessId] varchar(255),WaitResource varchar(255),WaitTime bigint,ClientApp varchar(255),HostName varchar(255),LoginName varchar(255),
                                         IsolationLevel varchar(255),[dbName] varchar(255),[Buffered_Input] varchar(MAX),TransactionName varchar(255),LastTranStarted varchar(255),
                                         LockMode varchar(255),LastBatchStarted datetime2,LastBatchCompleted datetime2)
INSERT INTO #BLK_RPT([BlockGrouping],[TimeStamp],[BlockDuration],[IsBlocked],SPID,[Blocking_SPID],[Status],[ProcessId],WaitResource,WaitTime,ClientApp,HostName,
                                         LoginName,IsolationLevel,[dbName],[Buffered_Input],TransactionName,LastTranStarted,LockMode,LastBatchStarted,LastBatchCompleted)
EXEC dbo.event_GetEventData @eventName='BlockingReport',@beginDateTime=@Start,@endDateTime=@Stop,@client_app_NOT_LIKE='SqlQueryNotificationService'
----------------------------------------------------

Blockers Summary

The query below returns the count of calls that caused blocking. Note that this list includes lead blockers as well as the chain of blocking calls. This call also returns the client application name, a portion of the SQL text, the wait resource, sum of wait time and isolation level of the call.

SELECT COUNT(*) AS [Call Count],ClientApp,Buffered_Input,WaitResource,SUM(WaitTime) AS WaitTime,MIN(IsolationLevel) AS IsolationLevel
FROM ( SELECT RANK() OVER(ORDER BY [TimeStamp]) AS BlockGrouping,[TimeStamp],[BlockDuration],[IsBlocked],SPID,[Blocking_SPID],[Status],[ProcessId],WaitResource,WaitTime,
                     ClientApp,HostName,LoginName,IsolationLevel,[dbName],
                     LEFT(REPLACE(REPLACE(REPLACE([Buffered_Input],CHAR(10),' '),CHAR(9),' '),CHAR(13),' '),8000) AS [Buffered_Input],
                     TransactionName,LastTranStarted,LockMode,LastBatchStarted,LastBatchCompleted
              FROM #BLK_RPT) x
WHERE IsBlocked = 0
GROUP BY ClientApp,Buffered_Input,WaitResource
ORDER BY ClientApp,Buffered_Input,WaitResource

In the subquery for the Buffered_Input column I am replacing carriage returns, tabs and line feeds with spaces. This is to make it easier to copy and paste the report into Excel. I am also only taking the left 8000 characters for the same reason.

Blocked Summary

The query below returns the count of calls that were blocked. There will be an overlap of calls between this query and the Blockers Summary calls above. This is because some of these calls will be both a blocker and a blocked call. This query is identical to the Blockers Summary call except the WHERE clause is changed to “IsBlocked = 1”.

SELECT COUNT(*) AS [Call Count],ClientApp,Buffered_Input,WaitResource,SUM(WaitTime) AS WaitTime,MIN(IsolationLevel) AS IsolationLevel
FROM ( SELECT RANK() OVER(ORDER BY [TimeStamp]) AS BlockGrouping,[TimeStamp],[BlockDuration],[IsBlocked],SPID,[Blocking_SPID],[Status],[ProcessId],WaitResource,WaitTime,
                     ClientApp,HostName,LoginName,IsolationLevel,[dbName],
                     LEFT(REPLACE(REPLACE(REPLACE([Buffered_Input],CHAR(10),' '),CHAR(9),' '),CHAR(13),' '),8000) AS [Buffered_Input],
                     TransactionName,LastTranStarted,LockMode,LastBatchStarted,LastBatchCompleted
              FROM #BLK_RPT) x
WHERE IsBlocked = 1
GROUP BY ClientApp,Buffered_Input,WaitResource
ORDER BY ClientApp,Buffered_Input,WaitResource

In the subquery for the Buffered_Input column I am replacing carriage returns, tabs and line feeds with spaces. This is to make it easier to copy and paste the report into Excel. I am also only taking the left 8000 characters for the same reason.

Blocked Objects Summary

The query below returns the count of objects that were blocked. This is useful information in determining a potential workload issue or poor performance on a table or index.

SELECT COUNT(*) AS [Call Count],WaitResource
FROM #BLK_RPT
GROUP BY WaitResource
ORDER BY COUNT(*) DESC

Long Running Queries

For long running queries there is only one report we will run. This report looks to find the calls that exceed 25 seconds in duration the most number of times.

Populating the #GT25_RPT Temporary Table

Because we are getting our data from the event_GetEventData stored procedure defined in Part 2 of this series we will need to populate a temporary table to hold the results. You’ll notice that I am filtering out SqlQueryNotificationService. This is because this application will time out occasionally and may also cause blocking (to itself) and long running queries. However, this has not been something I’m concerned with as it recovers on its own with no adverse effect.

DECLARE @Start datetime2,@Stop datetime2
SELECT @Start = DATEADD(hh,-240,GETDATE()),@Stop = GETDATE()
-- DurationGT25Seconds -----------------------------
IF OBJECT_ID('tempdb..#GT25_RPT') IS NOT NULL
       DROP TABLE #GT25_RPT;
CREATE TABLE #GT25_RPT([timestamp] datetime2,duration decimal(10,2),client_app_name varchar(255),[statement] varchar(MAX),cpu_time decimal(10,2),physical_reads bigint,
                                         logical_reads bigint,writes bigint,row_count bigint,database_name varchar(255),[object_name] varchar(255),output_parameters varchar(MAX),
                                         client_hostname varchar(255),nt_username varchar(255),connection_reset_option bigint,connection_reset_option_text varchar(255),data_stream varchar(MAX),
                                         is_system varchar(20),plan_handle varchar(255),session_id int,event_name varchar(50),result bigint)
INSERT INTO #GT25_RPT([timestamp],duration,client_app_name,[statement],cpu_time,physical_reads,logical_reads,writes,row_count,database_name,[object_name],
                                  output_parameters,client_hostname,nt_username,connection_reset_option,connection_reset_option_text,data_stream,is_system,plan_handle,
                                  session_id,event_name,result)
EXEC dbo.event_GetEventData @eventName='DurationGT25Seconds',@beginDateTime=@Start,@endDateTime=@Stop,@client_app_NOT_LIKE='SqlQueryNotificationService'
---------------------------------------------------

Long Running Queries Summary

This query returns a list of calls exceeding 25 seconds sorted descending by the number of time a call is made and the maximum duration of those calls. The query includes the maximum values for duration, CPU time, physical and logical read, writes and row count. It also shows the stored procedure executed or the first 70 characters of the query and an indicator stating whether the call timed out.

-- DurationGT25Seconds -----------------------------
SELECT COUNT(*) AS [Call Count],[Result],[statement],
              MAX(duration) AS [Max_duration],MAX(cpu_time) AS [Max_cpu_time],MAX(physical_reads) AS [Max_physical_reads],MAX(logical_reads) AS [Max_logical_reads],
              MAX(writes) AS [Max_writes],MAX(row_count) AS [Max_row_count],
              MAX(client_app_name) AS client_app_name,MAX(database_name) AS database_name
FROM ( SELECT CASE WHEN result=0 THEN 'Success' WHEN result=2 THEN 'Time Out' ELSE CONVERT(varchar(20),result) END AS [Result],
                     duration,
                     CASE WHEN [object_name]='sp_executesql' THEN LEFT(REPLACE(REPLACE(REPLACE([statement],CHAR(10),' '),CHAR(9),' '),CHAR(13),' '),70) ELSE [object_name] END AS [statement],
                     LEN([statement]) AS [statement length],
                     cpu_time,physical_reads,logical_reads,writes,row_count,
                     client_app_name,database_name,[timestamp]
              FROM #GT25_RPT
         ) x
GROUP BY [Result],[statement]
ORDER BY COUNT(*) DESC,MAX(duration) DESC
----------------------------------------------------

Using the reports above you should be able to identify your most problematic query calls and if used in an iterative process you should be able to get your SQL server queries and stored procedures in tip top shape before too long. However, careful root-cause analysis should be used to determine which calls need to be modified and thorough testing should be performed before implementing any changes. One nice feature of this iterative approach is that you can verify the changes that have been made are working because they should drop off of these reports.

Resources

Rate

4.46 (13)

You rated this post out of 5. Change rating

Share

Share

Rate

4.46 (13)

You rated this post out of 5. Change rating