Recently I was asked to investigate a timeout issue that was causing SSIS package executions to fail.
The timeouts seemed to have no pattern of any kind, there were no long blocking queries, the packages’ sessions were SUSPENDED until they timed out, and the sessions were not blocked at any point. At first glance, there was no reason to suspect a memory issue, since all user queries executed successfully and so did most of the SSIS packages. Furthermore, packages that did time out occasionally executed successfully from time to time and the number of executions remained the same.
Executing sp_who2 returned the sessions that were suspended but the blocked by column (Blk By) was empty. Tracking the server’s activity using the sys.dm_os_waiting_tasks dynamic view, I discovered that all the suspended sessions were pending with the RESOURCE_SEMAPHORE wait type, indicating that this is indeed was a case of memory pressure.
In the following example, we’re executing sp_who2, which returns a SUSPENDED status for session 71 with no blocking session. However the sys.dm_os_waiting_tasks dynamic view returns the blocking_session_id, the wait_duration_ms, and, most important for us in this case, the wait_type, which was RESOURCE_SEMAPHORE.
EXEC sp_who2 71
SELECT [session_id], [wait_duration_ms], [wait_type], [blocking_session_id] FROM sys.dm_os_waiting_tasks WHERE [session_id] = 71
The RESOURCE_SEMAPHORE wait type indicates a worker thread is waiting for operations such as hashing or sorting to be granted. A short description of the memory granting flow will shed some light on the way this case was resolved.
Once a query is executed, SQL Server uses the compiled plan to determine how much memory is required. There are several factors in this process, such as degree of parallelism, a need for memory grant, query weight (there are two memory buffers used by memory query, small and regular), the query queue, and the first-in-first-out rule. A query will be placed in a queue in the case of insufficient memory. Each time memory is freed, the next query in queue will be executed.
There are two dynamic views which are most useful in troubleshooting cases such as this:
- sys.dm_exec_query_resource_semaphoresThis dynamic view returns information about the status of the current query-resource semaphore. It is most useful for finding the query-memory utilization and for detecting memory pressure, and its result set can be divided into two categories:Memory Utilization – the columns ending with “kb”. The most informative column is total_memory_kb (the sum of the available and granted memory), which in times of memory pressure may exceed the values of the target memory or maximum memory.Memory Pressure – the columns ending with “count”. The column waiter_count provides the number of queries waiting for grants. These queries will have the RESOURCE_SEMAPHORE wait and eventually might be timed out if they are not satisfied.
- sys.dm_exec_query_memory_grantsThis dynamic view returns information only about those queries waiting for memory grants or the ones that have already been acquired with a memory grant. This dynamic view can be divided into three parts: memory utilization, execution status, and additional information:?Memory Utilization – the columns ending with “kb”. These columns can be helpful in understanding the memory characteristics of a query, such as requested memory compared to used memory. The column requested_memory_kb can assist in detecting large memory consumers.Execution Status – columns that provide information on the execution of the queries. The most meaningful columns are queue_id (the id of the query’s queue), wait_order (the position of the query in its queue), is_next_candidate (to be granted with memory) and wait_time_ms.?Additional Information – the columns that describe the query in its context, such as dop (the query’s degree of parallelism).
In the following examples, according to the first row in the upper result set, there are two queries granted with memory (grantee_count = 2) and six queries waiting for memory grants (waiter_count = 6). Session level information is returned in the second result set, where the queue is perfectly visible. Even though the granted memory is less than the total memory, the session is still waiting for memory grant due to the fact there is not enough memory to satisfy its requested memory in the pool.
SELECT [resource_semaphore_id], [total_memory_kb], [granted_memory_kb], [grantee_count], [waiter_count], [pool_id] FROM sys.dm_exec_query_resource_semaphores ORDER BY [waiter_count] DESC
SELECT [session_id], [requested_memory_kb], [queue_id], [wait_order], [is_next_candidate], [wait_time_ms] FROM sys.dm_exec_query_memory_grants WHERE [grant_time] IS NULL ORDER BY [wait_order] DESC
Both dynamic views came with high values in the columns waiter_count, requested_memory_kb, wait_time_ms and wait_order. This information led me to the conclusion there were many queries waiting for memory to be granted and each pending query requested large portions of the memory.
Now it was clear why some packages failed while other light-weight packages and user queries succeeded. From time to time, several SSIS packages executed in parallel, each of which is a large memory consumer. The first few packages were granted with the requested memory, while the later packages didn’t have enough memory in the pool to be granted, so the packages were queued. This was causing a temporary memory pressure and time-outs of packages waiting too long.
This case was resolved using only the dynamic views provided with SQL Server, which emphasizes the power of the DMVs and the need to know how to use them. The DMVs used to resolve this case can be used to create metrics for monitoring and alerting purposes. The query below returns the number of sessions waiting for memory to be granted and calculates memory metrics for the possibility of memory pressure:
SELECT [waiter_count] -- # of queries waiting for grants to be satisfied. , CAST( CASE WHEN ( ([total_memory_kb] - [target_memory_kb] > 0) OR ([total_memory_kb] - [max_target_memory_kb] > 0) ) THEN 1 ELSE 0 END AS bit) AS [potential_pressure] FROM sys.dm_exec_query_resource_semaphores
For further reading, I recommend the excellent article Understanding SQL Server Memory Grant for more information on the mechanism of memory grating by SQL Server.