In the world of SQL Server, a process is either running, runnable, or waiting. A running process is, obviously, running; it’s the one getting CPU time. A runnable process has everything it needs to run, it’s just waiting for its turn on the CPU. A process that’s waiting, however, is one that needs some resource before it can continue. That resource might be I/O, memory, network, or it might be waiting for another thread to finish its own work. By knowing what waits are prevalent in your system, you can pinpoint where bottlenecks exist and fix them.
What are you waiting for?
So how do you see the predominant wait types in your instance? Glenn Berry (blog | @GlennAlanBerry) has written a great query to list your top waits since the last instance restart or statistics clear. Not only does it rank waits in descending order of occurrance, it also excludes those insignificant waits that can clutter up your output.
-- Isolate top waits for server instance since last restart or statistics clear WITH Waits AS (SELECT wait_type, wait_time_ms / 1000. AS wait_time_s, 100. * wait_time_ms / SUM(wait_time_ms) OVER() AS pct, ROW_NUMBER() OVER(ORDER BY wait_time_ms DESC) AS rn FROM sys.dm_os_wait_stats WHERE wait_type NOT IN ('CLR_SEMAPHORE','LAZYWRITER_SLEEP','RESOURCE_QUEUE','SLEEP_TASK' ,'SLEEP_SYSTEMTASK','SQLTRACE_BUFFER_FLUSH','WAITFOR', 'LOGMGR_QUEUE','CHECKPOINT_QUEUE' ,'REQUEST_FOR_DEADLOCK_SEARCH','XE_TIMER_EVENT','BROKER_TO_FLUSH','BROKER_TASK_STOP','CLR_MANUAL_EVENT' ,'CLR_AUTO_EVENT','DISPATCHER_QUEUE_SEMAPHORE', 'FT_IFTS_SCHEDULER_IDLE_WAIT' ,'XE_DISPATCHER_WAIT', 'XE_DISPATCHER_JOIN', 'SQLTRACE_INCREMENTAL_FLUSH_SLEEP')) SELECT W1.wait_type, CAST(W1.wait_time_s AS DECIMAL(12, 2)) AS wait_time_s, CAST(W1.pct AS DECIMAL(12, 2)) AS pct, CAST(SUM(W2.pct) AS DECIMAL(12, 2)) AS running_pct FROM Waits AS W1 INNER JOIN Waits AS W2 ON W2.rn <= W1.rn GROUP BY W1.rn, W1.wait_type, W1.wait_time_s, W1.pct HAVING SUM(W2.pct) - W1.pct < 99 OPTION (RECOMPILE); -- percentage threshold
The thing to keep in mind is that these stats get reset every time you restart your instance. So if you want to track them over time, you’ll need to store the output in a table somewhere. You can also reset the statistics using the DBCC SQLPERF command:
DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR);
This is handy when you need to measure the impact of changes to your environment. For example, if you want to see how wait stats change when parallelism is turned on or off.
Common wait types
So now that you know what your top waits are, what do they actually mean? Let’s go over some of the most common waits in your average SQL Server environment.
CXPACKET waits are typical in environments using parallelism, like data warehouses. A session is split into multiple threads, each assigned a subset of the work. Inevitably, some threads will finish faster than others, and those threads are left to wait for the others to finish before the session can move on. Those waits are CXPACKET waits. There’s some debate over how much of a problem CXPACKET waits really are. Some folks think there’s nothing inherently wrong with them. But there’s also the argument that having too many threads waiting on CXPACKETs are a waste of CPU cycles. They can also mask an underlying problem, what’s holding up the other threads? If you think CXPACKET waits might be a problem in your system, consider lowering the max degree of parallelism in you instance or session, and/or raising the parallelism threshold.
PAGEIOLATCH_XX waits are used to synchronize disk to memory page transfers. When a thread needs a data page that’s not currently in memory, it obtains a page in the buffer cache and places a latch on that buffer page while it waits for the page to be read from disk. Once the page is written to the cache, the latch is released. The XX part of the wait type indicates the mode of the latch. SH is shared mode, EX is exclusive, UP is update mode, etc. Long PAGEIOLATCH_XX waits are usually indicative of IO subsystem pressure.
ASYNC_NETWORK_IO waits happen when a session is blocked by a process behind the network. This wait type can indicate network problems, but more often it simply means that the client process or application can’t keep up with the results that SQL Server is sending it. So SQL Server has to wait until the client is ready to handle more data.
A resource semaphore is a mechanism responsible for handling memory requests in SQL Server. If there isn’t enough memory available to satisfy a request, a RESOURCE_SEMAPHORE wait occurs. This could be caused by insufficient memory on the system overall, or it could be an issue with a really poorly written query and (ahem!) bad settings in resource governor. Keep an eye out for memory-intensive queries that are doing large sorts or hashes, or a high number of concurrent queries.
SQL Server threads are quite well-mannered. Sometimes a thread has all the resources it needs and is ready for its time on the processor. In other words, the thread is runnable. While the thread may be ready to go, SQL Server might not be. The SQLOS runs threads on schedulers, and there is a finite number of them. If all of the schedulers are busy and a new thread becomes runnable, one of the running threads will voluntarily yield its scheduler time to that new process. When this happens a SOS_SCHEDULER_YIELD occurs. A high number of these waits are a good indication of CPU pressure on your system. An obvious solution to the situation would be to increase the number of processors on your server, but we know this isn’t always possible. Another possible solution is to find your most cpu-intensive queries and tune them.
The SQL Server Wait Type Repository
Pinal Dave’s Waits and Queues series
Everything I Needed to Know About Waits and Queues I Learned From the TSA