Database servers have to wait on different resources, and these waits are huge to the performance of SQL Server. Sometimes something changes without our knowledge and is running differently than normal, this is the first place you look to see what’s going on. On the other side, you may have made a change that could have been anything from increasing memory, tuning a query, moving to different hardware, changing indexes, etc.. All of these things should reduce waits, but how effective were they?
The trick is to know what things looked like before so you have something to compare it to. If you know what it looked like before then you’ll know how much of a difference you made.
My Default Monitoring Message
I’ll copy/paste this section on multiple posts, so feel free to skip it if you’ve seen it before.
The biggest issue is the word “before”. If you weren’t watching your servers before then you have nothing to compare the current state of your server to. Also, if you’re watching it consistently then you’ll be able to make an apples to apples comparison.
Your production servers should have some kind of monitoring software in place, and the monitoring software is already capturing this type of information to help make all the pretty charts and graphs you see through the GUI. You need this software to run so you can get alerted when things go wrong, and pulling this data will just make it more valuable.
Monitoring software doesn’t perform magic, so you can pull this data on your own if you don’t have any. However, that’s taking on a lot of responsibility, time, and effort. If something is collecting the data already, take advantage of that.
For me, I use Idera Diagnostic Manager, so my queries are going to look at the data collected using this tool. Other monitoring tools are going to collect the same basic data in a database you can read.
If anyone from one of these companies wants to provide me with scripts that fill the same need, I’ll make a write up on them and give you credit for being awesome supporting your product like that. I can only write against what I can test against, it’s nothing personal.
The Script
There are 3 parameters at the start, and that’s typically all I ever change. A couple things stick out with these parameters. The times are in UTC, so I default to GetUTCDate(). I want the time range to be at least an hour so I’m not letting a single query that ran skew the results too much. Then I’m also converting my instance name to upper case because Idera DM may very well be your only database that’s case sensitive, so I cheat by putting everything in upper case.
My upper case trick isn’t a recommended trick. It causes scans and possible bad joins. In this case I know it’s against small tables and that I don’t have two servers with the same name using different cases. It’s safe here, but not everywhere.
DECLARE @StartTime DateTime , @EndTime DateTime , @InstanceName sysname SET @EndTime = GetUTCDate() SET @StartTime = DateAdd(Hour, -24, @EndTime) SET @InstanceName = UPPER('Server\Instance') IF OBJECT_ID('tempdb..#IgnoredWaits') IS NOT NULL BEGIN DROP TABLE #IgnoredWaits END IF OBJECT_ID('tempdb..#Now') IS NOT NULL BEGIN DROP TABLE #Now END IF OBJECT_ID('tempdb..#Yesterday') IS NOT NULL BEGIN DROP TABLE #Yesterday END IF OBJECT_ID('tempdb..#LastWeek') IS NOT NULL BEGIN DROP TABLE #LastWeek END IF OBJECT_ID('tempdb..#TwoWeeks') IS NOT NULL BEGIN DROP TABLE #TwoWeeks END CREATE TABLE #IgnoredWaits (Wait_Type NVarChar(60) PRIMARY KEY) --Values taken from Paul Randal of SQLskills --http://www.sqlskills.com/blogs/paul/wait-statistics-or-please-tell-me-where-it-hurts/ INSERT INTO #IgnoredWaits SELECT N'BROKER_EVENTHANDLER' UNION SELECT N'BROKER_RECEIVE_WAITFOR' UNION SELECT N'BROKER_TASK_STOP' UNION SELECT N'BROKER_TO_FLUSH' UNION SELECT N'BROKER_TRANSMITTER' UNION SELECT N'CHECKPOINT_QUEUE' UNION SELECT N'CHKPT' UNION SELECT N'CLR_AUTO_EVENT' UNION SELECT N'CLR_MANUAL_EVENT' UNION SELECT N'CLR_SEMAPHORE' UNION SELECT N'DBMIRROR_DBM_EVENT' UNION SELECT N'DBMIRROR_EVENTS_QUEUE' UNION SELECT N'DBMIRROR_WORKER_QUEUE' UNION SELECT N'DBMIRRORING_CMD' UNION SELECT N'DIRTY_PAGE_POLL' UNION SELECT N'DISPATCHER_QUEUE_SEMAPHORE' UNION SELECT N'EXECSYNC' UNION SELECT N'FSAGENT' UNION SELECT N'FT_IFTS_SCHEDULER_IDLE_WAIT' UNION SELECT N'FT_IFTSHC_MUTEX' UNION SELECT N'HADR_CLUSAPI_CALL' UNION SELECT N'HADR_FILESTREAM_IOMGR_IOCOMPLETION' UNION SELECT N'HADR_LOGCAPTURE_WAIT' UNION SELECT N'HADR_NOTIFICATION_DEQUEUE' UNION SELECT N'HADR_TIMER_TASK' UNION SELECT N'HADR_WORK_QUEUE' UNION SELECT N'KSOURCE_WAKEUP' UNION SELECT N'LAZYWRITER_SLEEP' UNION SELECT N'LOGMGR_QUEUE' UNION SELECT N'ONDEMAND_TASK_QUEUE' UNION SELECT N'PWAIT_ALL_COMPONENTS_INITIALIZED' UNION SELECT N'QDS_PERSIST_TASK_MAIN_LOOP_SLEEP' UNION SELECT N'QDS_SHUTDOWN_QUEUE' UNION SELECT N'QDS_CLEANUP_STALE_QUERIES_TASK_MAIN_LOOP_SLEEP' UNION SELECT N'REQUEST_FOR_DEADLOCK_SEARCH' UNION SELECT N'RESOURCE_QUEUE' UNION SELECT N'SERVER_IDLE_CHECK' UNION SELECT N'SLEEP_BPOOL_FLUSH' UNION SELECT N'SLEEP_DBSTARTUP' UNION SELECT N'SLEEP_DCOMSTARTUP' UNION SELECT N'SLEEP_MASTERDBREADY' UNION SELECT N'SLEEP_MASTERMDREADY' UNION SELECT N'SLEEP_MASTERUPGRADED' UNION SELECT N'SLEEP_MSDBSTARTUP' UNION SELECT N'SLEEP_SYSTEMTASK' UNION SELECT N'SLEEP_TASK' UNION SELECT N'SLEEP_TEMPDBSTARTUP' UNION SELECT N'SNI_HTTP_ACCEPT' UNION SELECT N'SP_SERVER_DIAGNOSTICS_SLEEP' UNION SELECT N'SQLTRACE_BUFFER_FLUSH' UNION SELECT N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP' UNION SELECT N'SQLTRACE_WAIT_ENTRIES' UNION SELECT N'WAIT_FOR_RESULTS' UNION SELECT N'WAITFOR' UNION SELECT N'WAITFOR_TASKSHUTDOWN' UNION SELECT N'WAIT_XTP_HOST_WAIT' UNION SELECT N'WAIT_XTP_OFFLINE_CKPT_NEW_LOG' UNION SELECT N'WAIT_XTP_CKPT_CLOSE' UNION SELECT N'XE_DISPATCHER_JOIN' UNION SELECT N'XE_DISPATCHER_WAIT' UNION SELECT N'XE_TIMER_EVENT' SELECT s.InstanceName , t.WaitType , WaitTime_Minutes = SUM(WaitTimeInMilliseconds)/1000/60 , dStart = Min(UTCCollectionDateTime) , dEnd = Max(UTCCollectionDateTime) INTO #Now FROM SQLdmrepository..WaitStatisticsDetails wsd INNER JOIN SQLdmrepository..WaitTypes t ON wsd.WaitTypeID = t.WaitTypeID INNER JOIN SQLdmrepository..WaitStatistics ws ON ws.WaitStatisticsID = wsd.WaitStatisticsID INNER JOIN SQLdmrepository..MonitoredSQLServers s ON ws.SQLServerID = s.SQLServerID WHERE ws.UTCCollectionDateTime BETWEEN @StartTime and @EndTime ANDUPPER(s.InstanceName) = @InstanceName AND t.WaitType NOT IN (SELECT Wait_Type COLLATE SQL_Latin1_General_CP1_CI_AS FROM #IgnoredWaits) --AND t.WaitType LIKE 'PAGEIO%' GROUP BY s.InstanceName, t.WaitType SELECT @StartTime = @StartTime - 1 , @EndTime = @EndTime - 1 SELECT s.InstanceName , t.WaitType , WaitTime_Minutes = SUM(WaitTimeInMilliseconds)/1000/60 , dStart = Min(UTCCollectionDateTime) , dEnd = Max(UTCCollectionDateTime) INTO #Yesterday FROM SQLdmrepository..WaitStatisticsDetails wsd INNER JOIN SQLdmrepository..WaitTypes t ON wsd.WaitTypeID = t.WaitTypeID INNER JOIN SQLdmrepository..WaitStatistics ws ON ws.WaitStatisticsID = wsd.WaitStatisticsID INNER JOIN SQLdmrepository..MonitoredSQLServers s ON ws.SQLServerID = s.SQLServerID WHERE ws.UTCCollectionDateTime BETWEEN @StartTime and @EndTime ANDUPPER(s.InstanceName) = @InstanceName AND t.WaitType NOT IN (SELECT Wait_Type COLLATE SQL_Latin1_General_CP1_CI_AS FROM #IgnoredWaits) --AND t.WaitType LIKE 'PAGEIO%' GROUP BY s.InstanceName, t.WaitType SELECT @StartTime = @StartTime - 6 , @EndTime = @EndTime - 6 SELECT s.InstanceName , t.WaitType , WaitTime_Minutes = SUM(WaitTimeInMilliseconds)/1000/60 , dStart = Min(UTCCollectionDateTime) , dEnd = Max(UTCCollectionDateTime) INTO #LastWeek FROM SQLdmrepository..WaitStatisticsDetails wsd INNER JOIN SQLdmrepository..WaitTypes t ON wsd.WaitTypeID = t.WaitTypeID INNER JOIN SQLdmrepository..WaitStatistics ws ON ws.WaitStatisticsID = wsd.WaitStatisticsID INNER JOIN SQLdmrepository..MonitoredSQLServers s ON ws.SQLServerID = s.SQLServerID WHERE ws.UTCCollectionDateTime BETWEEN @StartTime and @EndTime ANDUPPER(s.InstanceName) = @InstanceName AND t.WaitType NOT IN (SELECT Wait_Type COLLATE SQL_Latin1_General_CP1_CI_AS FROM #IgnoredWaits) --AND t.WaitType LIKE 'PAGEIO%' GROUP BY s.InstanceName, t.WaitType SELECT @StartTime = @StartTime - 7 , @EndTime = @EndTime - 7 SELECT s.InstanceName , t.WaitType , WaitTime_Minutes = SUM(WaitTimeInMilliseconds)/1000/60 , dStart = Min(UTCCollectionDateTime) , dEnd = Max(UTCCollectionDateTime) INTO #TwoWeeks FROM SQLdmrepository..WaitStatisticsDetails wsd INNER JOIN SQLdmrepository..WaitTypes t ON wsd.WaitTypeID = t.WaitTypeID INNER JOIN SQLdmrepository..WaitStatistics ws ON ws.WaitStatisticsID = wsd.WaitStatisticsID INNER JOIN SQLdmrepository..MonitoredSQLServers s ON ws.SQLServerID = s.SQLServerID WHERE ws.UTCCollectionDateTime BETWEEN @StartTime and @EndTime ANDUPPER(s.InstanceName) = @InstanceName AND t.WaitType NOT IN (SELECT Wait_Type COLLATE SQL_Latin1_General_CP1_CI_AS FROM #IgnoredWaits) --AND t.WaitType LIKE 'PAGEIO%' GROUP BY s.InstanceName, t.WaitType SELECT TOP 10 n.InstanceName , n.WaitType , Now_Minutes = n.WaitTime_Minutes , Yesterday_Minutes = d1.WaitTime_Minutes , OneWeek_Minutes = d7.WaitTime_Minutes , TwoWeeks_Minutes = d14.WaitTime_Minutes FROM #Now n LEFT JOIN #Yesterday d1 ON n.WaitType = d1.WaitType LEFT JOIN #LastWeek d7 ON n.WaitType = d7.WaitType LEFT JOIN #TwoWeeks d14 ON n.WaitType = d14.WaitType ORDER BY n.WaitTime_Minutes DESC
The Results
The results pretty much speak for themselves. What were the top 10 things you waited on recently and how much did you wait on them in the past. It’s great to have a view of it where you’re comparing totals side-by-side. Seeing the same thing in charts just doesn’t have the same effect for me.
These numbers are going to have natural variances to them. Did a user run a couple large queries during one interval but not another? Was the shared storage under different loads? Really anything can come up. The longer the period the more you smooth out these numbers.
So, the real question is what do these waits mean? I’m not going to pretend like I can cover all the wait stats, their meanings, and how to affect them in this post. What I can say is that there are good resources out there for you.
Paul Randal (b|t) has the best source I know of on his post Wait Statistics, or please tell me where it hurts.
CXPACKET
One wait type I like to talk about, and discredit a bit, is CXPACKET. This wait type is just saying you used parallelism, nothing more. I first covered this topic in my post Why worry about CXPACKET.
Think of it in terms of a manager who delegates a large project to multiple employees. As the employees are hard at work accomplishing the task, the manager goes about their day and records their time as CXPACKET. The task is typically completed in less overall duration because it was a group effort, but the total amount of work done goes up some because there’s overhead and small duplicated tasks when you delegate to a group.
That less overall duration could make or break your application, so the idea of having CXPACKET isn’t necessarily bad (setting MAXDOP = 1 is rarely a good idea). However, a task being so large that it needed to be delegated to multiple employees may mean it’s not efficient, making query and index tuning a good approach. Also, your manager may be too eager to delegate their tasks, so possibly adjust your Cost Threshold for Parallelism as suggested in Jeremiah Peschka’s (b|t) post Five SQL Server Settings to Change.
As I said before, I can’t cover all the wait types here, so go to Paul Randal’s post I suggested above for a better list.
What I Skipped
This is looking specifically at WaitTimeInMilliseconds, which is one of two technically correct things to do. It does not look at ResourceWaitTimeInMilliseconds, which is the other technically correct thing to do. I wrote it one way and was happy with the results so I didn’t change it or clutter it with returning both.
Here’s the difference so you can decide if you want to change it or talk me into changing mine. It’s a conversation I’d enjoy…I’m strange like that.
ResourceWaitTimeInMilliseconds is how long you’re waiting on your actual resource. If you’re waiting for a lock, this is how long you waited for that lock to be made so you could get back in line for CPU time.
WaitTimeInMilliseconds also includes the time from when you get in line for CPU time until you’re using the CPU. This additional time is called your signal wait time.
The reason I like using WaitTimeInMilliseconds is that the reason you gave up the CPU and had to get back in line is due to the fact you weren’t ready due to what you were waiting on. In that way, it’s the whole picture.
The argument going the other way is that you were only waiting for the resource. It’s not the resource’s fault there was a line all queued up at the CPU when it got there. Why should I tack on extra time on a non-CPU wait for that?
I’m undecided on this part, but having the info both ways allows you to figure out the signal wait time on your entire instance and see if you have too much CPU contention. There’s no arguing about that being a good thing to know.
Steal my query, make it your own, and write that part the way that works best for you.
When Idera Finds Out What I’m Doing….
Someone asked me for queries pulling from Diagnostic Manager, and I posted on Twitter when I made my Idera Diagnostic Manager Scripts page. The outcry and rage from this was…very positive.
Vicky Harp (t), who is now Idera’s Corporate Strategist wrote pieces of Diagnostic Manager and its schema, took time out of her day to meet with me and talk about the details of querying the data they collect.
Scott Stone (t) who is Diagnostic Manager’s Product Manager also jumped in very quickly to see if I’d like to be a beta tester for them, bringing real power users into the loop to make their products better.
This is the level of support you want from your monitoring software.
While I don’t officially endorse or recommend any specific software at this time, I do recommend you look for responses like this.