The PerfMon Counters That Just Won’t Die
One of the things that’s simultaneously great and horrible about the Internet is that once something gets posted out in the ether, it basically never goes away. (Some day, politicians will realize this. We can easily fact check their consistency). Because of longevity of content posted to the Internet, a lot of performance tuning topics become “zombies”. We shoot ‘em in dead, but they keep coming back!
In other words, those old recommendations were
a suggested best practices for long ago, for a specific version of SQL Server, but are now inappropriately for the newer version. It’s not uncommon for me, when speaking at a conference, to encounter someone who’s still clinging to settings and techniques which haven’t been good practice since the days of SQL Server 2000. Here’s an example of Microsoft SQL Server 2000 Best Practices that are very version-specific
So here’s an example. The %Disk Time counter and the Disk Queue Length were heavily recommended as a key performance indicator for IO performance. SQL Server throws a lot of IO at the disks using scatter/gather to maximize the utilization of the disk-based IO subsystem. This approach leads to short bursts of long queue depths during checkpoints and readaheads for an instance of SQL Server. Sometimes the server workload is such that your disk can’t keep up with the IO shoved at it and when that happens, you’ll see long queue lengths too. The short burst scenario isn’t a problem. The lengthening queue length scenario usually is a problem. So is that a good practice?
In a word, not-so-much.
Those counters can still be of some use on an instance of SQL Server which only has one hard disk drive. But that’s exceedingly rare these days. Why?
The PerfMon counter %Disk time is a bogus performance metric for several reasons. It does not take into account asynchronous I/O requests. It can’t tell what the real performance profile is for an underlying RAID set may be, since they contain multiple disk drives. The PerfMon counter Disk Queue Length is also mostly useless, except on SQL Server’s with a single physical disk, because the hard disk controller cache obfuscates how many IO operations are actually pending on the queue or not. In fact, some hard disks even have tiny write caches as well, which further muddies the water was to whether the IO is truly queued, in a cache somewhere between the OS and the disk, or has finally made it all the way to the CMOS
on the disk.
Better IO PerfMon Counters
Instead of using those PerfMon counters, use the Ave Disk Reads /sec, Avg Disk Write /sec, and Avg Disk Transfers/sec to track the performance of disk subsystems. These counters track the average number of read IOs, write IOs, and combined read and write IOs to occured in the last second. Occassionally, I like to track the same metrics by volume of data rather than the rate of IO operations. So, to get that data, you may wish to give these volume-specific PerfMon counters a try: Avg Disk Transfer Bytes/sec, Ave Disk Read Bytes /sec, and Avg Disk Write Bytes/sec
For SQL Server IO Performance, Use Dynamic Management Views (DMV)
And unless you’ve been living in a cave, you should make sure to use SQL Server’s Dynamic Management Views (DMVs) to check on IO performance for recent versions of SQL Server. Some of my favorite DMV’s for IO include:
So how are you tracking IO performance metrics? Which ones are you using?
I look forward to hearing back from you!