Yesterday, I blogged about basic I/O usage with some useful scripts, DMVs, and links to more in-depth details on examining these statistics.
Since my blog was blasted to many LinkedIN SQL Server groups, I received many, many of your positive comments and feedback - and for that, I/O you a big thanks! :-)
(Please post all comments directly to the Blog, so all Internet travelers can see your thoughtful comments.)
I wanted to follow-up with some ways to help identify I/O bottlenecks, performance tuning, tips and troubleshooting.
First using performance monitor, here are the two I/O related counters that can help you understand if there is any I/O bottleneck:
PhysicalDisk(_Total)\Avg. Disk Queue Length
- if this counter is more than Avg. 2 per disk (or spindle if using SAN) then there is an issue
PhysicalDisk(_Total)\Avg. Disk sec/Transfer
- if this is counter value is more than 15 milli seconds this also indicates I/O issues.
One I/O related error message that you may have often seen and wondered about in the SQL Error Log is:
'SQL Server has encountered n occurrence(s) of I/O requests taking longer than 15 seconds to complete on file <filename> in database <dbname>.'
I believe this was added to SQL Server 2005 SP2, to help identify issues that are affecting SQL Server's performance, as related to a poor disk subsystem, misconfiguration, driver/HW problem connection between the server and the SAN. It could also indicate fragmentation, excessive I/O requests not being handled properly by the disk, or data files not optimally placed on disk.
The reason why it is so important to understand I/O performance, especially for DBA's, is that it is often misunderstood by those who are NOT DBA's. System and SAN adminstrators are often quick to blame the database server, when in actuality, SQL Server is simply being helpful in identifying a potential Disk/HW problem. And, managers are more apt to viscerally accept ther analysis over the DBA. So, the more you understand about this, the more knowledge you will have in leading the Systems folks to the real underlying problem.
This in fact, happened recently at a client of mine, and it took some serious caucusing and long meeting hours to prove that it was in fact a disk problem. I identified the cause right away, but skeptics needed more convincing and disk and SAN diagnostic tests. This "fiddling while Rome burns" technique caused several incidents of crashing clusters going down at regular intervals. Turns out, because of a flawed driver and improper configuration, the subsystem could not keep up with the I/O requests generated from the SQL Server.
I have found this excellent article/link, that delves into the origins and causes of the infamous 'SQL Server has encountered n occurrence(s) of I/O requests taking longer than...' and worth adding to your DBA favorites list:
Continue monitoring and watching this blog space for more exciting info and helpful tips!
I will begin running a regular feature called: 'MVP Thoughts of the Day' - great tips, info and insights from known industry sql gurus, experts, and MVPs on all things relating to SQL Server.
Any of you MVP's out there, feel free to send me your comments and thoughts at email@example.com, and I'll post them right here. Thank you all for reading and your contributions!
Try SQLCentric at: