AVG Disk Transfer goes out of scale randomly

Question


agustingarzon

SSC Eights!

Points: 817
August 28, 2017 at 8:27 am


Hi there guys.
I have a strange situation where disk D goes out of scale on the Avg. Disk sec/Transfer counter for a couple of minutes, to the point that the system freezes, until it finally settles several minutes later: http://imgur.com/a/TWDNh
The average is 0.006 and it peaked at 187, which is roughly 30,000 times higher than normal.
Of course, during that peak the disk queue also goes out of scale.
This doesn't happen often, but when it does it causes a denial of service for several minutes; nothing seems to respond.
That disk is dedicated to a single database, with no RAID setup. The disk is a Seagate ST2000NM0011. We perform regular index and statistics maintenance, and no specific task was running during the peak; it just seems arbitrary.
No long-running queries were detected. Checking the wait statistics, most of the waiting time is DB MIRRORING. This is the principal server, and mirroring is async.
We monitor the log size and it never exceeds ~40%, which suggests the mirror is in sync; the log is not growing out of scale.
Is it possible that this is a physical problem? What would you suggest I look at?
Best regards,
Agustin.


TheSQLGuru SSC Guru Points: 134017 · Answer 1

A) So you are running a SQL Server database on a single rotating disk? May I suggest you start there as the root cause of your problems? 🙂 And are all files for the database on that one disk?

B) Evaluate file IO stalls per file during the period where things are poor. Capture file IO stats, wait for 60 seconds or so, capture again, then diff the two captures and divide the stall time by the number of IOs to get ms/IO for both reads and writes, for every file on that instance.

C) Use sp_whoisactive to see what is running right now and how much work each process is doing. It can also do a differential analysis of activity over a given interval (10-15 seconds might be good there).

D) 40% log file size is meaningless to us. Is it 40% of 5MB or 40% of 20GB?
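
As a rough illustration of the arithmetic in point B, here is a minimal Python sketch that diffs two snapshots of cumulative file IO counters. The field names mirror columns of sys.dm_io_virtual_file_stats; the FileIoSnapshot class, the ms_per_io helper and every number below are hypothetical and are not taken from the poster's server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileIoSnapshot:
    """Cumulative IO counters for one database file at one point in time."""
    num_of_reads: int
    io_stall_read_ms: int
    num_of_writes: int
    io_stall_write_ms: int


def ms_per_io(before: FileIoSnapshot, after: FileIoSnapshot) -> tuple[float, float]:
    """Return (read ms/IO, write ms/IO) for the interval between two snapshots."""
    reads = after.num_of_reads - before.num_of_reads
    writes = after.num_of_writes - before.num_of_writes
    read_stall = after.io_stall_read_ms - before.io_stall_read_ms
    write_stall = after.io_stall_write_ms - before.io_stall_write_ms
    # Guard against files that saw no IO during the interval.
    read_latency = read_stall / reads if reads else 0.0
    write_latency = write_stall / writes if writes else 0.0
    return read_latency, write_latency


# Made-up counters captured about 60 seconds apart for a single file.
before = FileIoSnapshot(num_of_reads=1000, io_stall_read_ms=6000,
                        num_of_writes=500, io_stall_write_ms=2500)
after = FileIoSnapshot(num_of_reads=1600, io_stall_read_ms=12000,
                       num_of_writes=800, io_stall_write_ms=32500)

read_ms, write_ms = ms_per_io(before, after)
print(f"reads: {read_ms:.1f} ms/IO, writes: {write_ms:.1f} ms/IO")
```

Running the same calculation per file, once during a normal period and once during a spike, would show whether the data file, the log file, or both are stalling.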

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Arsh SSCertifiable Points: 6345 · Answer 2


Please specify whether this disk is one of several disks associated with this database, or whether it's the only disk holding all the files belonging to this database (as Kevin asked).

agustingarzon SSC Eights! Points: 817 · Answer 3

Hey guys.
This disk services only one database (log and data files), and it's this disk that goes out of scale randomly (not very often).
We have a few other databases and the system databases on different disks and setups.
The disk pattern is extraordinary, so I thought it might mean something: http://imgur.com/a/43PXf
I highlighted some gaps in red. I have only seen gaps like those on desktop computers during a system hang lasting a few seconds.
How is it possible that disk activity could spike to the point of causing an overall system denial of service? At those times this database is not doing maintenance or backups, or running lengthy or IO-intensive queries (per monitoring with Profiler).

I want to accept the idea that the disk is not good, or not good enough, but what could possibly explain a system-wide DoS caused by a disk activity spike? I guess it's just another of those mysteries we have to live with.

TheSQLGuru SSC Guru Points: 134017 · Answer 4

Numerous things can result in a huge bottleneck on a disk. They include:
antivirus, backups, snapshot access, congestion on the controller (or another shared IO component) caused by load on disks other than the problematic one, a driver/firmware bug anywhere in the IO stack, etc.
Also, you are measuring disk sec/transfer. What if the transfer size rocketed up to some massive value?
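
To illustrate the transfer-size point, here is a small Python sketch that normalizes Avg. Disk sec/Transfer by Avg. Disk Bytes/Transfer, so that a rise in latency caused purely by larger transfers can be told apart from the disk genuinely slowing down. The seconds_per_mib helper and both transfer sizes are assumptions for illustration; the thread does not report the actual transfer sizes.

```python
def seconds_per_mib(avg_sec_per_transfer: float, avg_bytes_per_transfer: float) -> float:
    """Time the disk spends per MiB moved, given two perfmon-style averages."""
    mib_per_transfer = avg_bytes_per_transfer / (1024 * 1024)
    return avg_sec_per_transfer / mib_per_transfer


# Hypothetical baseline: 0.006 s per transfer at 8 KiB per transfer.
baseline = seconds_per_mib(0.006, 8 * 1024)

# Hypothetical spike: sec/transfer is 64x higher, but so is the transfer size.
spike = seconds_per_mib(0.384, 512 * 1024)

print(f"baseline: {baseline:.3f} s/MiB, spike: {spike:.3f} s/MiB")
```

In this made-up case the cost per MiB is unchanged, so the jump in sec/transfer would come entirely from larger transfers. If the per-MiB figure also jumps during a real spike, transfer size alone does not explain it.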

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service