May 4, 2010 at 10:10 am
We have an older (7 years) HP ML350 server running Windows 2000 Advanced Server and SQL Server 2000. It has 4 processors and 4 GB RAM, and the DB is located on a 133 GB RAID 5 array with approx 40% free space. The largest table (the main app table) is 35 GB and the log is 14 GB, so between them they take the lion's share of the used space. Not the optimal arrangement for a SQL deployment, but it's what was delivered and it has been running fine.
The thing's been humming along for years with an occasional hiccup or two, but nothing major. Perfmon numbers are well within the norm for SQL. We are replacing it with a whole new setup and app upgrade in a few months, but all of a sudden we are seeing a performance spike that hangs the app for approx 80 seconds, and it happens approx every 20-25 minutes during peak hours. Virtually ALL metrics are still in the normal range and haven't varied much, except for Avg. Disk Queue Length and the corresponding Avg. Disk Write Queue Length. Normally it runs at < 1, but it will spike to 50 to 100+ (that's with the 100.0 scale factor in perfmon), and we then see the client app freeze for 80 seconds and then resume. This is enterprise wide and reproduces with regularity.
I've tried looking at other counters and nothing jumps except disk queue. SQL Profiler doesn't show anything unusual, and there are no repeating items when this occurs. No SQL jobs run concurrently with these disk events. Process Explorer doesn't show any unusual disk activity from any process during these spikes. There is nothing in the event log except the alerts I put in for when disk queue length goes up. When it happens there are eight entries at 10-second intervals every time.
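To pin down the pattern rather than eyeball it, the alert entries can be grouped into episodes and timed. This is only a minimal sketch, assuming the samples have already been pulled out of the event log or a perfmon log as (timestamp, value) pairs at a 10-second interval. The function names and the threshold are made up for illustration.

```python
from datetime import datetime, timedelta


def find_spike_episodes(samples, threshold, interval_s=10):
    """Group consecutive above-threshold samples into episodes.

    samples: list of (datetime, float), sorted by time.
    Returns a list of dicts with start, end, samples and duration_s.
    """
    episodes = []
    current = None
    # Allow a little slack so slightly late samples still count as consecutive.
    max_gap = timedelta(seconds=interval_s * 1.5)
    for ts, value in samples:
        if value < threshold:
            current = None  # a normal reading closes the current episode
            continue
        if current is not None and ts - current["end"] <= max_gap:
            current["end"] = ts
            current["samples"] += 1
        else:
            current = {"start": ts, "end": ts, "samples": 1}
            episodes.append(current)
    for ep in episodes:
        # Each sample stands for one sampling interval.
        ep["duration_s"] = ep["samples"] * interval_s
    return episodes


def gaps_between(episodes):
    """Minutes from the start of one episode to the start of the next."""
    return [
        (b["start"] - a["start"]).total_seconds() / 60
        for a, b in zip(episodes, episodes[1:])
    ]
```

If the durations come out identical every time and the gaps are close to a fixed period, that points at something on a timer rather than at user load.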
I'm at a loss as to what else I can look for to find the culprit for this oddity. High disk queues point at an I/O bottleneck, but nothing else is running high. We are currently expanding our array with another drive and are just waiting for it to finish building. These problems were going on before we installed the new disk, though, so the build process isn't causing them. But why aren't the disk queues high constantly? Why do they run very low, then suddenly spike for a minute or more, and then subside back to normal levels? Are there any obscure counters I could capture that might match up with the disk queue numbers?
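One way to hunt for a matching counter is to log a wide set of counters to CSV and rank them by how closely they track the disk queue column. The following is a rough sketch, assuming a CSV export whose first column is the timestamp and whose remaining columns are one counter each. The helper names are hypothetical, and a plain Pearson correlation will miss counters that lead or lag the spike by a sample or two.

```python
import csv
import math


def load_counters(csv_file):
    """Read a counter log: first column is the timestamp, the rest are counters."""
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = {name: [] for name in header[1:]}
    for row in reader:
        for name, cell in zip(header[1:], row[1:]):
            try:
                columns[name].append(float(cell))
            except ValueError:
                columns[name].append(None)  # blank or missing sample
    return columns


def pearson(xs, ys):
    """Pearson correlation over samples where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 3:
        return 0.0
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat counter cannot correlate with anything
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return sxy / math.sqrt(sxx * syy)


def rank_against(columns, target):
    """Other counters sorted by absolute correlation with the target counter."""
    base = columns[target]
    scores = [
        (name, pearson(base, values))
        for name, values in columns.items()
        if name != target
    ]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)
```

Usage would look like `rank_against(load_counters(open("counters.csv", newline="")), "<disk queue column name>")`, where the file name and column name are placeholders for whatever the log actually contains. Anything near the top of the list is worth graphing next to the disk queue.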