IO Performance Problem

Question

Post reply

IO Performance Problem

huntersimon

Old Hand

Points: 394
More actions
February 10, 2017 at 6:58 am

#340469

We have a periodic problem on our sql server that I'm totally stumped on. This is on SQL Server 2012, clustered server, attached to a SAN. All latest updates and patches are installed. Every so often (maybe once a month totally random) for multiple hours the Disk IO will start spiking on and off. This has no correlation to load on the server. The IO will spike for roughly 90 seconds and then go back down for about the same period of time and repeat this over and over until it just stops. Failing over the cluster and restarts have no affect on the problem. I've narrowed it down to one database on one drive. This particular db has a lot of blob objects and is our largest db (2 TB) . When montioring perf mon the Avg. Disk sec/Transfer for this drive spikes up from the normal .010 to around 0.8. It will stay up around 0.8 for 90 seconds and then drop back to normal. We have checked all the SAN hardware even switching to totally new hardware to try to isolate but it doesn't fix it. Eventually the problem just goes away as mysteriously as it appears. I'm totally stumped any random thoughts appreciated, thanks.

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply

Eirikur Eiriksson SSC Guru Points: 182882 More actions · Answer 1

Anything in the default trace or sys.dm_exec_requests when this is happening?
😎

huntersimon Old Hand Points: 394 More actions · Answer 2

No it is completely unrelated to any sql transactions, load is unrelated. It spikes for about 90 seconds then goes back down for about the same time rinse and repeat. I've already checked for any scheduled processes, db and application.

Eirikur Eiriksson SSC Guru Points: 182882 More actions · Answer 3

huntersimon - Friday, February 10, 2017 7:27 AM
No it is completely unrelated to any sql transactions, load is unrelated. It spikes for about 90 seconds then goes back down for about the same time rinse and repeat. I've already checked for any scheduled processes, db and application.

What about the default trace?
😎

huntersimon Old Hand Points: 394 More actions · Answer 4

We traced the db wide open for over an hour while the problem persisted and found nothing unusual.

adb2303 SSCertifiable Points: 7588 More actions · Answer 5

Windows updates, anti virus scans, monthly reports? Just thinking outside of the box...

What type of SAN do you have? Anything going on like adaptive optimization / tiering?

huntersimon Old Hand Points: 394 More actions · Answer 6

adb2303 - Friday, February 10, 2017 7:35 AM
Windows updates, anti virus scans, monthly reports? Just thinking outside of the box...
What type of SAN do you have? Anything going on like adaptive optimization / tiering?

Its an HP p2000 g3 controller. We have restarted all the controllers and everything and eventually we installed all new SAN hardware and moved one db to its own drive. This has isolated the problem to that drive so we feel we have eliminated all the SAN possibilities. I'll look into adaptive optimization / tiering that is not something I've researched.

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 7

Use sp_whoisactive in differential mode to see the cumulative amount of work done by spids that run for the duration. I might try 30 or 60 seconds. Also use it in interactive mode in another spid to see real-time work. Be sure to include system spids.

I would also do a profiler trace to see if I can capture things doing a lot of IO. The problem is that it could be bunch of little things that a large-IO filter wouldn't catch. You can find this by doing an aggregate profiler trace analysis. Perhaps something small is getting called iteratively for some reason.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Joie Andrew One Orange Chip Points: 27360 More actions · Answer 8

Is this a physical server or a VM? If a VM is there anything on the VM host layer going on during that timeframe (VM snapshots, vMotions, storage migrations, etc.)?

Joie Andrew
"Since 1982"

huntersimon Old Hand Points: 394 More actions · Answer 9

Its physical, problem has vanished again, still didn't figure out the root cause. Thanks for the ideas.

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 10

huntersimon - Friday, February 17, 2017 6:09 AM
Its physical, problem has vanished again, still didn't figure out the root cause. Thanks for the ideas.

Does EVERY piece of hardware in the server and along the ENTIRE IO stack have updated drivers and firmware? I have seen some VERY bad things due to both of those issues.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 11

Haven't checked to see what the others have said but do you , by any chance, have automatic updates or automatic downloads turned on in Windows or your anti-virus software? If so, you need to turn that off.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Sergiy SSC Guru Points: 110206 More actions · Answer 12

Could it be caused by DB file auto-grow?

How much free space do you have in the files?

Do you shrink DB files overnight?

_____________
Code for TallyGenerator

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 13

Sergiy - Tuesday, February 21, 2017 1:45 AM
Could it be caused by DB file auto-grow?How much free space do you have in the files?Do you shrink DB files overnight?

DOH!! I HATE it when I miss the obvious possibilities (file growth)!! :hehe:

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service