Thanks for the link SpringTown... I hadn't found that article. I have opened an incident with MSFT and have yet another meeting with the storage gang today. The pressure is on ME to find what is causing this since users come to me complaining about performance issues.
Ah, the joys of dealing with storage people....
I highly recommend Brent Ozar's webcast: "How to Prove It's a SAN Problem" http://www.brentozar.com/archive/2011/08/how-prove-its-san-problem-webcast-video/
Also, I'd recommend capturing some Windows Perfmon counters while the problem is occurring.
This is my list for troubleshooting IO issues:
(N: and P: are the SAN volumes that host DB files)
\SQLServer:Buffer Manager\Buffer cache hit ratio
\SQLServer:Buffer Manager\Checkpoint pages/sec
\SQLServer:Buffer Manager\Database pages
\SQLServer:Buffer Manager\Free list stalls/sec
\SQLServer:Buffer Manager\Free pages
\SQLServer:Buffer Manager\Lazy writes/sec
\SQLServer:Buffer Manager\Page life expectancy
\SQLServer:Buffer Manager\Page lookups/sec
\SQLServer:Buffer Manager\Page reads/sec
\SQLServer:Buffer Manager\Page writes/sec
\SQLServer:Buffer Manager\Readahead pages/sec
\SQLServer:Buffer Manager\Reserved pages
\SQLServer:Buffer Manager\Stolen pages
\SQLServer:Buffer Manager\Target pages
\SQLServer:Buffer Manager\Total pages
Capture at 1-second intervals, and only for as long as needed.
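If you'd rather script the capture than click through Perfmon, something like this rough Python sketch could build the counter list and a typeperf command line for it. The file names and sample count are placeholders I made up, and it assumes a default instance (a named instance would use an MSSQL$InstanceName prefix instead of SQLServer).

```python
# Sketch only: builds the Buffer Manager counter paths from the list above
# and a typeperf command that samples them once per second.
# File names and sample count are hypothetical placeholders.

BUFFER_MANAGER_COUNTERS = [
    "Buffer cache hit ratio",
    "Checkpoint pages/sec",
    "Database pages",
    "Free list stalls/sec",
    "Free pages",
    "Lazy writes/sec",
    "Page life expectancy",
    "Page lookups/sec",
    "Page reads/sec",
    "Page writes/sec",
    "Readahead pages/sec",
    "Reserved pages",
    "Stolen pages",
    "Target pages",
    "Total pages",
]


def counter_paths(prefix="SQLServer"):
    """Return full counter paths, e.g. \\SQLServer:Buffer Manager\\Lazy writes/sec."""
    return ["\\{}:Buffer Manager\\{}".format(prefix, name)
            for name in BUFFER_MANAGER_COUNTERS]


def write_counter_file(path, prefix="SQLServer"):
    """Write one counter path per line, the format typeperf -cf expects."""
    with open(path, "w") as handle:
        handle.write("\n".join(counter_paths(prefix)) + "\n")


def build_typeperf_command(counter_file, output_file,
                           interval_seconds=1, samples=None):
    """Build (but do not run) a typeperf command as an argument list."""
    cmd = ["typeperf", "-cf", str(counter_file),
           "-si", str(interval_seconds),
           "-f", "CSV", "-o", str(output_file)]
    if samples is not None:
        cmd += ["-sc", str(samples)]  # stop after this many samples
    return cmd


if __name__ == "__main__":
    # Print the command for a 5-minute capture at 1-second intervals.
    print(" ".join(build_typeperf_command("counters.txt", "capture.csv",
                                          samples=300)))
```

To actually capture, call write_counter_file("counters.txt") on the SQL Server host and run the printed command there. You would add the disk counters for the N: and P: volumes to the same file.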
The major things to look for are Disk Reads/sec and Disk Writes/sec (IOPS) and how they affect latency (Avg. Disk sec/Read, Avg. Disk sec/Write).
Also look at what is driving those IOPS (lazy writes, checkpoints, etc.).
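To line those numbers up after the fact, here is a rough Python sketch that reads a Perfmon CSV export and, for one volume, lists the samples where latency was high alongside the IOPS and the Buffer Manager write activity at that moment. It assumes the capture included the LogicalDisk counters for the volume and that column headers look like \\HOST\Object(Instance)\Counter. The function names are mine, and the 20 ms threshold is only an illustrative cut-off, not a hard rule.

```python
import csv
import io


def find_column(header, suffix):
    """Index of the first column whose name ends with suffix (case-insensitive)."""
    suffix = suffix.lower()
    for index, name in enumerate(header):
        if name.lower().endswith(suffix):
            return index
    raise KeyError("counter not found in capture: " + suffix)


def to_float(cell):
    """Convert a CSV cell to float; blank cells (missed samples) become None."""
    try:
        return float(cell)
    except ValueError:
        return None


def slow_samples(csv_text, volume="N:", latency_threshold=0.020):
    """Return samples where read or write latency (seconds) meets the threshold."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    disk = "\\LogicalDisk({})\\".format(volume)
    cols = {
        "reads": find_column(header, disk + "Disk Reads/sec"),
        "writes": find_column(header, disk + "Disk Writes/sec"),
        "read_latency": find_column(header, disk + "Avg. Disk sec/Read"),
        "write_latency": find_column(header, disk + "Avg. Disk sec/Write"),
        "lazy_writes": find_column(header, "Buffer Manager\\Lazy writes/sec"),
        "checkpoint_pages": find_column(header, "Buffer Manager\\Checkpoint pages/sec"),
    }
    results = []
    for row in data:
        values = {key: to_float(row[index]) for key, index in cols.items()}
        if any(value is None for value in values.values()):
            continue  # skip samples with a missing counter value
        if max(values["read_latency"], values["write_latency"]) >= latency_threshold:
            results.append({
                "time": row[0],
                "iops": values["reads"] + values["writes"],
                "read_latency": values["read_latency"],
                "write_latency": values["write_latency"],
                "lazy_writes": values["lazy_writes"],
                "checkpoint_pages": values["checkpoint_pages"],
            })
    return results
```

If the slow samples consistently show high Checkpoint pages/sec or Lazy writes/sec, that points at what is generating the write IOPS. If latency is high while IOPS are modest, that is a better argument to bring to the storage team.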