I apologize for such a general question, and there is not enough info for a detailed answer, but ANY general suggestions would be welcome.
We're running an SQL2008R2 instance on a VM with 12 cores and 32G ram. For years a number of nightly TSQL jobs have been run, each one consisting of multiple steps, each step (generally relatively small select queries and update steps) normally runs from a few mS to maybe 2 minutes. It has been in place for years.
This week, the jobs suddenly are not complete in the morning. What is happening is that only one or two of the many steps during the night unexpectedly take an extraordinarily long time to run (a step that might normally be a few seconds to a minute suddenly takes an hour or two).
* It's affecting DIFFERENT steps at DIFFERENT times during the night. Does not seem related to time, or to what kind of step is running.
* During the rest of the night all the other steps take normal times.
* There are no errors in the SQL logs or the Windows logs.
* Monitoring software (Idera diagnostic manager) shows normal levels of locks, wait states, disk access times, memory, CPU etc during these time periods--nothing that would appear to cause such extreme slowdown.
* After an affected step ends, the following steps occur at normal speed like nothing was ever wrong. For example a step that normally takes 50 seconds might suddenly take an hour, but the step before, and the step after seem perfectly normal.
* VM administrator has examined VM logs and sees no evidence of extended processor 'ready' states, memory or disk conflicts, or other VM issues. Since SQL and OS don't "see" such slowdowns, we had theorized that VM issues might be the case.
* Rerunning the job will often complete at a normal speed.
Not sure where to look next. Never saw an issue like this before.
-- FORTRAN manual for Xerox Computers --