I am having a rather challenging issue that I am hoping to get some help on.
In the last 3 weeks, we have 3 instances where the server decided to take a puke and start stumbling along like a drunk. Slow moving, huge waits, etc. Currently, we haven't been able to figure out what the issues are. Sadly, restarting the issues resolves the issues for 5-7 days before it crops up again. What I do know is below. I just need help figuring out some different directions to look next or some really good idea to figure out what is going on.
On 4/10, 4/17, and 4/22, the SQL 2005 SP4 64 instance simply starts to drag.
- No blocking
- Light waits on TempDB GAM/SGAM pages, but very minimal
- 60+ percent of total instance waits (once the issue starts to occur) are sos_scheduler_yeilds. Prior, it is 15-20%. During a single hour last night this reached 31k seconds of signal waits and over 7m task waiting on it.
- Sys.dm_os_schedulers go from a normal running value of 0-1 to 3-5.
Top 2 users of CPU per Confio come from the same TVF. Both are deletes. The TVF is heavily used, often being called 30K/hour
Server Specs and Config:
SQL 2005 SP4 Standard Ed 64 bit
Server 2008 R2
DAS Box with 2 SAS drives for OS and Backups
32 SSD’s for the remaining drives
128GB of ram with max server memory set to 100GB
2 x 6 core processors for a total of 24 threads with hyper threading enabled.
MAXDOP = 1
TempDB, Indexes, Data, and Logs are all on separate drives
Total of 14 TempDB files
Replicate (transactional) 95% of all reporting to reporting system. Only 1 large processes runs on the server, but was not running when the issues began.
From what I have read, when we start to have running_tasks from sys.dm_os_schedulers continually above 2-3, that it is a sign of CPU pressure. However, I don't see it on the CPU counters.
Regardless, anyone got any ideas on what might be going on or what to look at next?