CPU Pressure: Confirmation

Question

CPU Pressure: Confirmation

Oohd

SSC Veteran

Points: 243
More actions
June 20, 2012 at 2:43 am

#392561

Hello,
I've recently been asked to look at a server that was running at a minimum of 50-60% CPU with regular spikes to > 90%, sometimes even 100%.
Rather than jump straight to CPU, I took a more 'considered' approach to rule out other things, these are my findings:
(It's a VM, allocated 2 VCPUs with 16GB RAM)
Perfmon Stats
Memory is not under pressure
12GB SQL Server Target Memory
12GB SQL Server Total Memory
>1200 seconds SQL Server Buffer Page Life Expectancy
0 Memory Grants pending
Storage is not under pressure
All disk access (read and rights) < 10ms, sometimes peaking to around 20ms.
CPU is always busy and occasionally under pressure
% Processor Time: Minimum 50%, often higher than 80%
% Privileged time: Maximum 10%, often around 5%
% User time: High, generally the inverse of Privileged time
%Processor time:sqlserver: High, it's definitely SQL Server consuming CPU resource
Processor queue length: Seldom zero, often 3 or 4 with occasional peaks to 8/9. I assumed to see a correlation of non zero with critically high CPU, but alas not which is a little confusing.
Batch Requests/sec: Constantly high, above 500, peaking to 5000, again, no correlation between high batch and high CPU (but query complexity plays a part there right?)
Compilations/sec: Low (IMO), around 1% of batch requests
Recompilations/sec: Very low, rarely above 1!
Little use of cursors.
The wait statistics show a very high occurrence of SOS_SCHEDULER_YIELD, with a signal to wait time ratio of 99%!
The next four most common wait types include (but certainly nowhere near the amount compared to SOS_SCHEDULER_YELD) :
- a highish occurance of ASYNC_NETWORK_IO at 34% signal to wait time ratio
- a little WRITELOG at 13% signal to wait time ratio
- a little CXPACKET (I've found a few parallel plans) but nothing major 7% signal to wait time ratio
- a very low occurance (with 1.4% ratio) of PAGEIOLATCH_SH
Although SOS_SCHEDULER_YIELD is the most prevelant, and constantly increases when querying dm_os_wait_stats, I never see it when querying dm_os_waiting_tasks, this is confusing me ! What's also confusing me is why would the processor yield, even when it's not at full capacity? Does the quantum reduce when the load/Batch Requests/sec increases? (That would explain it?)
The server has many databases which are all transactionally replicated publishers.
I've looked at the most 'expensive' queries from a time/processor cost perspective and the majority are all very, very small queries with good (IMO) plans. There were a couple of missing indexes according to the optimiser, but the indexes that claimed to make the most benefit were for queries that generally run sub second anyway, so it's looking like a case of the volume of concurrent queries, rather than the complexity.
The plan cache is about 3GB, with a high number (30k) adhoc plans compared to about 2k non-adhoc, but that seems to have stabalised, and with a low SQL compilation metric doesn't appear to be an issue.
As a result I've recommended that they up the number (or speed) of the allocated processors as what I have understood from above this all points to under specced CPUs.. But I desparetely need some confirmation.. anything I've missed/should have looked at?
I was hoping to find some VM perfmon counters so I could validate the processor reservations but they were not listed in perfmon.. (?) Power Management on the VM at least is set to high performance - I couldn't check the host, or the bios.
Thanks in advance, sorry for the brain dump and multiple questions. (Slightly out of my depth and in high stress mode!)
Cheers
O

Viewing 7 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 1

I wouldn't consider 50-60% to be overly high. It's good usage of resources (personally I don't like my expensive CPUs sitting mostly idle). The spikes are a bit of a concern, see if you can correlate the spikes to some queries.

Also, chapter 3: http://www.simple-talk.com/books/sql-books/troubleshooting-sql-server-a-guide-for-the-accidental-dba/

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Oohd SSC Veteran Points: 243 More actions · Answer 2

Thanks for this. I did use that book as input to some of my investigation, I haven't yet correlated any of the queries for a specific point in time, but all the 'expensive' queries that are returned from dm_exec_query_stats run sub-second when the server is not maxed out and have good plans - they are just called many many times.

They have the MDW Data Collectors running so I'm hoping I can use that rather than create a SQL Trace to help correlate properly.

The main concern is the very high number of SOS_SCHEDULER_YIELD waits, in particular the high signal to wait ratio ??

Thanks

O

markyb SSC Rookie Points: 35 More actions · Answer 3

CPU stats as measured within the virtual machine are not accurate. Who knows how much CPU time that machine is really getting on the host environment?

Adding virtual CPUs will probably not help and will make things much worse if the host machine is under heavy load.

If my VM uses 1 vCPU I only need one physical core to be free to do some work.

When the VM has 2 vCPUs I need two free cores to do some work... and so on. As you can see as you add vCPUs windows of opportunity to do work on the host decrease.

I think you need to get your virtual environment admins to let you know how much real CPU time this machine is using. Also see if they can give you a higher CPU priority (100% of 1 or 2 physical cores.) and keep 2 or even drop to 1 vCPU.

I seem to remember that Brent Ozar has done some blogging and webcasts on the multiple virtual CPU issue.

Mark

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 4

Oohd (6/20/2012)
Thanks for this. I did use that book as input to some of my investigation, I haven't yet correlated any of the queries for a specific point in time, but all the 'expensive' queries that are returned from dm_exec_query_stats run sub-second when the server is not maxed out and have good plans - they are just called many many times.

Not uncommon. That's why tuning stuff that runs very often is as or more important than tuning stuff that runs long. What are the wait types involved for those queries when they run slow? (other than scheduler yield)

The main concern is the very high number of SOS_SCHEDULER_YIELD waits, in particular the high signal to wait ratio ??

That's almost 'expected'. The way SOS_Scheduler_Yield works is that the query actually doesn't spend any time on the waiter queue (resource wait time), it goes straight back to the bottom of the runnable queue and waits to be run (signal wait)

I'm not saying that you don't have CPU pressure, it does appear there is some, it just may not be quite as bad as it seems. See if you can tune the frequently run queries. Shaving 5 ms off a query that runs a couple hundred times an hour is likely to have more of an impact on overall performance as shaving 5 sec off something that runs every hour.

Check with the VM admin, make sure that the host isn't over-utilised, check that they are following VMWare's recommendations for hosted SQL Server

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Oohd SSC Veteran Points: 243 More actions · Answer 5

Thanks for this, you've made some real sense here 🙂

I have made a couple of small index changes to the most prevelent 'quick' queries to see if that helps matters - typically, since I've been tasked with looking at the server, the CPU has rarely spiked, and it was only restarted (following a CU) at the weekend so all stats are only a couple of days old. From what I have, there's a low occurrance of CXPACKET and I have found a couple of (quick) queries that run in parallel and is parallel throughout the whole plan - no context switches - there could be more queries I'll need to look harder. There's also a reasonable amount of ASYNC_NETWORK_IO but I've yet to catch either of them in the act.

So the apparent, constant, 'busyness' of the CPU could be just down to the number of requests being asked of SQL Server? With that in mind, I need to try and track down the query or queries that is causing the spike, and determine what waits they are causing, preferably before the next spike !

The VM host is apparently very under utilised and has little on it, apart from SQL Server. I've asked about memory and processor reservation and all things thin-provisioning and still waiting on the details, but they've suggested they are not thin provisioning.

Thank you - any other pointers most welcome 🙂

O

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 6

Oohd (6/20/2012)

So the apparent, constant, 'busyness' of the CPU could be just down to the number of requests being asked of SQL Server? With that in mind, I need to try and track down the query or queries that is causing the spike, and determine what waits they are causing, preferably before the next spike ![/quote

Could well be. Compiles and recompiles are low, so SQL's not sitting compiling unnecessarily, IO is low so that's not using CPU. Could well just be volume of tasks. Can't say for sure though, saying for sure would involve a full analysis of the server.

The VM host is apparently very under utilised and has little on it, apart from SQL Server. I've asked about memory and processor reservation and all things thin-provisioning and still waiting on the details, but they've suggested they are not thin provisioning.

Yup, and that's probably worth the paper it's 'suggested' on.

I had a VM a while back with strange memory behaviour, no matter how much I lowered SQL's max server memory, I could never get more than 100MB free (which is way too low). I asked about over committing memory, memory reservations, etc, was assured that it was set up correctly. Eventually asked to see the VM settings myself, what do you know, the memory is set so that the virtual thought it had 48 GB of memory, but only was given 24.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass