How Your Hypervisor Can Impact Your CPU

Recently a client complained of chronically high CPU utilization. The performance of their SQL Server had degraded, and the degradation appeared to be tied to higher-than-normal CPU utilization along with unresponsive user queries. The root cause was twofold: a third-party hosting provider had overallocated virtual processors on the physical host where the virtual machine (VM) running SQL Server resided, and the host had recently been upgraded from a version of VMware that was not patched for Spectre and Meltdown. The host had 16 physical cores with hyperthreading enabled (effectively 32 logical cores) until the hosting provider upgraded from VMware 5.5 to a newer release (we believe 6.5), which was required to address the Meltdown and Spectre processor vulnerabilities. This patch disabled hyperthreading at the hypervisor to mitigate the security risk from speculative execution. Note that this patch is over a year old and addresses a critical security risk; most software vendors (including VMware) released it as an immediate requirement after the vulnerabilities were announced.

Because this was a virtual machine, it shared a physical host with many other VMs; this is a very common configuration. However, this host was VERY overallocated. As mentioned above, it had 16 cores, yet 61 additional vCPUs had been allocated to other machines. That is roughly four times the number of physical cores (61 / 16 ≈ 3.8), before counting the 16 vCPUs assigned to the SQL Server VM itself. The screenshot below shows this single Host, highlighting the vCPUs allocated.
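To make the overallocation arithmetic concrete, here is a minimal Python sketch that uses only the figures quoted in this post (16 physical cores, 61 vCPUs on other VMs, 16 vCPUs for the SQL Server VM). The function name is my own, and the exact ratio depends on which vCPUs you choose to count.

```python
def overcommit_ratio(allocated_vcpus: int, physical_cores: int) -> float:
    """Return the number of allocated vCPUs per physical core."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return allocated_vcpus / physical_cores


PHYSICAL_CORES = 16   # physical cores on the host (hyperthreading disabled)
OTHER_VM_VCPUS = 61   # vCPUs allocated to the other VMs on the host
SQL_VM_VCPUS = 16     # vCPUs the SQL Server VM needed for its workload

# Ratio counting only the other VMs, then counting the SQL Server VM too.
others_only = overcommit_ratio(OTHER_VM_VCPUS, PHYSICAL_CORES)
with_sql_vm = overcommit_ratio(OTHER_VM_VCPUS + SQL_VM_VCPUS, PHYSICAL_CORES)

print(f"Other VMs only:      {others_only:.2f} vCPUs per core")  # 3.81
print(f"Including SQL's VM:  {with_sql_vm:.2f} vCPUs per core")  # 4.81
```

Either way you count it, the host had several vCPUs competing for every physical core.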

The most prevalent SQL Server wait type was SOS_SCHEDULER_YIELD. This wait type is typically prevalent on a server under CPU pressure. In general, whenever a thread needs a resource that it can’t immediately acquire, that worker thread becomes suspended and waits on the Waiter List to be told (signaled) that its resource is available. SOS_SCHEDULER_YIELD is a little different: the thread is not waiting on a resource at all. It has used up its scheduler quantum, voluntarily yielded the CPU, and is sitting in the runnable queue waiting for its next turn. When this wait is encountered on a VM, it is commonly associated with overallocation of CPU on the underlying host, particularly when it appears suddenly in your environment. The VM was struggling to get and use the 16 vCPUs required for the workload from the host. According to the hosting provider, it is their practice to overallocate their VM hosts. We explained to them that overallocating to this degree is not a common practice and that in this case it was the very root of the issue. As a fix, they moved the VM to a dedicated host. This immediately made a difference in CPU performance. Jonathan Kehayias (B|T) has two great blogs on CPU pressure that I referred to while working through this issue. You can find them here and here.
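One rough way to check whether waits point at CPU pressure is to compare signal wait time (time spent runnable, waiting for a CPU) to total wait time. The Python sketch below assumes you have already pulled wait_type, wait_time_ms and signal_wait_time_ms from sys.dm_os_wait_stats. The sample rows are made-up numbers for illustration only, not the client's data, and the helper names are my own.

```python
from typing import List, NamedTuple


class WaitStat(NamedTuple):
    wait_type: str
    wait_time_ms: int         # total wait time (includes signal wait time)
    signal_wait_time_ms: int  # time spent runnable, waiting for a CPU


def signal_wait_percent(stats: List[WaitStat]) -> float:
    """Share of total wait time that was spent waiting for CPU."""
    total = sum(s.wait_time_ms for s in stats)
    if total == 0:
        return 0.0
    signal = sum(s.signal_wait_time_ms for s in stats)
    return 100.0 * signal / total


def top_wait(stats: List[WaitStat]) -> str:
    """Name of the wait type with the largest total wait time."""
    return max(stats, key=lambda s: s.wait_time_ms).wait_type


# Hypothetical values, for illustration only.
sample = [
    WaitStat("SOS_SCHEDULER_YIELD", 600_000, 590_000),
    WaitStat("PAGEIOLATCH_SH", 300_000, 10_000),
    WaitStat("CXPACKET", 100_000, 50_000),
]

print(top_wait(sample))                      # SOS_SCHEDULER_YIELD
print(f"{signal_wait_percent(sample):.1f}")  # 65.0
```

A high signal wait share is commonly read as a hint of CPU pressure, but treat it as a prompt to look further (including at the host) rather than as proof.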

The final solution was a combination of the move to a dedicated older server with appropriate vCPU allocation, query tuning, indexing and best practice configuration changes on the server. The following SQL Server settings were changed: Max Degree of Parallelism, Cost Threshold for Parallelism, Max Server Memory, and tempdb file configuration and sizing. Additionally, trace flags (1118, 1117, 2371 and 4199), Optimize for Ad Hoc Workloads and Instant File Initialization were enabled. Once best practices were configured, I tuned 26 query plans and adjusted indexes according to their needs, thus reducing the I/O and CPU demands.

The moral of the story is that performance is sometimes impacted by things outside of SQL Server. If you have made sure SQL Server is optimized and you are still having issues, start looking at the VM host. When you experience weird SQL performance on a VM, you always want to look at the host to see the rest of the environment (if you are in the cloud this is easier, since everything is tightly managed with quality of service (QoS)). Ask these questions:

Was the Host patched recently?
Was this VM vMotioned to another Host?
Is the memory allocated to the VM “reserved”?
What is the CPU allocation on the Host?
Any config changes on the Host?
What does Ready Time report show on the Host? (link is to a great blog by David Klee (B|T))

The last question in my list above, regarding Ready Time, really helped me prove overallocation to the VM Administrator and ultimately led to host modifications. Ready Time is a metric on the VM Host that shows the percentage of time the virtual machine was ready but could not get scheduled to run on the physical CPU. As you can see in the screenshot above, the Host had a VERY high Ready Time, which was a clear indication of overallocation. Typically, you want to see a Ready Time under 5%. If it’s higher than this, you should really investigate further.
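If your VM Administrator gives you Ready Time as a raw summation value in milliseconds (which is how the vSphere performance charts report it) rather than as a percentage, you can convert it yourself. The sketch below uses the conversion VMware documents: ready milliseconds divided by the chart interval in milliseconds, times 100. Dividing by the vCPU count to get a per-vCPU average is my assumption about how you want to normalize a multi-vCPU VM, and the input values are hypothetical.

```python
READY_THRESHOLD_PERCENT = 5.0  # rule of thumb from this post


def cpu_ready_percent(ready_ms: float, interval_seconds: float, vcpus: int = 1) -> float:
    """Convert a CPU Ready summation value (ms) to an average per-vCPU percentage.

    interval_seconds is the chart sampling interval, for example 20 seconds
    for the real-time chart.
    """
    if interval_seconds <= 0 or vcpus <= 0:
        raise ValueError("interval_seconds and vcpus must be positive")
    return ready_ms / (interval_seconds * 1000 * vcpus) * 100


# Hypothetical reading: 8,000 ms of ready time in one 20-second real-time
# sample for a VM with 4 vCPUs.
pct = cpu_ready_percent(ready_ms=8000, interval_seconds=20, vcpus=4)
print(f"{pct:.1f}% ready")  # 10.0% ready
if pct > READY_THRESHOLD_PERCENT:
    print("Above 5%: investigate host CPU allocation")
```

Make sure the interval matches the chart you read the value from, since longer rollup intervals produce much larger summation values for the same percentage.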

When facing high CPU pressure, take the time not only to performance tune SQL Server but also to look at the Host it’s residing on. You may find that outside factors beyond SQL Server’s control are affecting the VM. Being a great DBA is not only managing your SQL Servers but also knowing all the players involved in performance.
