Blog Post

Schedulers To Rule Them All

,

Sometimes things just don’t make sense.

One of our production servers has 4×8 (4 sockets, 8 core) processors with hyper threading enabled.  This results in 32 physical cores or 64 logical cores.  Therefore, this particular server should have 4 NUMA nodes available.

bufferHowever, I was looking at the dashboard for this particular server through SentryOne.   I happened to notice that the server was only showing 3 buffers for SQL Server to utilize.  If you look at the  image, the 3 purple blocks on the left represent the number of buffers the server has.  Each block  reflects a one to one ratio of buffer to a NUMA node.  If you have two NUMA nodes, you would have two blocks represented.

As you can see, I’ve got 3 blocks, so 3 buffers are aligned with 3 NUMA nodes, which is not expected.  Where is the fourth buffer?  Did it just not show up for work?

So I started to investigate it.

SQL Server is pretty versatile.  It’s got dynamic management views (DMVs) that will tell you a number of things if you know where to look.  In this case, I needed to look at those that told me about the underlying core structure.  Namely, these:

Investigation

numa_nodesThe DMV sys.dm_os_nodes did indeed confirm that the server has 4 NUMA nodes available.  SQL Server sees that there are four of them.  Node ID of 64 represents the Dedicated Administrator Connection (DAC).  This node guarantees that if you need to use the DAC, you’ll have resources available to do the work.

Also notice that node_id of 3 is currently set to “OFFLINE“.  That’s curious.  We will come back to that.

You could also use the sys.dm_os_schedulers DMV to find out how many nodes it has:

SELECT DISTINCT parent_node_id
FROM sys.dm_os_schedulers

I have now confirmed in two different ways that SentryOne should be reflecting 4 memory buffers.  Let’s take a closer look at what this DMV is telling us.  A scheduler is essentially a gatekeeper.  Schedulers coordinate processing time on the processors for worker processes.  There is a one to one correlation between the number of cores and the number of schedulers.  If you have a 32 core system, you would have 32 schedulers to process end user requests.

Just as a side note, while there is a direct link between the schedulers and the processor, this does not mean that each scheduler is limited to that specific processor.  A scheduler assigned to processor #1 could end up work on processor #2 assuming that the affinity mask IS NOT set.

In this DMV, I happened to notice that each scheduler also has an available status.  These statuses are:

  • HIDDEN ONLINE
  • HIDDEN OFFLINE
  • VISIBLE ONLINE
  • VISIBLE OFFLINE
  • VISIBLE ONLINE (DAC)
  • HOT_ADDED

Hidden schedulers are used to process requests that are internal to the engine itself.  Visible schedulers are used to handle end-user requests.  When you run the casual SELECT * query, it will utilize a visible scheduler to process the query.  With this information, if I have a 64 core server and all is well, I should have 64 visible schedulers online to process requests.

offline_schedulersHowever, I discovered that some of the schedulers were set to “VISIBLE OFFLINE”.  This essentially means that those particular schedulers are unavailable to SQL Server for some reason.   How many offline schedulers do I have? A quick query resulted in 24 schedulers currently offline.  24 logical cores means that 12 physical cores are offline.

But why would a scheduler be set to “VISIBLE OFFLINE”?

I wasn’t able to find anything really online as to why SQL Server would only have 40 logical cores online and available.  So, I started to wonder if SQL would dynamically spin these up if it was under CPU pressure.

After my google-fu failed me and I had some conversations with Chris Yates (B|T), we thought it was a good idea to turn to my good friend David Klee (B|T) at HeraFlux Technologies.  David is a longtime friend from Omaha, Nebraska and is wicked smart about this stuff.   As a side note, being in the SQL Server Community and knowing who to call, makes a world of difference.

Later that evening David called me and we discussed what I was seeing.   Within about 15 minutes David asked which ISO we had installed on the server.   Given that the server was installed and running prior to my starting at the company, I wasn’t sure which ISO all I knew was that it was SQL Server 2012 Enterprise Edition.  David continued to tell me that there are two different versions of SQL Server 2012 Enterprise ISO’s.  One was core based which allowed you to utilize all of the cores available to the server.  The other version would limit you to 20 physical cores or 40 logical cores with hyper threading.

Wait.  What?  I’m sure my jaw hit the floor as I have never heard of that.  David gave me some things to check the following morning.

You Can’t Handle The Truth

online_schedulers_1The next morning,  I checked the core counts again, and there it was.  40 cores online.  Things were starting to make some sense.

One of the things that David said to check was the server log.  If you are using the one that limits the number of cores, there will be an entry in the log upon start up.  Given that we had just recently patched the operating system, I could easily check to see if that message there.

error_log

Click to Enlarge Image

Shoot.  The incorrect ISO was installed and we are limited to only 40 logical cores. This means that a portion of our 64 core we can’t even utilize to process queries.  This also potentially means that we could experience other issues due to the fact that the work load is now not balanced across all of the processors.  We haven’t experienced this (yet) but I suspect as time goes on we would start to see some evidence of this.

Now that I had a better understanding of what had happened, I change my google queries and was able to locate some references to this particular “issue”.

https://blogs.msdn.microsoft.com/sql_shep/2012/06/21/sql-server-2012-license-core-limitaion/

From that blog post, it sounds like an easy fix, however we will have to test it some fashion as this is one of our Production SQL Servers.  And it’s clustered.

Update 2017-04-01 (not an April Fool’s Joke):

Jonathan Kehayias (B|T) has a excellent article on this very subject over at SQLPerformance.com.  I highly recommend taking  a look.  One thing that he mentions that I didn’t consider, is that now the work load is unbalanced across the remaining online CPU’s.  This in of itself can cause issues.  Take a look!

Summary

If you have SQL Server 2012 Enterprise edition and you have more than 20 physical cores available, you might want to check to ensure the correct ISO was installed.  If not, you could be licensing cores that you can’t even utilize.  Who knew?

Here’s script that will help you determine whether or not you need to investigate further. This should work with SQL Server 2008 and up.

SELECT CASE WHEN COUNT(1) < ( SELECT COUNT(1)
FROM sys.dm_os_schedulers
WHERE [status] IN ( 'VISIBLE OFFLINE',
'VISIBLE ONLINE' )
) THEN 'ERROR - Offline Schedulers Detected'
ELSE 'SUCCESS! No offline schedulers Detected'
END AS 'Status Check'
FROM sys.dm_os_schedulers
WHERE [status] = 'VISIBLE ONLINE'
GO

Stay tuned to see how the fix goes!

 

© 2017, John Morehouse. All rights reserved.

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating