aggregated sys.dm_exec_query_stats CPU figures don't match the operating system counters

  • We take hourly soundings from sys.dm_exec_query_stats and aggregate it to understand what is going on in SQL Server. One of the things we aggregate is the CPU usage.

    The figures I get from SQL Server say that SQL Server is using approximately 25% of the available CPUs. However the figures from the operating system are more than double that at 65% to 75%

    Am I to interpret that SQL Server is spending more than half it's time doing internal SQL Server things rather than running queries and if so can I find out what these are with a view to trying to tune then up?

    Or is there another explanation for the discrepancy?

    Thanks

    Tim Walker

    .

  • There are a number of things that could result in those figures not lining up, but the main thing is that sys.dm_exec_query_stats only has information for plans that are still in cache, and only information for those plans since they were last put in cache.

    If a query has used 5 hours of CPU time and then its plan is pushed out of the cache, that information is gone from sys.dm_exec_query_stats.

    Another aspect that has to be accounted for is the amount of time a plan has been in cache. If a query has used 5 minutes of CPU time, it matters a great deal whether its current plan was cached 48 hours ago or 1 minute ago. The former would not necessarily be much of a contributor to CPU pressure, while the latter would.

    There are other confounding factors (CPU used for compiles, just to pick one), so I wouldn't worry too much about trying to make the numbers line up.

    Is this mostly out of curiosity, or is there a specific problem you're wanting to address?

    Cheers!

  • Hi Jacob, thanks for the response.

    Basically I sample the DMV every hour and compare it to what I sampled the previous hour. I then do some further augmentation of what I have so I can read it. This gives me a really clear view of the big hitting statements on my SQL Server across all the databases. If I don't like something I see in the analysis, I can get it amended / corrected.

    I'm fairly sure that there won't be plans that have been created and then removed, working on the basis that the oldest unused ones are the ones that get trashed. I presume thats correct.

    We also use Zabbix to capture a whole load of windows counters, CPU utilisation being one of these. This shows that CU usage is higher than I would like, but if I can avoid doing so, I don't want to add CPUs and put up my license costs for Enterprise Edition.

    So since I can't see much scope for further tuning in my queries, I am turning to what the this quite large proportion of CPU time might be.

    .

  • For a little bit about what determines if a plan stays in cache, you could take a look at https://technet.microsoft.com/en-us/library/ms181055(v=sql.105).aspx#Anchor_0.

    I definitely wouldn't just assume that you're neither pushing plans out of cache nor getting new plans in the cache.

    When you're doing your analysis of sys.dm_exec_query_stats, pay close attention to the values of creation_time you're seeing for the queries to see if many of them have had their plans entered since the last pull from sys.dm_exec_query_stats.

    Moving on to the analysis itself, exactly what is the method you're using to determine SQL Server's CPU utilization from the plan cache (queries used, especially)?

    Also, just to cover this base, what sorts of numbers are you seeing for Compilations/sec and Re-Compilations/sec?

    Cheers!

  • Thanks for the link, it looks interesting. I will study it in more depth.

    To determine the CPU being used I'm look at total_worker_time this time minus whatever it was last time I sampled joining on plan_handle, plan_generation_num and statement_start_offset. Most of the work done in the system is in 'proper' objects like stored procedures and triggers and there is hardly any ad-hoc SQL.

    I suppose if a plan gets a new generation number before I resample the previous generation is probably removed. I guess you article willtell me that when I read it properly. So I guess I could be losing quite a lot of information for anything that recompiles a lot.

    I'm hoping that this is a logic or understanding problem on my side as I can fix that once I know what it is!

    I haven't checked the recompile rate and I am on holiday this week but I'll take a look at these counters and also see if I do lose a lot of plans compared to my last sample, I haven't checked that and it would be a good start to do that I think.

    Thanks very much for your help to date.

    Tim

    .

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply