Blog Post

A Study in SQL Server Ad hoc Query Plans

,

A Study in SQL Server Ad hoc Query Plans

We, (or maybe it’s just my dismissive attitude towards it), often think of ad hoc query plans as single use query plans that won’t be reused very frequently or at least not in very near future. by which time the plan might have been already pushed out of cache by SQL Server.  But, what if we have too many of such queries as a percentage of the overall workload, say for sake of example more than 50%,  obviously caching such execution plans will be just waste of memory for the sql server, and the server memory in general. Some might say it will make the plan cache bloated. 

For example, the following is a real example where the overall memory used by the query plans is whopping 30GB !!!!

;WITH cte_plan_cache_usage_by_obj_type AS 
(
SELECT COALESCE(objtype, '-- ALL Plans') objtype,
           Sum(Cast(size_in_bytes AS BIGINT)) / 
              1024 / 1024 size_mb
    FROM   sys.dm_exec_cached_plans
    GROUP  BY rollup ( objtype )
)
SELECT objtype,
       size_mb,
   'percent' = 
   ( size_mb * 100 ) / (SELECT size_mb
FROM   cte_plan_cache_usage_by_obj_type
WHERE  objtype = '-- ALL Plans'
   ) 
FROM   cte_plan_cache_usage_by_obj_type
ORDER  BY size_mb DESC 

Next, I fire up a slight variation of the query to see the type of workloads using the most memory:
;WITH cte_plan_cache_usage_by_obj_type AS 
(
SELECT COALESCE(objtype, '-- ALL Plans') objtype,
           Sum(Cast(size_in_bytes AS BIGINT)) / 
              1024 / 1024 size_mb
    FROM   sys.dm_exec_cached_plans
    GROUP  BY rollup ( objtype )
)
SELECT objtype,
       size_mb,
   'percent' = 
   ( size_mb * 100 ) / 
                       (SELECT size_mb
FROM   cte_plan_cache_usage_by_obj_type
WHERE  objtype = '-- ALL Plans'
   ) 
FROM   cte_plan_cache_usage_by_obj_type
ORDER  BY size_mb DESC
Wow, the ad hoc queries are using the whopping 94% !!!!.  

This workload is certainly almost all ad hoc and no, I am not in position to tell the application team to modify their queries, not in the short run at least. And to paraphrase a famous quote:

“In the long run we will be all dead”

- John Maynard Keynes


So, any adjustments to improve things, have to be done on the backend SQL Server. 

Fortunately for this types of workloads, SQL Server offers a configurable setting “optimize for ad hoc workloads” to limit/reduce the amount of memory used by such single-use ad hoc queries, here the keyword is single-use.  How? When this setting is enabled, the optimizer will not cache the full execution plan for ad hoc query at first, instead it will cache only a much smaller plan "stub"  and its query hash and plan hash.  This plan "stub" is not an execution plan at all, so it is not reusable for subsequent runs of the same query. However, if the same query is re-run in near future, then the optimizer will cache the reusable, full execution plan. Where the single-use, ad hoc queries make up relatively large portion of workload, this setting can end up saving sizable amount of memory, and making it available to others.

Note: SQL Server requires an exact text match for two ad-hoc queries before reuse can occur. The text match is both case- and space-sensitive, even on case-insensitive servers


By default, this setting is off, though I sometimes run into cases where I found it best to turn this setting on.  Why? because the memory in SQL Server is precious, but probably a bigger reason for me is that the sql servers often have very limited memory allocated and as much as (or as little) I would like to, I cannot convince myself let alone my manager to give every damn server 2TB of RAM.

In the example that I have shown, without knowing the context, the 30GB plan cache does seem very high, hack most of my sql servers don’t even have that much total memory.  So, here is some context regarding this SQL Server:

SQL Version: 

Microsoft SQL Server 2019 (RTM-CU18)

(KB5017593) - 15.0.4261.1 (X64)

Enterprise Edition: Core-based

Licensing (64-bit)

Windows Server 2019 Standard 10.0

<X64> (Build 17763: ) (Hypervisor)

Database Size is 60TB

Application: Bigdata/Data Warehouse/Analytical Reports

CPU and Memory Configuration:

(MEMORY INFO IS IN GIGABYTES)



While MAXDOP of 8 is the recommended best practice, I have my doubts if it is the best setting for this SQL Server with mainly analytical workload and 112 cores. Maybe this should be set dynamically  depending on the time of the day and the type of expected workload during that time. For example if there is a massive ETL job running nightly for 5 hours that locks out every other user, maybe that ETL job should have ability to use higher MAXDOP value. But, that I will investigate later.
In this context where the SQL instance has more than 2TB of RAM available, the 30GB size cache is less than 1.5%, which on average is lower than the most of other SQL Servers. But still high in absolute terms, so it is worth investigating.
Lets first check what is the current setting is for the optimize for ad hoc workloads.
It is not enabled, which is the default, and of course it still is in turned off state. Normally I would not spend more time in investigating this further, I would simply turn the setting on. But here I am hired to only investigate and make recommendations, I am not allowed to make any changes myself.

So, I need to continue my investigation bit further. But before going any further, lets discuss what really makes an execution plan an ad hoc plan?

As I mentioned at the beginning, some of us often think of an ad hoc plan or ad hoc query is something that is run only once. This is only partially true when you think of how sql server labels a query plan as ad hoc or otherwise in the cache:

“Adhoc: Ad hoc query. Refers to Transact-SQL submitted as language events by using osql or sqlcmd instead of as remote procedure calls”


That description is straight from this MS article:

And even though it doesn’t mention SSMS, queries run from SSMS are also labelled as ad hoc by the optimizer.

It is not about how many times a query gets executed that defines whether it is an ad hoc query. However the number of query executions does have bearing on whether the "optimize for ad hoc workloads" will cache the complete execution plan or only a stub.
Practically speaking, ad hoc queries are those that do not have parameters or cannot be auto-parameterized by the optimizer, are not part of a database object (stored procedure, function, views etc.). Take for example this simple query:

SELECT NAME FROM SYS.DATABASES DBS WHERE  NAME = 'tempdb';


Note that the query is using a constant value 'tempdb' in the WHERE clause, not a variable/parameter. The optimizer still can and will try to parameterize this

query if it is part of a stored procedure or sent as a prepared sql query. The

interactive client tools like sqlcmd, osql, ssms etc. don’t execute such

queries as prepared statements therefore optimizer will treat it as an ad hoc

query and will cache execution plans accordingly.  If you have something that runs certain queries

frequently using such tools, you can take advantage of sp_executesql so that they are cached as prepared plans that will get

reused and thereby improve query performance.

 

Let’s look at the following example, where the same sql statement is

executed in three different ways to get the same results:

-- Ad hoc query
SELECT NAME FROM sys.databases dbs WHERE  NAME = 'tempdb'; 
GO
-- Prepared SQL statement
EXEC Sp_executesql N'SELECT NAME FROM sys.databases dbs WHERE  NAME = @db_name',
                   N'@db_name sysname',
                   N'tempdb' 
GO
-- Stored Procedure
IF Object_id('sp_test_adhoc_plans_cache', 'P') IS NOT NULL
  DROP PROCEDURE sp_test_adhoc_plans_cache
GO
CREATE PROCEDURE sp_test_adhoc_plans_cache
(@db_name SYSNAME = 'tempdb')
AS
    SELECT NAME FROM sys.databases dbs WHERE  NAME = @db_name;
GO
EXEC sp_test_adhoc_plans_cache
GO
Now look at how the optimizer has processed/cached those queries:
SELECT cp.plan_handle,
       cp.objtype,
       cp.cacheobjtype,
       cp.size_in_bytes,
       cp.usecounts,
       qt.text,
       qs.query_hash,
       qs.query_plan_hash
FROM   sys.dm_exec_cached_plans cp
       inner join sys.dm_exec_query_stats qs on qs.plan_handle = cp.plan_handle
       CROSS apply sys.Dm_exec_sql_text(cp.plan_handle) qt
WHERE  qt.text LIKE '%SELECT NAME FROM sys.databases dbs WHERE  NAME =%'
       AND qt.text NOT LIKE 'SELECT cp.plan_handle%'

For better visibility of it's content, I have split the results into two screenshots:

Notice that the query hash and plan hash values for them are same. However, the plan handles and objtype are different, even for the same input value, tempdb.  While the cached plan for ad hoc query is still much smaller than the other plans, it is still larger compared to if the 'optimize for ad hoc workloads' was enabled. Let’s verify that:

EXEC sp_configure 'optimize for ad

hoc workloads', 1;

reconfigure with override;


Let's remove the already cached plans:

DBCC FREEPROCCACHE

(0x060001002C0F151A10D0B0C3D30100000100000000000000000000000000000);

DBCC FREEPROCCACHE

(0x060001008A0FCA2810A6C9B5D3010000010000000000000000000000000000);

DBCC FREEPROCCACHE

(0x05000100D563663F50416C7CE30100000100000000000000000000000000000);


Re-run the same 3 queries again and look at the cache information:


Note that the plan_handle values are different than the previous screenshot because I cleared the previous plans from the cache.

Notice the size of the ad hoc plan, only 456 bytes, and the cacheobjtype is Compiled Plan Stub. That is right it is only a stub; it doesn’t have an execution plan and therefore technically there is no cached plan to reuse. However, if the same ad hoc query is run again in near future, optimizer will create a full execution plan for it that can be reused by the subsequent invocation of the same query. Let's verify that:


-- Ad hoc query

SELECT NAME FROM sys.databases dbs WHERE  NAME = 'tempdb';
GO

Here is the info from sys.dm_exec_cached_plans after running the ad hoc query the second time:

The cacheobjtype for the Adhoc plan is no longer a stub.

In a nutshell, when optimize for ad hoc workloads is enabled:

  • The SQL Server optimizer will not cache the full execution plan for the Ad hoc queries at first.  In other words, this setting only affects queries the optimizer deems Ad hoc.
  • Optimizer will however cache the plan stub, along with the query and plan hash values so that it will know if the same query is re-executed in which case it will remove the stub and cache the complete execution plan

Considering that, would I now recommend that the optimizer for ad hoc workloads setting should be turned on? In this case the workload is more than 90% ad hoc and the the risks seems low enough.   But, I want to look at one more piece of information.  Following query returns the aggregate memory used for:

  • Ad hoc query plans
  • Plan is not a stub
  • With use count less than or equal to 2
  • Have not been used by a query in more than 5 hours

;with cte_cached_adhoc_plans as
(
SELECT plan_handle,
       MAX(last_execution_time) last_execution_time,
       SUM(execution_count)     execution_count
FROM   sys.dm_exec_query_stats
GROUP  BY plan_handle 
)
SELECTCOUNT(*) [Plan Count],
SUM(CAST(size_in_bytes AS BIGINT)) / 
              1024 / 1024 [Size MB],
MAX(cte_cached_adhoc_plans.execution_count) [Max Exec Count],
SUM(cte_cached_adhoc_plans.execution_count) [Total Exec Count]
FROM       sys.dm_exec_cached_plans cp 
INNER JOIN cte_cached_adhoc_plans ON cte_cached_adhoc_plans.plan_handle = cp.plan_handle 
WHERE cp.objtype = 'Adhoc'
  AND cte_cached_adhoc_plans.last_execution_time < DATEADD(HOUR, -5, GETDATE())
  AND cp.objtype != 'Compiled Plan Stub'
  AND cte_cached_adhoc_plans.execution_count <= 2
That is almost 28GB so in case this I would recommend that the optimize for ad hoc workload setting should be turned on.  Of course like any recommendation for changing configuration, it comes with caveats and caution that any change should be thoroughly tested as it can have potentially unintended and often negative effects and, the organization must have a solid change control procedure in place, including a back-out plan.
As an alternative, following query can be used to generate a script to remove the individual ad hoc query plans from the cache. Please adjust the filter conditions to suite your needs. This can be scheduled to run at suitable hours, say for example after the peak business hours and just before the night ETL or maintenance jobs about to begin.

;with cte_cached_adhoc_plans as
(
SELECT plan_handle,
       MAX(last_execution_time) last_execution_time,
       SUM(execution_count)     execution_count
FROM   sys.dm_exec_query_stats
GROUP  BY plan_handle 
)
SELECTTOP 1000
[Remove Cached Plan] = 
'DBCC FREEPROCCACHE (0x' + convert(varchar(max), cte_cached_adhoc_plans.plan_handle, 2) + ');' 
FROM       sys.dm_exec_cached_plans cp 
INNER JOIN cte_cached_adhoc_plans ON cte_cached_adhoc_plans.plan_handle = cp.plan_handle 
WHERE cp.objtype = 'Adhoc'
  AND cte_cached_adhoc_plans.last_execution_time < DATEADD(HOUR, -5, GETDATE())
  AND cp.objtype != 'Compiled Plan Stub'
  AND cte_cached_adhoc_plans.execution_count <= 2

On a final note, as we know, SQL Server will push out cached plans, cached data pages from the buffer cache etc. as and when it needs to, using it's ever evolving algorithm. So why not let it decide? I am in agreement, with a caveat of course. The default settings and behavior work very well for most SQL installations in most situations, but not for all SQL Servers and certainly not all of the times. 
PS: Being a huge fan of Sherlock Holmes stories, the title of this post is a part reference to the famous A Study in Scarlet story.

Original post (opens in new tab)

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating