I'm working with chreo on this problem. Please let me provide some answers + add some fresh info on the problem.
In the following post someone described similar behavior: https://social.technet.microsoft.com/Forums/en-US/537b2179-10ed-4c69-9b27-bb2c81f7708c/sql-server-2016-memory-pressure-response-flushing-plan-cache?forum=sqlkjmanageability
@Perry - the problem can be observed on 1 instance at a time. So far we experienced it 4 times on 2 instances. 3 times on most busy one and 1 timeon medium load one. Twice the problem occurred on the next day after migration to SQLL 2016, so practically we had 2 production incidents with this issue. Just today we spotted it on different smaller sql instance (this was the 4th incident of this type)
@gilamonster– Total & Target Memory counters stay on the level equal to max server memory and does not change/decrease during the incident occurrence. OS freememory stays on safe level. OS has over 30GB of memory available for itself.
The analysis of the problem we did so far lead us to the conclusion that thereis some internal or external to SQL server factor that is causing the memory pressure. We can recognize the problem by SQL becoming unresponsive which is aconsequence of excessive query compilations/s and thus much longer executionof each query, it results in very high CPU at 100%. We are able to see it in the error logs that at the moment when CPU goes 100% + sql compilations/suddenly increase several times. We managed to discover that for some reasonSQL server does not store most of the compiled plans in the query cache. The problem which is described here troubled us few times since we migrated 1,5 month ago. As our system requires high availability, we have little time for reaction and resolution. During the first occurrence of the problem we figured out that the only measure that solves temporarily situation is SQLServer restart. (I know, one may say that this is bad practice, but in thissituation we can't afford the system being unavailable any longer). After this all gets back to normal, compilations return to the normal level, same with CPU etc.
So we have two questions:
* what causes the memory pressure?
* why SQL server behaves in such a way that it does not store compiled plans inthe cache any more?
As we have no much time when the problem occurs we prepared a plan to gather asmuch data as possible on the next occurrence, before we restart the SQL. What do we want to see:
*stats from sys.dm_exec_cached_plans
* output of sys.dm_os_memory_clerks
* output of sys.dm_os_memory_cache_clock_hands
And "luckily" today we had 4th occurrence of this problem, but on different, smaller and less busy SQL instance. We observed SQL compilations very high, fewhundred times more than normally (see the charts below - blue line is for thatinstance, dropped after SQL restart), on the other hand CPU was"only" 2 times higher.
We suspect that the users could work but with the degraded performance. We concluded that after we noticed that the first symptoms of the memory pressurewere visible on 21.12 (one of them is "app domain is marked for unload dueto memory pressure" in the errorlog) and the problem persisted through thewhole Christmas period until today (27.12 11:38). See another longer term chart(blue line)
As planned I’ve executed the planned in advance queries saved them to the excel file and restarted SQL. Half an hour after SQL server restart I’ve run the queries once again and saved the to the same excel (see attached). For each of 3 queries you will see 2 sheets in theexcel. The one suffixed with B means Before restart and the one suffixed with A means After the restart.
My first observations and comments after looking at the statistics are:
Ad Hoc and Prepared plans were not stored inthe cache plan. You could see one for a while and was immediately remove dafterwards. In the statistics of cached plans you will see ~1400 plans beforethe restart and ~ 13 000 after the restart.
The size of the cache plan before the restart was 3.8GB and 2.1GB after the restart. This is very strange considering that the 2.1GB cache was able to store 13k objects.
Probably there was kind of external memory pressure. When I summed up all objects from the cache there was 28GB before the restart and 32GB after the restart.
Any suggestions from your side?