SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Help with SQL memory pressure investigation


Help with SQL memory pressure investigation

Author
Message
chreo
chreo
Forum Newbie
Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)

Group: General Forum Members
Points: 7 Visits: 89
Hello,
Please help me with SQL memory pressure investigation.

1 month ago we upgraded from SQL2005 (on WindowsServer2008) to SQL2016 (on WindowsServer2016).
Since that moment we had few serious incidents caused by memory pressure.
How do I know that we had memory pressure?
SQL plans size dropped to 100MB instead of standard 8GB.
Multiple errors in log like: "AppDomain 22 (XXX.dbo[runtime].28) is marked for unload due to memory pressure."

Are there any well known reasons of internal/external memory pressure?
I suspect that it was internal memory pressure because:
- there was enough free system memory on server (not used by SQL)
- problem was solved by restart of SQL Server service

Our system description:
SQL 2016 13.0.4451.0 (SP1+CU5)
Server Memory: 262048 MB - divided between 3 SQL instances (MaxMemory: 160GB/32GB/16GB + over 40GB free for System)
Availability Groups enabled and replicated to Secondary Server.
Query Store currently disabled since first problem occured (it was suspected as source of problem).
Lock Pages in Memory and Instant File Initialization - enabled
DB Compatibility level = SQL2016 (few times we changed to SQL2012 because CPU was better, and it seems that we didn't have memory pressure).
Legacy Cardinality Estimation = ON (much better plans). Query optimizer fixes = ON
optimize for ad hoc workloads option = ENABLED

How can we prepare for next memory pressure disaster?
We monitor such parameters like #Compilations or PageLifeExpectancy so we noticed that there is disaster in progress.
But we need to monitor something to find reason of memory pressure.
Can we enable some additional logging?
Some useful event logs in WindowsServer?

Any well known memory pressure sources from your experience?



Super Cat
Super Cat
SSCarpal Tunnel
SSCarpal Tunnel (4.3K reputation)SSCarpal Tunnel (4.3K reputation)SSCarpal Tunnel (4.3K reputation)SSCarpal Tunnel (4.3K reputation)SSCarpal Tunnel (4.3K reputation)SSCarpal Tunnel (4.3K reputation)SSCarpal Tunnel (4.3K reputation)SSCarpal Tunnel (4.3K reputation)

Group: General Forum Members
Points: 4324 Visits: 3028
Hi,

I would be interested in transaction\sec and batches\sec.
Along with an idea of what is actually running at that point to cause the spike.

Is it more transactions or bad code or process.
Profiler might be able to point you in the direction of what code is running.
Or you might want to use SP_WHOISACTIVE and turn on logging.

It would seem to me something runs or was running in that 10-12 min period which had a drain on resources.

SC
Super Cat
Super Cat
SSCarpal Tunnel
SSCarpal Tunnel (4.3K reputation)SSCarpal Tunnel (4.3K reputation)SSCarpal Tunnel (4.3K reputation)SSCarpal Tunnel (4.3K reputation)SSCarpal Tunnel (4.3K reputation)SSCarpal Tunnel (4.3K reputation)SSCarpal Tunnel (4.3K reputation)SSCarpal Tunnel (4.3K reputation)

Group: General Forum Members
Points: 4324 Visits: 3028
I suspect bad code or missing indexes or possibly a report pointing to this server\db.
Again you need to find out what was\is running profiler or WHOISACTIVE is the best bet.
Grant Fritchey
Grant Fritchey
SSC Guru
SSC Guru (274K reputation)SSC Guru (274K reputation)SSC Guru (274K reputation)SSC Guru (274K reputation)SSC Guru (274K reputation)SSC Guru (274K reputation)SSC Guru (274K reputation)SSC Guru (274K reputation)

Group: General Forum Members
Points: 274217 Visits: 33815
The reason for something like this is the queries being sent to your system. You need to capture those queries. I'd suggest one of two approaches. First, the most accurate, but the hardest to set up and maintain, create an extended events session that will capture the rpc_completed and sql_batch_completed events on your server so that you can see the behavior of the queries. You'll immediately see the ones that have high reads/writes and possibly long duration that are surely messing with your memory. The other option, easier to set up and maintain, but far less accurate for this type of investigation, would be to enable Query Store. That will capture plans and query metrics including the reads and writes. The capture is aggregated, by default, every 60 minutes, so you'll only be able to see an aggregated average (with min, max and standard deviation) for the time frame you're interested in, but you will be able to identify the queries that are using your resources.

----------------------------------------------------
The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood...
Theodore Roosevelt

The Scary DBA
Author of: SQL Server Query Performance Tuning and SQL Server Execution Plans
Product Evangelist for Red Gate Software
chreo
chreo
Forum Newbie
Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)

Group: General Forum Members
Points: 7 Visits: 89
@SuperCat:
- SQL Profiler - we had it, but we fuc**d up and Trace file was overwritten. We will have info next time when disaster appear again
- WhoIsActive - we run it every 30 seconds and save results to table, but when disaster started then WhoIsActive couldn't complete in 7 minutes and finally was killed.
(last WhoIsActive results 30 seconds before disaster didn't show any problematic queries, but we will analyze it again, because this is all we have now)

@Grant:
- QueryStore - we used this tool since day one after migration to SQL2016. But we have seen many CPU problems each time we tried to see stats in QueryStore.
Finally we decided to turn it off, which also decreased total CPU usage on server (not sure if that was 100% related, but it was much better without QueryStore)
also I've seen some forum posts where QueryStore was suspected of memory pressure
- ExtendedEvents - no experience with this tool but I guess we need to start using it Smile

What if problem wasn't caused by some query? We think we know all bad queries in our system.
What if some other process slowly eats memory and doesn't release it causing unexpected memory pressure after few days.
Is there some info that memory pressure will appear soon (when we reach some limit)?
Grant Fritchey
Grant Fritchey
SSC Guru
SSC Guru (274K reputation)SSC Guru (274K reputation)SSC Guru (274K reputation)SSC Guru (274K reputation)SSC Guru (274K reputation)SSC Guru (274K reputation)SSC Guru (274K reputation)SSC Guru (274K reputation)

Group: General Forum Members
Points: 274217 Visits: 33815
chreo - Thursday, December 14, 2017 7:59 AM
@SuperCat:
- SQL Profiler - we had it, but we fuc**d up and Trace file was overwritten. We will have info next time when disaster appear again
- WhoIsActive - we run it every 30 seconds and save results to table, but when disaster started then WhoIsActive couldn't complete in 7 minutes and finally was killed.
(last WhoIsActive results 30 seconds before disaster didn't show any problematic queries, but we will analyze it again, because this is all we have now)

@Grant:
- QueryStore - we used this tool since day one after migration to SQL2016. But we have seen many CPU problems each time we tried to see stats in QueryStore.
Finally we decided to turn it off, which also decreased total CPU usage on server (not sure if that was 100% related, but it was much better without QueryStore)
also I've seen some forum posts where QueryStore was suspected of memory pressure
- ExtendedEvents - no experience with this tool but I guess we need to start using it Smile

What if problem wasn't caused by some query? We think we know all bad queries in our system.
What if some other process slowly eats memory and doesn't release it causing unexpected memory pressure after few days.
Is there some info that memory pressure will appear soon (when we reach some limit)?


It's possible that it's some weird memory leak, sure. However, you haven't eliminated through the data presented that it's a query or queries causing the problems. Most of the time, it's something to do with queries, data structures, indexes, missing indexes, and out of date or missing statistics. These things cause about 85% of all performance problems. You guys are just capturing system metrics. Those can tell you where a problem is, disk or memory or cpu, but not what. You need to drill down further. The majority of cases the problem is very straight forward, not obscure memory leaks that are only happening to you. That's just now how this stuff works.

As to Query Store causing problems, it can, like any other monitoring tool. However, once more, you turned it off without any sort of indication that it was the problem. There are Query Store specific wait statistics that would give you an indication of what was wrong. Also, by turning it off, you've made it so that you're not monitoring queries (since you don't have extended events up & running). You're just not gathering a full set of metrics for making your decisions.

In addition to capturing query metrics, what about wait statistics. Knowing you have memory pressure doesn't say where the slow down is occurring. You need more and better data.

----------------------------------------------------
The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood...
Theodore Roosevelt

The Scary DBA
Author of: SQL Server Query Performance Tuning and SQL Server Execution Plans
Product Evangelist for Red Gate Software
GilaMonster
GilaMonster
SSC Guru
SSC Guru (673K reputation)SSC Guru (673K reputation)SSC Guru (673K reputation)SSC Guru (673K reputation)SSC Guru (673K reputation)SSC Guru (673K reputation)SSC Guru (673K reputation)SSC Guru (673K reputation)

Group: General Forum Members
Points: 673929 Visits: 48433
What did the two counters Total Server Memory and Target Server Memory look like over that period?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass


Perry Whittle
Perry Whittle
SSC Guru
SSC Guru (168K reputation)SSC Guru (168K reputation)SSC Guru (168K reputation)SSC Guru (168K reputation)SSC Guru (168K reputation)SSC Guru (168K reputation)SSC Guru (168K reputation)SSC Guru (168K reputation)

Group: General Forum Members
Points: 168967 Visits: 18494

chreo - Thursday, December 14, 2017 2:41 AM

Hello,
Please help me with SQL memory pressure investigation.

1 month ago we upgraded from SQL2005 (on WindowsServer2008) to SQL2016 (on WindowsServer2016).
Since that moment we had few serious incidents caused by memory pressure.

How do I know that we had memory pressure?
SQL plans size dropped to 100MB instead of standard 8GB.
Multiple errors in log like: "AppDomain 22 (XXX.dbo[runtime].28) is marked for unload due to memory pressure."

Are there any well known reasons of internal/external memory pressure?
I suspect that it was internal memory pressure because:
- there was enough free system memory on server (not used by SQL)
- problem was solved by restart of SQL Server service

Our system description:
SQL 2016 13.0.4451.0 (SP1+CU5)
Server Memory: 262048 MB - divided between 3 SQL instances (MaxMemory: 160GB/32GB/16GB + over 40GB free for System)


On the one instance or all 3, you haven't stated, it's not clear



-----------------------------------------------------------------------------------------------------------

"Ya can't make an omelette without breaking just a few eggs" ;-)
Tom300
Tom300
Grasshopper
Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)

Group: General Forum Members
Points: 10 Visits: 9

I'm working with chreo on this problem. Please let me provide some answers + add some fresh info on the problem.
In the following post someone described similar behavior:
https://social.technet.microsoft.com/Forums/en-US/537b2179-10ed-4c69-9b27-bb2c81f7708c/sql-server-2016-memory-pressure-response-flushing-plan-cache?forum=sqlkjmanageability

@Perry - the problem can be observed on 1 instance at a time. So far we experienced it 4 times on 2 instances. 3 times on most busy one and 1 timeon medium load one. Twice the problem occurred on the next day after migration to SQLL 2016, so practically we had 2 production incidents with this issue. Just today we spotted it on different smaller sql instance (this was the 4th incident of this type)

@GilaMonster– Total & Target Memory counters stay on the level equal to max server memory and does not change/decrease during the incident occurrence. OS freememory stays on safe level. OS has over 30GB of memory available for itself.

The analysis of the problem we did so far lead us to the conclusion that thereis some internal or external to SQL server factor that is causing the memory pressure. We can recognize the problem by SQL becoming unresponsive which is aconsequence of excessive query compilations/s and thus much longer executionof each query, it results in very high CPU at 100%. We are able to see it in the error logs that at the moment when CPU goes 100% + sql compilations/suddenly increase several times. We managed to discover that for some reasonSQL server does not store most of the compiled plans in the query cache. The problem which is described here troubled us few times since we migrated 1,5 month ago. As our system requires high availability, we have little time for reaction and resolution. During the first occurrence of the problem we figured out that the only measure that solves temporarily situation is SQLServer restart. (I know, one may say that this is bad practice, but in thissituation we can't afford the system being unavailable any longer). After this all gets back to normal, compilations return to the normal level, same with CPU etc.
So we have two questions:
* what causes the memory pressure?
* why SQL server behaves in such a way that it does not store compiled plans inthe cache any more?

As we have no much time when the problem occurs we prepared a plan to gather asmuch data as possible on the next occurrence, before we restart the SQL. What do we want to see:

*stats from sys.dm_exec_cached_plans
* output of sys.dm_os_memory_clerks
* output of sys.dm_os_memory_cache_clock_hands

And "luckily" today we had 4th occurrence of this problem, but on different, smaller and less busy SQL instance. We observed SQL compilations very high, fewhundred times more than normally (see the charts below - blue line is for thatinstance, dropped after SQL restart), on the other hand CPU was"only" 2 times higher.

We suspect that the users could work but with the degraded performance. We concluded that after we noticed that the first symptoms of the memory pressurewere visible on 21.12 (one of them is "app domain is marked for unload dueto memory pressure" in the errorlog) and the problem persisted through thewhole Christmas period until today (27.12 11:38). See another longer term chart(blue line)

As planned I’ve executed the planned in advance queries saved them to the excel file and restarted SQL. Half an hour after SQL server restart I’ve run the queries once again and saved the to the same excel (see attached). For each of 3 queries you will see 2 sheets in theexcel. The one suffixed with B means Before restart and the one suffixed with A means After the restart.

My first observations and comments after looking at the statistics are:

  • Ad Hoc and Prepared plans were not stored inthe cache plan. You could see one for a while and was immediately remove dafterwards. In the statistics of cached plans you will see ~1400 plans beforethe restart and ~ 13 000 after the restart.

  • The size of the cache plan before the restart was 3.8GB and 2.1GB after the restart. This is very strange considering that the 2.1GB cache was able to store 13k objects.

  • Probably there was kind of external memory pressure. When I summed up all objects from the cache there was 28GB before the restart and 32GB after the restart.

Any suggestions from your side?




Attachments
BeforeAndAfterSQLRestart.xlsx (12 views, 254.00 KB)
Bill Talada
Bill Talada
SSCrazy Eights
SSCrazy Eights (8K reputation)SSCrazy Eights (8K reputation)SSCrazy Eights (8K reputation)SSCrazy Eights (8K reputation)SSCrazy Eights (8K reputation)SSCrazy Eights (8K reputation)SSCrazy Eights (8K reputation)SSCrazy Eights (8K reputation)

Group: General Forum Members
Points: 8014 Visits: 2296
From the graphs I see high CPU, low PageLifeExpectancy, and high Compilations. All these happen when doing a table Rebuild or REINDEX or Auto Update Statistics.

I would do something very simple such as logging all sysprocesses every minute for a day and comparing processes just before, during, and after the event.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum








































































































































































SQLServerCentral


Search