Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

SQL Memory usage suddenly very high - Causing Cluster Failovers Expand / Collapse
Author
Message
Posted Friday, March 22, 2013 3:32 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Friday, August 22, 2014 2:19 AM
Points: 109, Visits: 267
We've noticed that memory usage on the DB cluster had suddenly changed. Normally PLE is fairly stable staying at around 6000 seconds. In the afternoon the PLE suddenly dropped from 6000 to under 300 and bounced between 0 & 600 averaging 300 for the rest of the day. At 6:11pm the cluster failed over, again at 00:12 and latest failover at 6:17am this morning.

After the last failover, the PLE and memory usage went back till normal until approx 8:07am when the PLE suddenly went to 0 and SQL memory usage jumped up by approx 4GB

In the SQL Error log before each cluster failover we get these messages:
03/21/2013 18:00:13,spid27s,Unknown,AppDomain 796 (mssqlsystemresource.sys[runtime].1143) unloaded.
03/21/2013 18:00:13,spid1s,Unknown,AppDomain 796 (mssqlsystemresource.sys[runtime].1143) is marked for unload due to memory pressure.

then

03/21/2013 18:11:53,Logon,Unknown,Error: 18451<c/> Severity: 14<c/> State: 1.
03/21/2013 18:11:52,Logon,Unknown,Login failed for user 'swconsumer'. Only administrators may connect at this time. [CLIENT: 10.0.100.38]
03/21/2013 18:11:52,Logon,Unknown,Error: 18451<c/> Severity: 14<c/> State: 1.
03/21/2013 18:11:52,Logon,Unknown,Login failed for user 'swbroker'. Only administrators may connect at this time. [CLIENT: 10.0.200.31]
03/21/2013 18:11:52,Logon,Unknown,Error: 18451<c/> Severity: 14<c/> State: 1.
03/21/2013 18:11:52,Logon,Unknown,Login failed for user 'swapi'. Only administrators may connect at this time. [CLIENT: 10.0.200.27]
03/21/2013 18:11:52,Logon,Unknown,Error: 18451<c/> Severity: 14<c/> State: 1.
03/21/2013 18:11:52,spid327,Unknown,The client was unable to reuse a session with SPID 327<c/> which had been reset for connection pooling. The failure ID is 23. This error may have been caused by an earlier operation failing. Check the error logs for failed operations immediately before this error message.

at which point SQL Shuts down and causes a cluster fail-over

We had a software release yesterday, so I assume that a change from that is causing these issues. Question is, where do I start looking to find the cause of this?

Any help would be much appreciated

Andrew



Post #1434166
Posted Friday, March 22, 2013 4:06 AM


SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: Today @ 7:42 AM
Points: 4,417, Visits: 10,718
I would start looking into sys.dm_exec_query_stats to find the queries that are not behaving as normal per your baseline.
Queries that use a lot of memory must be reading lots of data from disk, so I would start looking for rows with high total_logical_reads / execution_count-

Also, the error you are getting is likely a symptom that MAX_SERVER_MEMORY was not set correctly and both SQL Server and OS are starving. Set MAX_SERVER_MEMORY to a reasonable limit. If you already set it, you are probably getting more multi-page allocations than usual. As you probably know, MAX_SERVER_MEMORY does not limit multi-page allocations and you should be looking for those in sys.dm_os_memory_clerks.

Hope this helps
Gianluca


--
Gianluca Sartori

How to post T-SQL questions
spaghettidba.com
@spaghettidba
Post #1434181
Posted Friday, March 22, 2013 4:38 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Friday, August 22, 2014 2:19 AM
Points: 109, Visits: 267
Thanks for the reply

The server has 24GB mem with SQL set to a max of 20GB and win 2008R2 having 4GB. SQL normally sits at 16GB but something is using the extra 4GB.

Am going through the DMVs you suggested to see if i can find the culprits

Andrew
Post #1434190
Posted Friday, March 22, 2013 5:09 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Friday, August 22, 2014 2:19 AM
Points: 109, Visits: 267
ok running this query

select *
from sys.dm_exec_query_stats qs cross apply sys.dm_exec_sql_text(sql_handle) st
ORDER BY qs.total_logical_reads DESC

. Am assuming that this DMV data is since the last fail over at 6:17am this morning.

1 proc has been executed 999 times with a total_logical_read = 453124440

Having never looked at these figures before its hard to analyse but, iirc that indicates that this proc is trying to read an awful lot of data for only 1000 executions. I assume that wont help memory issues

Looking in sys.dm_os_memory_clerks the biggest user of Multi_page_kb is MEMORYCLERK_SQLCLR with multi_page_kb = 59240

My only worry is that we don't use CLR code on our servers yet CLR was mentioned in the errorlog.

Post #1434200
Posted Friday, March 22, 2013 5:18 AM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Monday, October 20, 2014 3:33 AM
Points: 1,325, Visits: 2,591
mealies (3/22/2013)
ok running this query

select *
from sys.dm_exec_query_stats qs cross apply sys.dm_exec_sql_text(sql_handle) st
ORDER BY qs.total_logical_reads DESC

. Am assuming that this DMV data is since the last fail over at 6:17am this morning.

1 proc has been executed 999 times with a total_logical_read = 453124440

Having never looked at these figures before its hard to analyse but, iirc that indicates that this proc is trying to read an awful lot of data for only 1000 executions. I assume that wont help memory issues


Take a look at the execution plan of this procedure. If it is doing some large table/index scans then it means it is missing some required indexes.



Sujeet Singh
Post #1434202
Posted Friday, March 22, 2013 5:59 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Friday, August 22, 2014 2:19 AM
Points: 109, Visits: 267
The execution plan wasn't pretty but no missing indexes reported. Although the SSMS toolkit plugin warns the query plan takes 5mb of space.

Looked into the SQL stack dumps and its showing the CLR is causing the memory pressure and we think we found the proc responsible. Its using Spatial Indexes and I believe spatial indexes use CLR

Going to see what calls this proc and see if we can stop it for now to see if it solves the issue.
Post #1434215
Posted Friday, March 22, 2013 6:33 AM


SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: Today @ 7:56 AM
Points: 4,401, Visits: 6,262
mealies (3/22/2013)
The execution plan wasn't pretty but no missing indexes reported. Although the SSMS toolkit plugin warns the query plan takes 5mb of space.

Looked into the SQL stack dumps and its showing the CLR is causing the memory pressure and we think we found the proc responsible. Its using Spatial Indexes and I believe spatial indexes use CLR

Going to see what calls this proc and see if we can stop it for now to see if it solves the issue.


Spatial is indeed a CLR consumer under the covers - you don't need CLR enabled for it to function. Spatial is something few people have much experience with and there are a LOT of ways to zig when you should zag. You may want to get a professional to remote in and take a look at what is going on. I hate seeing production systems failing while people hunt around on forums for assistance!!

I didn't see it mentioned, but is this a 32 bit system by any chance? Those are notoriously BAD with memory issues due to the memtoleave situation and 2GB lower memory limitations. There are some things you can do to help out there though.

Also, run dbcc memorystatus and see if anything else there pops up as a problem.

It IS possible that the large-IO-consuming query is responsible, but if it didn't show very large scans and hashes that may not be it. If it has large numbers of nested-loop iterations that would explain high IO and no missing indexes. That situation is likely result of bad estimates (which have MANY causes).


Best,

Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru at GMail
Post #1434234
Posted Friday, March 22, 2013 7:22 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Friday, August 22, 2014 2:19 AM
Points: 109, Visits: 267
Thanks for all the replies, and yes, hunting around on forums whilst the production system crashes & burns is never fun to watch.

We got it in the end, it was Tivoli backup process that although was turned off somehow was still being called. It was continually trying to fire and taking memory each time, until crashing SQL.

We have fully removed it from the system and the memory counters are looking like normal again.

Am off for a beer :)

Andrew
Post #1434259
Posted Friday, March 22, 2013 8:09 AM


SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: Today @ 7:42 AM
Points: 4,417, Visits: 10,718
Ha! I tend to look at things inside the RDBMS first, while the problem could actually lie somewhere else. My bad.
Glad you sorted it out.

Cheers
Gianluca


--
Gianluca Sartori

How to post T-SQL questions
spaghettidba.com
@spaghettidba
Post #1434289
Posted Friday, March 22, 2013 8:25 AM


SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: Today @ 7:56 AM
Points: 4,401, Visits: 6,262
Third party apps have been the cause of TREMENDOUS grief for clients over the years!! Backup, "management" apps, antivirus (LOTS of issues!!), etc...

Best,

Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru at GMail
Post #1434298
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse