SQL Memory usage suddenly very high - Causing Cluster Failovers

  • We've noticed that memory usage on the DB cluster suddenly changed. Normally PLE (Page Life Expectancy) is fairly stable at around 6,000 seconds. In the afternoon the PLE suddenly dropped from 6,000 to under 300 and bounced between 0 and 600, averaging 300 for the rest of the day. At 6:11pm the cluster failed over, again at 00:12, and the latest failover was at 6:17am this morning.

    After the last failover, the PLE and memory usage went back to normal until approx 8:07am, when the PLE suddenly dropped to 0 and SQL memory usage jumped up by approx 4GB.

    In the SQL Error log before each cluster failover we get these messages:

    03/21/2013 18:00:13,spid27s,Unknown,AppDomain 796 (mssqlsystemresource.sys[runtime].1143) unloaded.

    03/21/2013 18:00:13,spid1s,Unknown,AppDomain 796 (mssqlsystemresource.sys[runtime].1143) is marked for unload due to memory pressure.

    then

    03/21/2013 18:11:53,Logon,Unknown,Error: 18451, Severity: 14, State: 1.

    03/21/2013 18:11:52,Logon,Unknown,Login failed for user 'swconsumer'. Only administrators may connect at this time. [CLIENT: 10.0.100.38]

    03/21/2013 18:11:52,Logon,Unknown,Error: 18451, Severity: 14, State: 1.

    03/21/2013 18:11:52,Logon,Unknown,Login failed for user 'swbroker'. Only administrators may connect at this time. [CLIENT: 10.0.200.31]

    03/21/2013 18:11:52,Logon,Unknown,Error: 18451, Severity: 14, State: 1.

    03/21/2013 18:11:52,Logon,Unknown,Login failed for user 'swapi'. Only administrators may connect at this time. [CLIENT: 10.0.200.27]

    03/21/2013 18:11:52,Logon,Unknown,Error: 18451, Severity: 14, State: 1.

    03/21/2013 18:11:52,spid327,Unknown,The client was unable to reuse a session with SPID 327, which had been reset for connection pooling. The failure ID is 23. This error may have been caused by an earlier operation failing. Check the error logs for failed operations immediately before this error message.

    At this point SQL Server shuts down, causing a cluster failover.

    We had a software release yesterday, so I assume that a change from it is causing these issues. The question is: where do I start looking to find the cause?

    Any help would be much appreciated

    Andrew

  • I would start by looking into sys.dm_exec_query_stats to find the queries that are not behaving as normal per your baseline.

    Queries that use a lot of memory are usually reading lots of data from disk, so I would start by looking for rows with a high total_logical_reads / execution_count ratio.
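    For example, something along these lines (just a sketch; trim the TOP and the column list to taste):

    SELECT TOP (20)
        qs.total_logical_reads / qs.execution_count AS avg_logical_reads,
        qs.execution_count,
        qs.total_logical_reads,
        st.text
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    ORDER BY avg_logical_reads DESC;

    Bear in mind the stats only cover plans currently in cache, and are reset when a plan is evicted or the instance restarts.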

    Also, the error you are getting is likely a symptom that max server memory was not set correctly and both SQL Server and the OS are starving. Set max server memory to a reasonable limit. If you have already set it, you are probably getting more multi-page allocations than usual. As you probably know, max server memory does not limit multi-page allocations, so you should be looking for those in sys.dm_os_memory_clerks.
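    A quick sketch of both checks (assuming SQL Server 2008/2008 R2, where the clerk column is named multi_pages_kb; SQL Server 2012 later merged it into pages_kb):

    -- Current max server memory setting (values in MB)
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'max server memory (MB)';

    -- Multi-page allocations by memory clerk, biggest consumers first
    SELECT type, SUM(multi_pages_kb) AS total_multi_pages_kb
    FROM sys.dm_os_memory_clerks
    GROUP BY type
    ORDER BY SUM(multi_pages_kb) DESC;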

    Hope this helps

    Gianluca

    -- Gianluca Sartori

  • Thanks for the reply

    The server has 24GB of memory, with SQL Server capped at a max of 20GB and Windows 2008 R2 left with 4GB. SQL normally sits at 16GB, but something is using the extra 4GB.

    I'm going through the DMVs you suggested to see if I can find the culprits.

    Andrew

  • OK, running this query:

    select *
    from sys.dm_exec_query_stats qs cross apply sys.dm_exec_sql_text(sql_handle) st
    ORDER BY qs.total_logical_reads DESC

    I'm assuming that this DMV data covers only the period since the last failover at 6:17am this morning.

    One proc has been executed 999 times with total_logical_reads = 453,124,440.

    Having never looked at these figures before, it's hard to analyse, but IIRC that indicates this proc is reading an awful lot of data for only ~1,000 executions. I assume that won't help memory issues.

    Looking in sys.dm_os_memory_clerks, the biggest user of multi_pages_kb is MEMORYCLERK_SQLCLR, with multi_pages_kb = 59240.

    My only worry is that we don't use CLR code on our servers, yet the CLR was mentioned in the error log.

  • mealies (3/22/2013)


    OK, running this query:

    select *
    from sys.dm_exec_query_stats qs cross apply sys.dm_exec_sql_text(sql_handle) st
    ORDER BY qs.total_logical_reads DESC

    I'm assuming that this DMV data covers only the period since the last failover at 6:17am this morning.

    One proc has been executed 999 times with total_logical_reads = 453,124,440.

    Having never looked at these figures before, it's hard to analyse, but IIRC that indicates this proc is reading an awful lot of data for only ~1,000 executions. I assume that won't help memory issues.

    Take a look at the execution plan of this procedure. If it is doing some large table/index scans, it is probably missing some required indexes.
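    You can pull the cached plan straight from the DMVs, for example (the procedure name here is a placeholder; substitute the suspect proc):

    SELECT qs.execution_count, qs.total_logical_reads, qp.query_plan
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
    WHERE st.text LIKE '%YourProcName%'   -- placeholder name
    ORDER BY qs.total_logical_reads DESC;

    Clicking the query_plan XML in SSMS opens the graphical plan, where scans and missing-index hints are easy to spot.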


    Sujeet Singh

  • The execution plan wasn't pretty, but no missing indexes were reported. The SSMS toolkit plugin does warn that the query plan takes 5MB of space.

    Looking into the SQL stack dumps, they show that the CLR is causing the memory pressure, and we think we've found the proc responsible. It uses spatial indexes, and I believe spatial indexes use the CLR.

    We're going to see what calls this proc and whether we can stop it for now, to see if that solves the issue.
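    For reference, this is roughly how the spatial indexes and the callers can be hunted down (the proc name is a placeholder):

    -- Spatial indexes in the current database
    SELECT OBJECT_NAME(si.object_id) AS table_name, si.name AS index_name
    FROM sys.spatial_indexes AS si;

    -- Modules whose definition references the suspect procedure
    SELECT OBJECT_NAME(object_id) AS referencing_module
    FROM sys.sql_modules
    WHERE definition LIKE '%SuspectSpatialProc%';   -- placeholder name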

  • mealies (3/22/2013)


    The execution plan wasn't pretty, but no missing indexes were reported. The SSMS toolkit plugin does warn that the query plan takes 5MB of space.

    Looking into the SQL stack dumps, they show that the CLR is causing the memory pressure, and we think we've found the proc responsible. It uses spatial indexes, and I believe spatial indexes use the CLR.

    We're going to see what calls this proc and whether we can stop it for now, to see if that solves the issue.

    Spatial is indeed a CLR consumer under the covers - you don't need CLR enabled for it to function. Spatial is something few people have much experience with and there are a LOT of ways to zig when you should zag. You may want to get a professional to remote in and take a look at what is going on. I hate seeing production systems failing while people hunt around on forums for assistance!! :w00t:

    I didn't see it mentioned, but is this a 32-bit system by any chance? Those are notoriously BAD with memory issues due to the MemToLeave situation and the 2GB lower memory limitation. There are some things you can do to help out there, though.

    Also, run DBCC MEMORYSTATUS and see if anything else there pops up as a problem.

    It IS possible that the large-IO-consuming query is responsible, but if it didn't show very large scans and hashes, that may not be it. If it has large numbers of nested-loop iterations, that would explain high IO with no missing indexes. That situation is likely the result of bad estimates (which have MANY causes).

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Thanks for all the replies, and yes, hunting around on forums whilst the production system crashes & burns is never fun to watch.

    We got it in the end: it was the Tivoli backup process which, although it had been turned off, was somehow still being called. It kept trying to fire, taking memory each time, until it crashed SQL.

    We have fully removed it from the system and the memory counters are looking normal again.

    Am off for a beer 🙂

    Andrew

  • Ha! I tend to look at things inside the RDBMS first, while the problem could actually lie somewhere else. My bad.

    Glad you sorted it out.

    Cheers

    Gianluca

    -- Gianluca Sartori

  • Third party apps have been the cause of TREMENDOUS grief for clients over the years!! Backup, "management" apps, antivirus (LOTS of issues!!), etc...

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • @arvisam on #sqlhelp has asked me to provide DBCC MEMORYSTATUS output. Here is the truncated output:

    Memory Manager                           KB
    ---------------------------------------- -----------
    VM Reserved                                 34954104
    VM Committed                                21951148
    Locked Pages Allocated                             0
    Reserved Memory                                 1024
    Reserved Memory In Use                             0

    Memory node Id = 0                       KB
    ---------------------------------------- -----------
    VM Reserved                                    11200
    VM Committed                                   10776
    Locked Pages Allocated                             0
    MultiPage Allocator                             9080
    SinglePage Allocator                         1312344

    Memory node Id = 1                       KB
    ---------------------------------------- -----------
    VM Reserved                                 34939064
    VM Committed                                21936680
    Locked Pages Allocated                             0
    MultiPage Allocator                           156432
    SinglePage Allocator                         1312344

    Memory node Id = 64                      KB
    ---------------------------------------- -----------
    VM Reserved                                     2752
    VM Committed                                    2656
    Locked Pages Allocated                             0
    MultiPage Allocator                             2576
    SinglePage Allocator                         1312344

    MEMORYCLERK_SQLGENERAL (node 0)          KB
    ---------------------------------------- -----------
    VM Reserved                                        0
    VM Committed                                       0
    Locked Pages Allocated                             0
    SM Reserved                                        0
    SM Committed                                       0
    SinglePage Allocator                            4040
    MultiPage Allocator                            11664

    MEMORYCLERK_SQLGENERAL (node 1)          KB
    ---------------------------------------- -----------
    VM Reserved                                        0
    VM Committed                                       0
    Locked Pages Allocated                             0
    SM Reserved                                        0
    SM Committed                                       0
    SinglePage Allocator                             104
    MultiPage Allocator                                0

    MEMORYCLERK_SQLGENERAL (Total)           KB
    ---------------------------------------- -----------
    VM Reserved                                        0
    VM Committed                                       0
    Locked Pages Allocated                             0
    SM Reserved                                        0
    SM Committed                                       0
    SinglePage Allocator                            4144
    MultiPage Allocator                            11664

    MEMORYCLERK_SQLBUFFERPOOL (node 0)       KB
    ---------------------------------------- -----------
    VM Reserved                                 25214976
    VM Committed                                21289936
    Locked Pages Allocated                             0
    SM Reserved                                        0
    SM Committed                                       0
    SinglePage Allocator                               0
    MultiPage Allocator                              592

    MEMORYCLERK_SQLQUERYEXEC (node 0)        KB
    ---------------------------------------- -----------
    VM Reserved                                        0
    VM Committed                                       0
    Locked Pages Allocated                             0
    SM Reserved                                        0
    SM Committed                                       0
    SinglePage Allocator                             496
    MultiPage Allocator                                0

    MEMORYCLERK_SQLOPTIMIZER (node 0)        KB
    ---------------------------------------- -----------
    VM Reserved                                        0
    VM Committed                                       0
    Locked Pages Allocated                             0
    SM Reserved                                        0
    SM Committed                                       0
    SinglePage Allocator                             408
    MultiPage Allocator                              944

    MEMORYCLERK_SQLUTILITIES (node 0)        KB
    ---------------------------------------- -----------
    VM Reserved                                      120
    VM Committed                                     120
    Locked Pages Allocated                             0
    SM Reserved                                        0
    SM Committed                                       0
    SinglePage Allocator                             176
    MultiPage Allocator                                0

    MEMORYCLERK_SQLSTORENG (node 0)          KB
    ---------------------------------------- -----------
    VM Reserved                                     2304
    VM Committed                                    2304
    Locked Pages Allocated                             0
    SM Reserved                                        0
    SM Committed                                       0
    SinglePage Allocator                            8904
    MultiPage Allocator                              984

    MEMORYCLERK_SQLSTORENG (node 1)          KB
    ---------------------------------------- -----------
    VM Reserved                                        0
    VM Committed                                       0
    Locked Pages Allocated                             0
    SM Reserved                                        0
    SM Committed                                       0
    SinglePage Allocator                            1488
    MultiPage Allocator                             3800

    MEMORYCLERK_SQLSTORENG (Total)           KB
    ---------------------------------------- -----------
    VM Reserved                                     2304
    VM Committed                                    2304
    Locked Pages Allocated                             0
    SM Reserved                                        0
    SM Committed                                       0
    SinglePage Allocator                           10392
    MultiPage Allocator                             4784

    MEMORYCLERK_SQLCONNECTIONPOOL (node 0)   KB
    ---------------------------------------- -----------
    VM Reserved                                        0
    VM Committed                                       0
    Locked Pages Allocated                             0
    SM Reserved                                        0
    SM Committed                                       0
    SinglePage Allocator                            2152
    MultiPage Allocator                                0

    MEMORYCLERK_SQLCONNECTIONPOOL (node 1)   KB
    ---------------------------------------- -----------
    VM Reserved                                        0
    VM Committed                                       0
    Locked Pages Allocated                             0
    SM Reserved                                        0
    SM Committed                                       0
    SinglePage Allocator                            2104
    MultiPage Allocator                                0

    MEMORYCLERK_SQLCONNECTIONPOOL (Total)    KB
    ---------------------------------------- -----------
    VM Reserved                                        0
    VM Committed                                       0
    Locked Pages Allocated                             0
    SM Reserved                                        0
    SM Committed                                       0
    SinglePage Allocator                            4256
    MultiPage Allocator                                0

    MEMORYCLERK_SQLCLR (node 0)              KB
    ---------------------------------------- -----------
    VM Reserved                                  9467900
    VM Committed                                  415868
    Locked Pages Allocated                             0
    SM Reserved                                        0
    SM Committed                                       0
    SinglePage Allocator                            1672
    MultiPage Allocator                            58776

    MEMORYCLERK_SQLSERVICEBROKER (node 0)    KB
    ---------------------------------------- -----------
    VM Reserved                                        0
    VM Committed                                       0
    Locked Pages Allocated                             0
    SM Reserved                                        0
    SM Committed                                       0
    SinglePage Allocator                             152
    MultiPage Allocator                              544

    MEMORYCLERK_SQLHTTP (node 0)             KB
    ---------------------------------------- -----------
    VM Reserved                                        0
    VM Committed                                       0
    Locked Pages Allocated                             0
    SM Reserved                                        0
    SM Committed                                       0
    SinglePage Allocator                               8
    MultiPage Allocator                                0
