Memory Problem: Low buffer cache,low PLE, but High free page

  • Hello All,

    Our database server is currently having a strange memory issue. 

    Our database is about 800G, memory is 120G, and it is a very busy . Before, our buffer cache can went up and stayed around 90G, and PLE can be up to 3000 minutes. 
    Started from last week, the buffer cache went down between 16G to 40G, PLE went to only 40 minutes, all the others are free pages in sql memory
    Bellow is a graph on spotlight shows current memory status, about 43G is buffer cache, but the free pages(yellow line) is 65G.

    My question is : Why free pages is so high while database buffer cache and PLE is very low?

    One thing had changed since last is I started a daily job to move some some data to archive database then delete those data from production, the job is run at midnight, and runs for about 2 hours every day. Could that be the problem and why?

    Please help!!!

  • 1) You posted this on SQL 2012 forum, but I thought Free Pages went away in that version

    https://blogs.msdn.microsoft.com/vsanil/2012/11/21/did-some-performance-counters-like-free-pages-disappear-in-sql-2012/

    2) How is your IO amount and performance compared to before this started happening?

    3) Are users complaining or things not happening in an acceptable (or roughly the same) time?

    4) Are you on a virtualized server?

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Hi Kevin,

    Thanks for the quick reply. To answer your question:

    1) You posted this on SQL 2012 forum, but I thought Free Pages went away in that version
    -----I think you are right.  Then question is:  the resource manager shows our 128G memory is all full, 6G is  for OS, Buffer cache is 44.6G,  procedure cache is 7.7G, what does the rest 76G used for?

    2) How is your IO amount and performance compared to before this started happening?
      --- Our disk IO wait time is much higher than before. 

    3) Are users complaining or things not happening in an acceptable (or roughly the same) time?
     ---   Yes, web pages ars slower, more slow queries in our log. And when the buffer was only 16G last Friday, our applications complains have many queries timed out.

    4) Are you on a virtualized server? 
    ---    We are on a physical server with 16 CPU and 128G memory

  • graciez - Tuesday, February 21, 2017 10:50 AM

    Hi Kevin,

    Thanks for the quick reply. To answer your question:

    1) You posted this on SQL 2012 forum, but I thought Free Pages went away in that version
    -----I think you are right.  Then question is:  the resource manager shows our 128G memory is all full, 6G is  for OS, Buffer cache is 44.6G,  procedure cache is 7.7G, what does the rest 76G used for?

    2) How is your IO amount and performance compared to before this started happening?
      --- Our disk IO wait time is much higher than before. 

    3) Are users complaining or things not happening in an acceptable (or roughly the same) time?
     ---   Yes, web pages ars slower, more slow queries in our log. And when the buffer was only 16G last Friday, our applications complains have many queries timed out.

    4) Are you on a virtualized server? 
    ---    We are on a physical server with 16 CPU and 128G memory

    A) I have no idea where your numbers are coming from. Try out dbcc memorystatus. You can find documentation online about it's output. It is helpful to SET NOCOUNT ON first and output to text.

    B) What changed? ANYTHING other than your moving data to an archive database? Do users access that at all? If so it is properly indexed? That's a common mistake I see...

    C) What else is running on the server and how much memory are those things taking?

    D) What build are you on? SQL Server 2012 got a completely rewritten memory management system and there were numerous bugs that were later patched.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Thanks Kevin,
    A) DBCC MEMORYSTATUS: 
    Process/System Counts       Value
    ---------------------------------------- --------------------
    Available Physical Memory      373526528
    Available Virtual Memory      8209852002304
    Available Paging File       130824531968
    Working Set          131080753152
    Percent of Committed Memory in WS   100
    Page Faults          2170864460
    System physical memory high     1
    System physical memory low     0
    Process physical memory low     0
    Process virtual memory low     0

    Memory Manager         KB
    ---------------------------------------- -----------
    VM Reserved          571738128
    VM Committed          127745612
    Locked Pages Allocated       0
    Large Pages Allocated       0
    Emergency Memory         1024
    Emergency Memory In Use      16
    Target Committed         127748472
    Current Committed        127745616
    Pages Allocated         58964568
    Pages Reserved         0
    Pages Free           63667928
    Pages In Use          15877400
    Page Alloc Potential       105652560
    NUMA Growth Phase        2
    Last OOM Factor         0
    Last OS Error          0

    Memory node Id = 0        KB
    ---------------------------------------- -----------
    VM Reserved          571730960
    VM Committed          63874220
    Locked Pages Allocated       0
    Pages Allocated         45770952
    Pages Free           14661696
    Target Committed         63874224
    Current Committed        63874224
    Foreign Committed        1435912
    Away Committed         0
    Taken Away Committed       0

    Memory node Id = 1        KB
    ---------------------------------------- -----------
    VM Reserved          7104
    VM Committed          63871372
    Locked Pages Allocated       0
    Pages Allocated         13193624
    Pages Free           49006232
    Target Committed         63874224
    Current Committed        63871376
    Foreign Committed        495180
    Away Committed         0
    Taken Away Committed       0

    B) What changed? ANYTHING other than your moving data to an archive database? Do users access that at all? If so it is properly indexed? That's a common mistake I see...
    ----No, only the moving data script.  Users barley read the archived data and archived data is also indexed link production. 

    C) What else is running on the server and how much memory are those things taking?
    --- Only SQL server is runninng on the server. 

    D) What build are you on? SQL Server 2012 got a completely rewritten memory management system and there were numerous bugs that were later patched.
    Our version is :
    --- Enterprise Edition: Core-based Licensing (64-bit) on Windows NT 6.1 <X64> (Build 7601: Service Pack 1)

  • Is the archiving of the data something new or you just changed when it runs? If it's new, it could be that you're getting out of date statistics after the data move leading to poor plan choices. That can affect memory and other things.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • There's a lot more to dbcc memorystatus than what you posted. What did your review of the documentation online reveal? 

    https://support.microsoft.com/en-us/help/907877/how-to-use-the-dbcc-memorystatus-command-to-monitor-memory-usage-on-sql-server-2005

    Also review this:

    https://www.simple-talk.com/sql/database-administration/why-is-that-sql-server-instance-under-stress/

    And go get Glenn Berry's SQL Server Diagnostic Scripts. Lots of Awesomesauce in it.

    BTW, you didn't give the build of sql server. You gave the build of the OS. And I will never understand people paying for Enterprise Edition of SQL Server and hobbling it with a paltry 128GB of RAM.

    Have you tried rebooting the server? If you have a memory leak (there were several identified in SQL 2012 over the years) that should get you functional again until you can test out upgrading to the most recent build.

    Given that this has been going on for days and significant performance problems exist (including user queries timing out), I STRONGLY recommend you hire a performance tuning consultant for a quick review, especially if a reboot didn't help or you cannot do one soon.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • The archiving is something new. It moves one days of data from production to archive db to keep production only contain 180 days data.
    After archiving job completed every night, I run sp_updatestats to update statistics, should that fix the statistics? If not, what can I do with it?
    Also, most of our queries is query most recent data(few weeks the most), they all have date range in where clause.  
    I don't understand who archiving old data can affect low memory usage?

  • OK, so sp_updatestats uses sampling to create the statistics? Do you also do index rebuilds at any point? These create statistics using full scan. You may just be putting your stats out of date using sp_updatestats. Hard to know for sure, but this is a possible culprit, and it's the one thing that you're sure you've changed. If the stats are out of date or incorrect, you can get plans that allocate more memory than you use. Also, since we're talking 2012, you don't have access to the new cardinality estimation engine. This means that new rows that are added to an index fall outside the statistics. In 2012 and prior, the optimizer assumes one row. That usually results in under-, not over-estimates of memory, but this could be adding to your problems.

    You need to determine if your execution plans have changed because of the changes you've introduced in the system.

    Also, if you're deleting a lot of data, that could be flushing the cache and certainly could be killing PLE (which is a very secondary measurement, don't get hooked on worrying about it). Although, if most of the queries are only after recent data and you're moving old data, this may be a minimal impact. However, it's still likely.

    When system behavior changes, and you've changed a process, that's almost always the culprit in one way or another.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • TheSQLGuru - Tuesday, February 21, 2017 11:54 AM

    There's a lot more to dbcc memorystatus than what you posted. What did your review of the documentation online reveal? 

    https://support.microsoft.com/en-us/help/907877/how-to-use-the-dbcc-memorystatus-command-to-monitor-memory-usage-on-sql-server-2005

    Also review this:

    https://www.simple-talk.com/sql/database-administration/why-is-that-sql-server-instance-under-stress/

    And go get Glenn Berry's SQL Server Diagnostic Scripts. Lots of Awesomesauce in it.

    BTW, you didn't give the build of sql server. You gave the build of the OS. And I will never understand people paying for Enterprise Edition of SQL Server and hobbling it with a paltry 128GB of RAM.

    Have you tried rebooting the server? If you have a memory leak (there were several identified in SQL 2012 over the years) that should get you functional again until you can test out upgrading to the most recent build.

    Given that this has been going on for days and significant performance problems exist (including user queries timing out), I STRONGLY recommend you hire a performance tuning consultant for a quick review, especially if a reboot didn't help or you cannot do one soon.

    thanks for these information, I will read the document. 
    Is this build of sql server?  

    Microsoft SQL Server 2012 (SP2-CU9) (KB3098512) - 11.0.5641.0 (X64)
        Oct 20 2015 16:40:19
        Copyright (c) Microsoft Corporation
        Enterprise Edition: Core-based Licensing (64-bit) on Windows NT 6.1 <X64> (Build 7601: Service Pack 1)

    We just move to this server on March 2016. Do you think a reboot is needed?
    I had stopped the archiving job and check if buffer cache can go up again. 

  • That is the SQL Server build number. You are 14 months behind as CU7 for SP3, build 6579, was released Jan 17, 2017. It is absolutely possible that you have memory (and/or other) issues simply because you are not patched up.

    I cannot see that running sp_updatestats will get you out of date stats. It could make them a bit less accurate than the full scan you would get on index-based stats if you happen to do index rebuilds regularly. But in my experience it would be truly exceptional for this to be causing a signficant problem.

    It also sounds like you are having performance problems regularly, not just after you do the batch delete, right? If so then if they were the sole cause I would definitely expect your system to bounce back pretty quickly as needed pages were lifted back up into memory as they were accessed.

    As Grant says it is still likely that your issues are somehow due to the new process. But it may not be the actual movement that is at fault. Maybe something weird about your table/index structure, like you accidentally put your FILLFACTOR at 1. :w00t:

    DOH!! It just dawned on me that the picture in your initial post indicates that you almost certainly have a monitoring system in place! How do all relevant indicators track in the days/weeks leading up to your new process with how they track after it? Also, are there significant variations at any point since the change was made?

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • If you can get past the crisis, you may want to look at a good solid jump in the overall RAM on this box.   You stated you have an 800 GB database and that it's fairly busy.   That just seems extremely under-powered.   It may turn out that you were on the edge of a resource shortage, and the change in process just pushed you over that edge.   You should probably also get folks to consider some kind of schedule of regular re-boots.   If you can get current on your SQL Server version, that may become less important, but until you do, it might be a good shot in the arm.   This problem is just kind of "screaming in my ear" that there's a lack of RAM, and there may be multiple culprits.   Going over a resource cliff can be a bit of a "rude awakening", and suggests that even if you ultimately determine that you need to bump up the RAM significantly, you may not have the kind of monitoring in place that can alert you to an approaching edge for resource consumption.   I wish I had a useful monitoring recommendation, but the best I can do there is echo Grant Fritchey, as he's a RedGate product evangelist, and I've heard a lot of good things about their tools.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Thank you! We are actually planning move to a more power server. 
    I had stopped the archive script for 2 days but buffer cache did not go up. It stays around 40G, 2 weeks ago, it was around 60 to 80G. 
    Another guess, cause we don't have any changes on applications or database schema/data,   could it be a hard ware issue? Memory fragmentation?

  • graciez - Thursday, February 23, 2017 1:11 PM

    Thank you! We are actually planning move to a more power server. 
    I had stopped the archive script for 2 days but buffer cache did not go up. It stays around 40G, 2 weeks ago, it was around 60 to 80G. 
    Another guess, cause we don't have any changes on applications or database schema/data,   could it be a hard ware issue? Memory fragmentation?

    So you're saying that you stop running the script and everything is pretty much fine?    In my mind, that says resource cliff.   Sit on the edge and be careful, and you're fine, but lean too far over to try and see the bottom of the canyon, and you fall over it...

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • graciez - Thursday, February 23, 2017 1:11 PM

    Thank you! We are actually planning move to a more power server. 
    I had stopped the archive script for 2 days but buffer cache did not go up. It stays around 40G, 2 weeks ago, it was around 60 to 80G. 
    Another guess, cause we don't have any changes on applications or database schema/data,   could it be a hard ware issue? Memory fragmentation?

    I am sorry to say this, but it is very unlikely that we can do anything more for you via a forum graciez. There are just too many things that could be at play here. I do not however, believe that memory fragmentation would be one of them though.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply