Sudden PLE drop

  • VastSQL

    SSC-Dedicated

    Points: 35924

    Hi Experts,

     

    We are observing sudden drop in PLE. Buffer cache rit hatio, check point pages/sec ,lazy writes/ sec are all normal. I tried running trace to find if any query causing issue but didnt find any. The DB is getting logged from BizTalk servers.

    Can you guide me on how to troubleshoot this?

  • Cebisa

    SSC Enthusiast

    Points: 112

    I would start by down loading Glenn Berry s diagnostic queries and running these.

    With page life expectancy, look at the long term trend. See Jonathan Kehayias blog about it in sql skills

    http://www.sqlskills.com/blogs/Jonathan/finding-what-queries-in-the-plan-cache-use-a-specific-index

     

  • Grant Fritchey

    SSC Guru

    Points: 396763

    And you're seeing exactly why PLE is a horrible measure, standing on it's own. In combination with other measures, it may tell you things. Just PLE, nope. No real information there beyond the fact that you're seeing thrash in your memory. That may just be normal operation and it isn't affecting overall performance in any way. I largely ignore this measure when looking at systems now. It just doesn't convey enough information. Waits and queues are good. Sec per I/O. And, just the general query performance. All better measures of system health than PLE.

    ----------------------------------------------------
    The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood...
    Theodore Roosevelt

    The Scary DBA
    Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
    Product Evangelist for Red Gate Software

  • VastSQL

    SSC-Dedicated

    Points: 35924

    Cebisa wrote:

    I would start by down loading Glenn Berry s diagnostic queries and running these.

    With page life expectancy, look at the long term trend. See Jonathan Kehayias blog about it in sql skills

    http://www.sqlskills.com/blogs/Jonathan/finding-what-queries-in-the-plan-cache-use-a-specific-index

     

    Thanks Cebica.

  • VastSQL

    SSC-Dedicated

    Points: 35924

    Grant Fritchey wrote:

    And you're seeing exactly why PLE is a horrible measure, standing on it's own. In combination with other measures, it may tell you things. Just PLE, nope. No real information there beyond the fact that you're seeing thrash in your memory. That may just be normal operation and it isn't affecting overall performance in any way. I largely ignore this measure when looking at systems now. It just doesn't convey enough information. Waits and queues are good. Sec per I/O. And, just the general query performance. All better measures of system health than PLE.

    Thanks Grant,

    As mentioned in OP Buffer Cache Hit ratio, Lazy Rights\sec, Checkpoint\sec are all normal. PLE drops from 800 to 10 all of a sudden and is there anyway to track what is causing this?

  • Grant Fritchey

    SSC Guru

    Points: 396763

    Mainly you need to look for what's using memory in such a way that the pages will be flushed. Again, if everything else is fine, this is probably just normal operation. Like I said, PLE shouldn't be driving concerns and decisions.

    ----------------------------------------------------
    The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood...
    Theodore Roosevelt

    The Scary DBA
    Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
    Product Evangelist for Red Gate Software

  • Jeff Moden

    SSC Guru

    Points: 997335

    VastSQL wrote:

    Grant Fritchey wrote:

    And you're seeing exactly why PLE is a horrible measure, standing on it's own. In combination with other measures, it may tell you things. Just PLE, nope. No real information there beyond the fact that you're seeing thrash in your memory. That may just be normal operation and it isn't affecting overall performance in any way. I largely ignore this measure when looking at systems now. It just doesn't convey enough information. Waits and queues are good. Sec per I/O. And, just the general query performance. All better measures of system health than PLE.

    Thanks Grant,

    As mentioned in OP Buffer Cache Hit ratio, Lazy Rights\sec, Checkpoint\sec are all normal. PLE drops from 800 to 10 all of a sudden and is there anyway to track what is causing this?

    I agree with Grant and can also confirm that, sometimes, this just happens especially when something big is executed.  If it stays at less than 10 for hours on end, then you have a problem.  A sudden drop followed even by as much as a 10 or 15 minute "low period" and then it starts rising again isn't a sign of a real problem.  It could be a sign of some code that needs looking at but it's not a problem to bring full guns to bear on.  This is especially true for "large batch runs" that run on a system that's also used for customer facing OLTP in a big way.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "Dear Lord... I'm a DBA so please give me patience because, if you give me strength, I'm going to need bail money too!"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • VastSQL

    SSC-Dedicated

    Points: 35924

    Jeff Moden wrote:

    VastSQL wrote:

    Grant Fritchey wrote:

    And you're seeing exactly why PLE is a horrible measure, standing on it's own. In combination with other measures, it may tell you things. Just PLE, nope. No real information there beyond the fact that you're seeing thrash in your memory. That may just be normal operation and it isn't affecting overall performance in any way. I largely ignore this measure when looking at systems now. It just doesn't convey enough information. Waits and queues are good. Sec per I/O. And, just the general query performance. All better measures of system health than PLE.

    Thanks Grant,

    As mentioned in OP Buffer Cache Hit ratio, Lazy Rights\sec, Checkpoint\sec are all normal. PLE drops from 800 to 10 all of a sudden and is there anyway to track what is causing this?

    I agree with Grant and can also confirm that, sometimes, this just happens especially when something big is executed.  If it stays at less than 10 for hours on end, then you have a problem.  A sudden drop followed even by as much as a 10 or 15 minute "low period" and then it starts rising again isn't a sign of a real problem.  It could be a sign of some code that needs looking at but it's not a problem to bring full guns to bear on.  This is especially true for "large batch runs" that run on a system that's also used for customer facing OLTP in a big way.

    Thanks Jeff

  • VastSQL

    SSC-Dedicated

    Points: 35924

    Grant Fritchey wrote:

    Mainly you need to look for what's using memory in such a way that the pages will be flushed. Again, if everything else is fine, this is probably just normal operation. Like I said, PLE shouldn't be driving concerns and decisions.

    Thanks Grant

  • Sergiy

    SSC Guru

    Points: 109884

    Look for queries which are heavy on BLOB processing.

    You mentioned BizTalk - do you have some massive FOR XML queries? It may be either massive XML (JSON, you name it) objects, or some smaller size BLOB’s taken in numbers. Or, most likely, it’s a bunch of XML’s being transformed to be consumed by BizTalk.

    The other metrics mentioned in OP are mainly about writes, they won’t show anything when the Server is on it’s knees under the pressure from BizTalk. They actually may even improve as other queries have to wait in line for memory allocation till BizTalk is done with its thing.

  • Jeff Moden

    SSC Guru

    Points: 997335

    Sergiy wrote:

    Look for queries which are heavy on BLOB processing.

    You mentioned BizTalk - do you have some massive FOR XML queries? It may be either massive XML (JSON, you name it) objects, or some smaller size BLOB’s taken in numbers. Or, most likely, it’s a bunch of XML’s being transformed to be consumed by BizTalk.

    The other metrics mentioned in OP are mainly about writes, they won’t show anything when the Server is on it’s knees under the pressure from BizTalk. They actually may even improve as other queries have to wait in line for memory allocation till BizTalk is done with its thing.

    Hmmm... prior to 2005, LOBs such as Text, NText, and Image defaulted to being stored out of row.  In 2005, the new MAX datatypes and bloody XML datatype defaulted to being stored in-row.  That's been a huge problem on my machines because of the way SQL Server is compelled to store data in the page the key says it belongs on.  I've proven that can and frequently does cause (what I call) "trapped short rows" (sometimes as little as 1 INT per page) because it has no LOB info and the adjacent pages are chock-full with a single row or two with rather large LOBs that fill the page.  It's a huge waste of memory and disk space not to mention a huge source of massive page splits if the rows are inserted without the LOBs and then later updated, which is a guaranteed massive "ExpAnsive" update.

    As a result, I've saved a shedload on the number of pages that a Clustered Index actually contains.  I've also done similar with large VARCHARs (change them to MAX, force them to be out of row).  If a table contains LOBs AND has a bloody Modified_BY varchar column, I've found that changing those to a MAX datatype wedges them nicely amongst the larger out of rows without taking up extra space.  It's kind of like getting some free storage because, although the out of row data is still stored on pages, the way the pages are used for out of row storage is quite different.

    Of course, to prevent page splits on the CI with out of row lobs, you have to provide (best done with a default) at least a 1 byte default to cause the pointer in the pointer column of the CI to materialize during the insert instead as a later "ExpAnsive" update.

    Doing that has cut way down on performance issues due to page splits (frequently NOT controllable just by reducing the FILL FACTOR because, except for one rare instance, INSERTs will ignore the FILL FACTOR and fill pages to as close to 100% as possible).

    Of course, the best thing is to NOT store XML, JSON, and lengthy comments in your database but, if that's unavoidable, folks should try forcing them out of row (use the table option followed by "in-place" updates to do so).

    As a bit of a sidebar, it's a damned shame that they didn't make it a database option instead of just a table option because there's just no merit to storing LOBs in-row if they happen to fit.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "Dear Lord... I'm a DBA so please give me patience because, if you give me strength, I'm going to need bail money too!"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • VastSQL

    SSC-Dedicated

    Points: 35924

    We have identified a query which was running on a huge table . The query was selecting the whole table (*) without an index on where clause and the values changes on every run.

    Thanks everyone for the support.

Viewing 12 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply