Sudden PLE drop

Question

Post reply

Sudden PLE drop

VastSQL

SSC-Dedicated

Points: 36287
More actions
October 7, 2020 at 9:00 am

#3793537

Hi Experts,

We are observing sudden drop in PLE. Buffer cache rit hatio, check point pages/sec ,lazy writes/ sec are all normal. I tried running trace to find if any query causing issue but didnt find any. The DB is getting logged from BizTalk servers.
Can you guide me on how to troubleshoot this?

Viewing 12 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

Cebisa SSC Enthusiast Points: 151 More actions · Answer 1

I would start by down loading Glenn Berry s diagnostic queries and running these.

With page life expectancy, look at the long term trend. See Jonathan Kehayias blog about it in sql skills

http://www.sqlskills.com/blogs/Jonathan/finding-what-queries-in-the-plan-cache-use-a-specific-index

Grant Fritchey SSC Guru Points: 399363 More actions · Answer 2

And you're seeing exactly why PLE is a horrible measure, standing on it's own. In combination with other measures, it may tell you things. Just PLE, nope. No real information there beyond the fact that you're seeing thrash in your memory. That may just be normal operation and it isn't affecting overall performance in any way. I largely ignore this measure when looking at systems now. It just doesn't convey enough information. Waits and queues are good. Sec per I/O. And, just the general query performance. All better measures of system health than PLE.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

VastSQL SSC-Dedicated Points: 36287 More actions · Answer 3

Cebisa wrote:

I would start by down loading Glenn Berry s diagnostic queries and running these.
With page life expectancy, look at the long term trend. See Jonathan Kehayias blog about it in sql skills
http://www.sqlskills.com/blogs/Jonathan/finding-what-queries-in-the-plan-cache-use-a-specific-index

Thanks Cebica.

VastSQL SSC-Dedicated Points: 36287 More actions · Answer 4

Grant Fritchey wrote:

And you're seeing exactly why PLE is a horrible measure, standing on it's own. In combination with other measures, it may tell you things. Just PLE, nope. No real information there beyond the fact that you're seeing thrash in your memory. That may just be normal operation and it isn't affecting overall performance in any way. I largely ignore this measure when looking at systems now. It just doesn't convey enough information. Waits and queues are good. Sec per I/O. And, just the general query performance. All better measures of system health than PLE.

Thanks Grant,

As mentioned in OP Buffer Cache Hit ratio, Lazy Rights\sec, Checkpoint\sec are all normal. PLE drops from 800 to 10 all of a sudden and is there anyway to track what is causing this?

Grant Fritchey SSC Guru Points: 399363 More actions · Answer 5

Mainly you need to look for what's using memory in such a way that the pages will be flushed. Again, if everything else is fine, this is probably just normal operation. Like I said, PLE shouldn't be driving concerns and decisions.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Jeff Moden SSC Guru Points: 1004686 More actions · Answer 6

VastSQL wrote:

Grant Fritchey wrote:
And you're seeing exactly why PLE is a horrible measure, standing on it's own. In combination with other measures, it may tell you things. Just PLE, nope. No real information there beyond the fact that you're seeing thrash in your memory. That may just be normal operation and it isn't affecting overall performance in any way. I largely ignore this measure when looking at systems now. It just doesn't convey enough information. Waits and queues are good. Sec per I/O. And, just the general query performance. All better measures of system health than PLE.
Thanks Grant,
As mentioned in OP Buffer Cache Hit ratio, Lazy Rights\sec, Checkpoint\sec are all normal. PLE drops from 800 to 10 all of a sudden and is there anyway to track what is causing this?

I agree with Grant and can also confirm that, sometimes, this just happens especially when something big is executed. If it stays at less than 10 for hours on end, then you have a problem. A sudden drop followed even by as much as a 10 or 15 minute "low period" and then it starts rising again isn't a sign of a real problem. It could be a sign of some code that needs looking at but it's not a problem to bring full guns to bear on. This is especially true for "large batch runs" that run on a system that's also used for customer facing OLTP in a big way.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

VastSQL SSC-Dedicated Points: 36287 More actions · Answer 7

Jeff Moden wrote:

VastSQL wrote:
Grant Fritchey wrote:
And you're seeing exactly why PLE is a horrible measure, standing on it's own. In combination with other measures, it may tell you things. Just PLE, nope. No real information there beyond the fact that you're seeing thrash in your memory. That may just be normal operation and it isn't affecting overall performance in any way. I largely ignore this measure when looking at systems now. It just doesn't convey enough information. Waits and queues are good. Sec per I/O. And, just the general query performance. All better measures of system health than PLE.
Thanks Grant,
As mentioned in OP Buffer Cache Hit ratio, Lazy Rights\sec, Checkpoint\sec are all normal. PLE drops from 800 to 10 all of a sudden and is there anyway to track what is causing this?
I agree with Grant and can also confirm that, sometimes, this just happens especially when something big is executed. If it stays at less than 10 for hours on end, then you have a problem. A sudden drop followed even by as much as a 10 or 15 minute "low period" and then it starts rising again isn't a sign of a real problem. It could be a sign of some code that needs looking at but it's not a problem to bring full guns to bear on. This is especially true for "large batch runs" that run on a system that's also used for customer facing OLTP in a big way.

Thanks Jeff

VastSQL SSC-Dedicated Points: 36287 More actions · Answer 8

Grant Fritchey wrote:

Mainly you need to look for what's using memory in such a way that the pages will be flushed. Again, if everything else is fine, this is probably just normal operation. Like I said, PLE shouldn't be driving concerns and decisions.

Thanks Grant

Sergiy SSC Guru Points: 110209 More actions · Answer 9

Look for queries which are heavy on BLOB processing.

You mentioned BizTalk - do you have some massive FOR XML queries? It may be either massive XML (JSON, you name it) objects, or some smaller size BLOB’s taken in numbers. Or, most likely, it’s a bunch of XML’s being transformed to be consumed by BizTalk.

The other metrics mentioned in OP are mainly about writes, they won’t show anything when the Server is on it’s knees under the pressure from BizTalk. They actually may even improve as other queries have to wait in line for memory allocation till BizTalk is done with its thing.

_____________
Code for TallyGenerator

Jeff Moden SSC Guru Points: 1004686 More actions · Answer 10

Sergiy wrote:

Look for queries which are heavy on BLOB processing.
You mentioned BizTalk - do you have some massive FOR XML queries? It may be either massive XML (JSON, you name it) objects, or some smaller size BLOB’s taken in numbers. Or, most likely, it’s a bunch of XML’s being transformed to be consumed by BizTalk.
The other metrics mentioned in OP are mainly about writes, they won’t show anything when the Server is on it’s knees under the pressure from BizTalk. They actually may even improve as other queries have to wait in line for memory allocation till BizTalk is done with its thing.

Hmmm... prior to 2005, LOBs such as Text, NText, and Image defaulted to being stored out of row. In 2005, the new MAX datatypes and bloody XML datatype defaulted to being stored in-row. That's been a huge problem on my machines because of the way SQL Server is compelled to store data in the page the key says it belongs on. I've proven that can and frequently does cause (what I call) "trapped short rows" (sometimes as little as 1 INT per page) because it has no LOB info and the adjacent pages are chock-full with a single row or two with rather large LOBs that fill the page. It's a huge waste of memory and disk space not to mention a huge source of massive page splits if the rows are inserted without the LOBs and then later updated, which is a guaranteed massive "ExpAnsive" update.

As a result, I've saved a shedload on the number of pages that a Clustered Index actually contains. I've also done similar with large VARCHARs (change them to MAX, force them to be out of row). If a table contains LOBs AND has a bloody Modified_BY varchar column, I've found that changing those to a MAX datatype wedges them nicely amongst the larger out of rows without taking up extra space. It's kind of like getting some free storage because, although the out of row data is still stored on pages, the way the pages are used for out of row storage is quite different.

Of course, to prevent page splits on the CI with out of row lobs, you have to provide (best done with a default) at least a 1 byte default to cause the pointer in the pointer column of the CI to materialize during the insert instead as a later "ExpAnsive" update.

Doing that has cut way down on performance issues due to page splits (frequently NOT controllable just by reducing the FILL FACTOR because, except for one rare instance, INSERTs will ignore the FILL FACTOR and fill pages to as close to 100% as possible).

Of course, the best thing is to NOT store XML, JSON, and lengthy comments in your database but, if that's unavoidable, folks should try forcing them out of row (use the table option followed by "in-place" updates to do so).

As a bit of a sidebar, it's a damned shame that they didn't make it a database option instead of just a table option because there's just no merit to storing LOBs in-row if they happen to fit.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

VastSQL SSC-Dedicated Points: 36287 More actions · Answer 11

We have identified a query which was running on a huge table . The query was selecting the whole table (*) without an index on where clause and the values changes on every run.

Thanks everyone for the support.