PLE (Page life expectancy).

  • On a production system we have a PLE which is low at some moments.

    So since yesterday afternoon we have run a monitoring batch registering the PLE every two minutes and the cache size for each database.

    Most cache is used by only one database.

    Backup does not influence the PLE.

    CheckDB clears the complete cache.

    CheckDB is started at 04:00.

    The lowest size of the cache we see for the specific databas is

    just over 7000 Mb.

    The lowest PLE we see is 7.

    My conclusion is that 7 seconds is the average PLE. So the complete cache been replaces in the last 14 seconds. So that would be 0.5 Gb per second. (7 Gb in 14 seconds gives an average PLE of 7 seconds)?

    This seems a lot to me, is my reasoning in error?

    CheckDB effectively destroys the cache. (Why?).

    Another thing I saw.

    (delta PLE) / (delta time) is often 0 (zero), 0.5 or 1

    Why is this ?

    The number 1 is logical, during a delta time, the ple can gain the same time, not more.

    I cannot explain the 0.5 and the 0. I see periods of 25 minutes where the PLE is gaining 1 minute every two minutes.

    Outside the checkDB timeframe the larges drops I see in the PLE is from 6952 to 3830 in two minutes, from 4103 to 1547 in two minutes and from 3650 to 2058 in two minutes. (Cache size around 8000 Mb).

    Can I calculate how much MB is read in the two minutes from this ?

    (My estimate is that abouth half of the 8 Gb must have been read in those 2 minutes to between the drop of PLE. Large question to me).

    Before and after the drop the PLE often did not change at all for some timeperiods. I do not understand this behavior.

    (PLE long time no change, significant drop, then again long time no change).

    Any suggestions how to catch the processes which cause the drop in PLE ?

    Today I will be monitoring, now a PLE is registered every 30 seconds.

    Yesterday during the day the PLE was as low as about 200. (no timing, no registering of that). Today we hope to catch the situation and the cullprit.

    Any advise is welcome.

    Thanks for your time and attention.

    Ben Brugman

  • ben.brugman (1/22/2013)


    Any suggestions how to catch the processes which cause the drop in PLE ?

    it means you have buffer cache issue and IO contentions

    see these links

    http://www.sqlservercentral.com/articles/Performance+Tuning/analyzingmemoryrequirementsforsqlserver/2498/

    http://sqlblog.com/blogs/buck_woody/archive/2010/06/29/the-windows-page-file-and-sql-server.aspx

    http://support.microsoft.com/kb/889654/en-us

    and take help from these

    /**********************************************************

    * top procedures memory consumption per execution

    * (this will show mostly reports & jobs)

    ***********************************************************/

    SELECT TOP 100 *

    FROM

    (

    SELECT

    DatabaseName = DB_NAME(qt.dbid)

    ,ObjectName = OBJECT_SCHEMA_NAME(qt.objectid,dbid) + '.' + OBJECT_NAME(qt.objectid, qt.dbid)

    ,DiskReads = SUM(qs.total_physical_reads) -- The worst reads, disk reads

    ,MemoryReads = SUM(qs.total_logical_reads) --Logical Reads are memory reads

    ,Executions = SUM(qs.execution_count)

    ,IO_Per_Execution = SUM((qs.total_physical_reads + qs.total_logical_reads) / qs.execution_count)

    ,CPUTime = SUM(qs.total_worker_time)

    ,DiskWaitAndCPUTime = SUM(qs.total_elapsed_time)

    ,MemoryWrites = SUM(qs.max_logical_writes)

    ,DateLastExecuted = MAX(qs.last_execution_time)

    FROM sys.dm_exec_query_stats AS qs

    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt

    GROUP BY DB_NAME(qt.dbid), OBJECT_SCHEMA_NAME(qt.objectid,dbid) + '.' + OBJECT_NAME(qt.objectid, qt.dbid)

    ) T

    ORDER BY IO_Per_Execution DESC

    /**********************************************************

    * top procedures memory consumption total

    * (this will show more operational procedures)

    ***********************************************************/

    SELECT TOP 100 *

    FROM

    (

    SELECT

    DatabaseName = DB_NAME(qt.dbid)

    ,ObjectName = OBJECT_SCHEMA_NAME(qt.objectid,dbid) + '.' + OBJECT_NAME(qt.objectid, qt.dbid)

    ,DiskReads = SUM(qs.total_physical_reads) -- The worst reads, disk reads

    ,MemoryReads = SUM(qs.total_logical_reads) --Logical Reads are memory reads

    ,Total_IO_Reads = SUM(qs.total_physical_reads + qs.total_logical_reads)

    ,Executions = SUM(qs.execution_count)

    ,IO_Per_Execution = SUM((qs.total_physical_reads + qs.total_logical_reads) / qs.execution_count)

    ,CPUTime = SUM(qs.total_worker_time)

    ,DiskWaitAndCPUTime = SUM(qs.total_elapsed_time)

    ,MemoryWrites = SUM(qs.max_logical_writes)

    ,DateLastExecuted = MAX(qs.last_execution_time)

    FROM sys.dm_exec_query_stats AS qs

    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt

    GROUP BY DB_NAME(qt.dbid), OBJECT_SCHEMA_NAME(qt.objectid,dbid) + '.' + OBJECT_NAME(qt.objectid, qt.dbid)

    ) T

    ORDER BY Total_IO_Reads DESC

    /**********************************************************

    * top adhoc queries memory consumption total

    ***********************************************************/

    SELECT TOP 100 *

    FROM

    (

    SELECT

    DatabaseName = DB_NAME(qt.dbid)

    ,QueryText = qt.text

    ,DiskReads = SUM(qs.total_physical_reads) -- The worst reads, disk reads

    ,MemoryReads = SUM(qs.total_logical_reads) --Logical Reads are memory reads

    ,Total_IO_Reads = SUM(qs.total_physical_reads + qs.total_logical_reads)

    ,Executions = SUM(qs.execution_count)

    ,IO_Per_Execution = SUM((qs.total_physical_reads + qs.total_logical_reads) / qs.execution_count)

    ,CPUTime = SUM(qs.total_worker_time)

    ,DiskWaitAndCPUTime = SUM(qs.total_elapsed_time)

    ,MemoryWrites = SUM(qs.max_logical_writes)

    ,DateLastExecuted = MAX(qs.last_execution_time)

    FROM sys.dm_exec_query_stats AS qs

    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt

    WHERE OBJECT_SCHEMA_NAME(qt.objectid,dbid) + '.' + OBJECT_NAME(qt.objectid, qt.dbid) IS NULL

    GROUP BY DB_NAME(qt.dbid), qt.text, OBJECT_SCHEMA_NAME(qt.objectid,dbid) + '.' + OBJECT_NAME(qt.objectid, qt.dbid)

    ) T

    ORDER BY Total_IO_Reads DESC

    /**********************************************************

    * top adhoc queries memory consumption per execution

    ***********************************************************/

    SELECT TOP 100 *

    FROM

    (

    SELECT

    DatabaseName = DB_NAME(qt.dbid)

    ,QueryText = qt.text

    ,DiskReads = SUM(qs.total_physical_reads) -- The worst reads, disk reads

    ,MemoryReads = SUM(qs.total_logical_reads) --Logical Reads are memory reads

    ,Total_IO_Reads = SUM(qs.total_physical_reads + qs.total_logical_reads)

    ,Executions = SUM(qs.execution_count)

    ,IO_Per_Execution = SUM((qs.total_physical_reads + qs.total_logical_reads) / qs.execution_count)

    ,CPUTime = SUM(qs.total_worker_time)

    ,DiskWaitAndCPUTime = SUM(qs.total_elapsed_time)

    ,MemoryWrites = SUM(qs.max_logical_writes)

    ,DateLastExecuted = MAX(qs.last_execution_time)

    FROM sys.dm_exec_query_stats AS qs

    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt

    WHERE OBJECT_SCHEMA_NAME(qt.objectid,dbid) + '.' + OBJECT_NAME(qt.objectid, qt.dbid) IS NULL

    GROUP BY DB_NAME(qt.dbid), qt.text, OBJECT_SCHEMA_NAME(qt.objectid,dbid) + '.' + OBJECT_NAME(qt.objectid, qt.dbid)

    ) T

    ORDER BY IO_Per_Execution DESC

    AND

    --Bpool stats

    select (bpool_committed * 8192)/ (1024*1024) as bpool_committed_mb

    , (cast(bpool_commit_target as bigint) * 8192) / (1024*1024) as bpool_target_mb,

    (bpool_visible * 8192) / (1024*1024) as bpool_visible_mb

    from sys.dm_os_sys_info

    go

    -- Get me physical RAM installed

    -- and size of user VAS

    select physical_memory_in_bytes/(1024*1024) as phys_mem_mb,

    virtual_memory_in_bytes/(1024*1024) as user_virtual_address_space_size

    from sys.dm_os_sys_info

    go

    --

    -- Get me other information about system memory

    --

    select total_physical_memory_kb/(1024) as phys_mem_mb,

    available_physical_memory_kb/(1024) as avail_phys_mem_mb,

    system_cache_kb/(1024) as sys_cache_mb,

    (kernel_paged_pool_kb+kernel_nonpaged_pool_kb)/(1024) as kernel_pool_mb,

    total_page_file_kb/(1024) as total_virtual_memory_mb,

    available_page_file_kb/(1024) as available_virtual_memory_mb,

    system_memory_state_desc

    from sys.dm_os_sys_memory

    go

    -- Get me memory information about SQLSERVR.EXE process

    -- GetMemoryProcessInfo() API used for this

    -- physical_memory_in_use_kb

    select physical_memory_in_use_kb/(1024) as sql_physmem_inuse_mb,

    locked_page_allocations_kb/(1024) as awe_memory_mb,

    total_virtual_address_space_kb/(1024) as max_vas_mb,

    virtual_address_space_committed_kb/(1024) as sql_committed_mb,

    memory_utilization_percentage as working_set_percentage,

    virtual_address_space_available_kb/(1024) as vas_available_mb,

    process_physical_memory_low as is_there_external_pressure,

    process_virtual_memory_low as is_there_vas_pressure

    from sys.dm_os_process_memory

    go

    select * from sys.dm_os_ring_buffers

    where ring_buffer_type like 'RING_BUFFER_RESOURCE%'

    go

    select memory_node_id as node, virtual_address_space_reserved_kb/(1024) as VAS_reserved_mb,

    virtual_address_space_committed_kb/(1024) as virtual_committed_mb,

    locked_page_allocations_kb/(1024) as locked_pages_mb,

    single_pages_kb/(1024) as single_pages_mb,

    multi_pages_kb/(1024) as multi_pages_mb,

    shared_memory_committed_kb/(1024) as shared_memory_mb

    from sys.dm_os_memory_nodes

    where memory_node_id != 64

    go

    with vasummary(Size,reserved,free) as ( select size = vadump.size,

    reserved = SUM(case(convert(int, vadump.base) ^ 0) when 0 then 0 else 1 end),

    free = SUM(case(convert(int, vadump.base) ^ 0x0) when 0 then 1 else 0 end)

    from

    (select CONVERT(varbinary, sum(region_size_in_bytes)) as size,

    region_allocation_base_address as base

    from sys.dm_os_virtual_address_dump

    where region_allocation_base_address <> 0x0

    group by region_allocation_base_address

    UNION(

    select CONVERT(varbinary, region_size_in_bytes),

    region_allocation_base_address

    from sys.dm_os_virtual_address_dump

    where region_allocation_base_address = 0x0)

    )

    as vadump

    group by size)

    select * from vasummary

    go

    -- Get me all clerks that take some memory

    --

    select * from sys.dm_os_memory_clerks

    where (single_pages_kb > 0) or (multi_pages_kb > 0)

    or (virtual_memory_committed_kb > 0)

    go

    -- Get me stolen pages

    --

    select (SUM(single_pages_kb)*1024)/8192 as total_stolen_pages

    from sys.dm_os_memory_clerks

    go

    -- Breakdown clerks with stolen pages

    select type, name, sum((single_pages_kb*1024)/8192) as stolen_pages

    from sys.dm_os_memory_clerks

    where single_pages_kb > 0

    group by type, name

    order by stolen_pages desc

    go

    -- Get me the total amount of memory consumed by multi_page consumers

    --

    select SUM(multi_pages_kb)/1024 as total_multi_pages_mb

    from sys.dm_os_memory_clerks

    go

    -- What about multi_page consumers

    --

    select type, name, sum(multi_pages_kb)/1024 as multi_pages_mb

    from sys.dm_os_memory_clerks

    where multi_pages_kb > 0

    group by type, name

    order by multi_pages_mb desc

    go

    -- Let's now get the total consumption of virtual allocator

    --

    select SUM(virtual_memory_committed_kb)/1024 as total_virtual_mem_mb

    from sys.dm_os_memory_clerks

    go

    -- Breakdown the clerks who use virtual allocator

    --

    select type, name, sum(virtual_memory_committed_kb)/1024 as virtual_mem_mb

    from sys.dm_os_memory_clerks

    where virtual_memory_committed_kb > 0

    group by type, name

    order by virtual_mem_mb desc

    go

    -- Is anyone using AWE allocator?

    --

    select SUM(awe_allocated_kb)/1024 as total_awe_allocated_mb

    from sys.dm_os_memory_clerks

    go

    -- Who is the AWE user?

    --

    select type, name, sum(awe_allocated_kb)/1024 as awe_allocated_mb

    from sys.dm_os_memory_clerks

    where awe_allocated_kb > 0

    group by type, name

    order by awe_allocated_mb desc

    go

    -- What is the total memory used by the clerks?

    --

    select (sum(multi_pages_kb)+

    SUM(virtual_memory_committed_kb)+

    SUM(awe_allocated_kb))/1024

    from sys.dm_os_memory_clerks

    go

    --

    -- Does this sync up with what the node thinks?

    --

    select SUM(virtual_address_space_committed_kb)/1024 as total_node_virtual_memory_mb,

    SUM(locked_page_allocations_kb)/1024 as total_awe_memory_mb,

    SUM(single_pages_kb)/1024 as total_single_pages_mb,

    SUM(multi_pages_kb)/1024 as total_multi_pages_mb

    from sys.dm_os_memory_nodes

    where memory_node_id != 64

    go

    --

    -- Total memory used by SQL Server through SQLOS memory nodes

    -- including DAC node

    -- What takes up the rest of the space?

    select (SUM(virtual_address_space_committed_kb)+

    SUM(locked_page_allocations_kb)+

    SUM(multi_pages_kb))/1024 as total_sql_memusage_mb

    from sys.dm_os_memory_nodes

    go

    --

    -- Who are the biggest cache stores?

    select name, type, (SUM(single_pages_kb)+SUM(multi_pages_kb))/1024

    as cache_size_mb

    from sys.dm_os_memory_cache_counters

    where type like 'CACHESTORE%'

    group by name, type

    order by cache_size_mb desc

    go

    --

    -- Who are the biggest user stores?

    select name, type, (SUM(single_pages_kb)+SUM(multi_pages_kb))/1024

    as cache_size_mb

    from sys.dm_os_memory_cache_counters

    where type like 'USERSTORE%'

    group by name, type

    order by cache_size_mb desc

    go

    --

    -- Who are the biggest object stores?

    select name, type, (SUM(single_pages_kb)+SUM(multi_pages_kb))/1024

    as cache_size_mb

    from sys.dm_os_memory_clerks

    where type like 'OBJECTSTORE%'

    group by name, type

    order by cache_size_mb desc

    go

    select mc.type, mo.type from sys.dm_os_memory_clerks mc

    join sys.dm_os_memory_objects mo

    on mc.page_allocator_address = mo.page_allocator_address

    group by mc.type, mo.type

    order by mc.type, mo.type

    go

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • ben.brugman (1/22/2013)


    CheckDB effectively destroys the cache. (Why?).

    Because it reads every single allocated page in the database and, like all other operations, reads them through the buffer pool. Unless you happen to have a buffer pool far larger than the database, running CheckDB will throw just about all other pages out of the buffer pool as it runs.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Additional, we now saw the PLE drop from a few thousand to 7 within the 30 second time frame.

    Size of the cache at that moment was 8 Gb.

    This means that SQL-server expects that within 14 seconds the total memory will be refressed.

    the process that was running at that time was running for 13 seconds.

    (This was probably the process wich did cause the 'effect').

    But how is it possible that SQL-server Expects that the memory can be refressed (8Gb of it) in 14 seconds ?

    I do not think normal disk IO (SAN) can handle that amount of IO.

    How is the PLE determined?

    Thanks for your time and attention.

    ben brugman

    sorry for shouting.:-D

  • ben.brugman (1/22/2013)


    But how is it possible that SQL-server Expects that the memory can be refressed (8Gb of it) in 14 seconds ?

    I do not think normal disk IO (SAN) can handle that amount of IO.

    How is the PLE determined?

    Why don't you stop inventing your own definition of PLE and read the documentation?

    http://msdn.microsoft.com/en-us/library/ms189628.aspx


    Page life expectancy - Indicates the number of seconds a page will stay in the buffer pool without references.

    And based on the above please try to explain what PLE has to do with disk IO and SAN?


    Alex Suprun

  • Alexander Suprun (1/22/2013)


    ben.brugman (1/22/2013)


    But how is it possible that SQL-server Expects that the memory can be refressed (8Gb of it) in 14 seconds ?

    I do not think normal disk IO (SAN) can handle that amount of IO.

    How is the PLE determined?

    Why don't you stop inventing your own definition of PLE and read the documentation?

    http://msdn.microsoft.com/en-us/library/ms189628.aspx


    Page life expectancy - Indicates the number of seconds a page will stay in the buffer pool without references.

    The definition status that PLE is a prediction.

    Predictions often are based on something from the past.

    So I would like to know how PLE is determined.

    And based on the above please try to explain what PLE has to do with disk IO and SAN?

    I have seen very low PLE's starting this thread 7 was the lowest. (Now I have seen even lower). A Page Life Expectancy of 7 seconds for 8 Gb memory, suggest that the next 14 seconds all pages in Memory will be replaced by reading in new pages. (That is 8 Gb in 14 seconds this is over 500 Mb per second. In real life a disk (even a SAN) can not deliver this amount of data.

    Other scenario's are possible, but for the other scenario's the number of bytes read per second is even higher. To get to an PLE of 7 an enourmous amount of data has to be replaces in a short time.

    (I asume the PLE is some sort of average).

    So to understand what PLE means, I would like to know how this prediction is calculated. (My expectation about the lowest a PL (average Page Life) can become is higher than 7)

    Thanks for your reaction,

    Ben Brugman

    Additional, (It is a bit more complex)

    The PLE of 7 actually does not expect a page to be replaced within an average of 7 seconds it expects a page to be replaced or touched (referenced) within those 7 (or is it 14) seconds. The text above ignores the fact that a large number of pages can be referenced within the timeperiod.

  • GilaMonster (1/22/2013)


    ben.brugman (1/22/2013)


    CheckDB effectively destroys the cache. (Why?).

    Because it reads every single allocated page in the database and, like all other operations, reads them through the buffer pool. Unless you happen to have a buffer pool far larger than the database, running CheckDB will throw just about all other pages out of the buffer pool as it runs.

    So for checkDB you need a maintenance window, first because it uses resources, secondly because it potentially can 'destroy' performance for other queries.

    Because the backup does not do that, I thought wrongly that checkDB would not do that either.

    Thanks for setting me strait,

    Ben Brugman

  • Bhuvnesh (1/22/2013)


    ben.brugman (1/22/2013)


    Any suggestions how to catch the processes which cause the drop in PLE ?

    it means you have buffer cache issue and IO contentions

    see these links

    http://www.sqlservercentral.com/articles/Performance+Tuning/analyzingmemoryrequirementsforsqlserver/2498/

    http://sqlblog.com/blogs/buck_woody/archive/2010/06/29/the-windows-page-file-and-sql-server.aspx

    http://support.microsoft.com/kb/889654/en-us

    and take help from these

    code removed

    Thanks for the links and the script.

    It did help to get some insight.

    At the moment I am missing the link with the Windows Page file. (second and third link). Because I assume (correct me if I am wrong) that the SQL-server pages are not 'swapped' out and therefore not a part of the pagefile which can be swapped out.

    (Server is a dedicated SQL-server server so almost all of the memory is for SQL-server. Although some usage of SSMS is possible).

    Thanks,

    Ben

  • ben.brugman (1/24/2013)


    At the moment I am missing the link with the Windows Page file. (second and third link). Because I assume (correct me if I am wrong) that the SQL-server pages are not 'swapped' out and therefore not a part of the pagefile which can be swapped out.

    They can be, but that's if Windows decides to move a portion of SQL's memory into the page file (memory pressure and SQL not releasing memory when requested). It isn't something that will affect PLE, which is a SQL measurement and solely related to SQL's internal memory management.

    I'd personally ignore PLE during CheckDB execution. Because it reads the entire database and because it marks pages brought in as 'disfavoured', meaning they'll be the first to be thrown out of cache when space is needed, PLE will be low, it's kinda expected.

    Now, if PLE is dropping that badly during normal operation when CheckDB isn't running, you may well have a problem of too much data being read. It doesn't mean you have an IO contention problem, it means you're reading more data into memory than memory can handle. The fix is more memory, optimise queries or both.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (1/24/2013)


    ben.brugman (1/24/2013)


    I'd personally ignore PLE during CheckDB execution. Because it reads the entire database and because it marks pages brought in as 'disfavoured', meaning they'll be the first to be thrown out of cache when space is needed, PLE will be very, it's kinda expected.

    So even if the PLE goes extremely low during CheckDB it doesn't mean that the cache gets completely 'refreshed', so effectively pages which are touched a lot remain in the cache ?

    I am stil wondering how PLE is calculated.

    Suppose four pages.

    Where the second number is the number of seconds the page has not been referred.

    1. 200

    2. 300

    3. 400

    4 8000

    The average page life (since referral) is 2225.

    After reading a new page.

    1. 200

    2. 300

    3. 400

    5 0

    The average page life (since referral) is 225.

    So this is not used for calculating the PLE. If the PLE is calculated of pages leaving the memory (overwriting them). Then with a PLE of 7 seconds and 8Gb of memory. There must be an awfull lot of overwriting.

    At the moment we are looking into the PLE as a preventive measure, not because we have problems, but we want to be able to signal problems before they occure.

    As for improving one of my suggestions was Clustering on the most important key, for some tables. Within those tables the number of pages read would be 1/5 th of the pages now with the heap.

    Yes more memory would help, and rewriting the queries would help as wel. Rewriting the queries would be very expensive. Clustering and optimising for that would be less expensive. (But I am not making the decisions :crying:)

    Thanks for your explenation about DBcheck,

    Ben

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply