Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 1234»»»

NUMA and PLE on SQL Server 2012 Expand / Collapse
Author
Message
Posted Tuesday, February 05, 2013 7:26 AM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Wednesday, April 09, 2014 4:45 PM
Points: 1,157, Visits: 3,247
I've read both Paul and Jonathan's blogs regarding this issue (http://www.sqlskills.com/blogs/paul/page-life-expectancy-isnt-what-you-think/) and started looking at the PLE counters for each individual NUMA node. I can't seem to wrap my head around why there is such a widespread discrepancy between the NUMA nodes. We are running SQL Server 2012 Enterprise Core. Any insight would be greatly appreciated.

Thanks,
Tommy

https://skydrive.live.com/redir?resid=EB98D18648791013!7197&authkey=!AN6DKRDzgyJlarU



Tommy

Post #1415833
Posted Tuesday, February 05, 2013 8:27 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 4:48 AM
Points: 14,802, Visits: 27,278
I'm pretty sure that it just comes down to the fact that while processes are distributed, the work isn't and one node can be done most/all of the work while others are effectively asleep. In general, but absolutely not always, I've seen the distribution between the nodes to be reasonably consistent over time, but weird spikes are fairly normal.

----------------------------------------------------
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood..." Theodore Roosevelt
The Scary DBA
Author of: SQL Server 2012 Query Performance Tuning
SQL Server 2008 Query Performance Tuning Distilled
and
SQL Server Execution Plans

Product Evangelist for Red Gate Software
Post #1415888
Posted Tuesday, February 05, 2013 9:46 AM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Wednesday, April 09, 2014 4:45 PM
Points: 1,157, Visits: 3,247
Thank you Grant -

Tommy

Post #1415959
Posted Tuesday, March 12, 2013 7:27 AM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Wednesday, April 09, 2014 4:45 PM
Points: 1,157, Visits: 3,247
FYI, we opened an SR for this with MS about a few weeks ago and it is currently escalated to the DEV team. The consensus thus far is that there does appear to be a defect calculating the PLE value in 2012; I will update when more information from MS is available in the event anyone else is experiencing this.


Some additional info:
SQL Server 2012 Core Edition
4 Socket 8 Core Xeon w/ HT enabled, 128GB, 112GB max memory

We have been collecting the ring buffer outputs for RING_BUFFER_RESOURCE_MONITOR and they do not show any new entries around the time when PLE dropped. Moreover during this timeframe, Page Reads/sec value do not fluctuate much indicating that we did not have to read a bunch of new pages into memory. The query with max reads around this time frame was 32K


Tommy

Post #1429802
Posted Wednesday, March 13, 2013 12:23 AM


Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Thursday, March 27, 2014 8:06 PM
Points: 316, Visits: 1,119
I've been watching PLE closely on our new 2012 box (see thread: PLE drops to sub 10 on 128gb server). Grabbing two nodes and graphing them over several days shows that the lines match almost perfectly a lot of the time, but after large drops (eg, from 30,000+ to low hundreds) they tend to deviate from each other, wiggle around a little bit, and then resynch as PLE regrows back into the several hundreds / thousands.

allmhuran.com - download the SSMSDeploy addin for SSMS 2008
Blog on sqlservercentral
Post #1430233
Posted Wednesday, March 13, 2013 6:52 AM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Wednesday, April 09, 2014 4:45 PM
Points: 1,157, Visits: 3,247
Thanks for the info, I will be sure to check out the thread. We are seeing the same exact behavior. Physical memory is always reported high during these large drops which seem to occur at random and oddly enough during periods of low activity with the largest read at 32K. I opened a connect item for this prior to the Sev-B case (which has been open for over a month now) however the product team closed it rather quickly opting to work offline with us on it instead...

Tommy

Post #1430353
Posted Thursday, March 14, 2013 7:30 AM


SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: Today @ 8:32 AM
Points: 4,128, Visits: 5,839
One thing I don't see mentioned is if PERFORMANCE or actual IO or other metrics change when PLE falls off a cliff. If it doesn't, I would chalk it up as an internal bug that is "meaningless" (and which thus might get a lower priority by MS to fix). If it does, then the issue becomes finding out WHY stuff actually gets flushed out of the buffer pool and address that.

Best,

Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru at GMail
Post #1430957
Posted Thursday, March 14, 2013 7:38 AM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Wednesday, April 09, 2014 4:45 PM
Points: 1,157, Visits: 3,247
TheSQLGuru (3/14/2013)
One thing I don't see mentioned is if PERFORMANCE or actual IO or other metrics change when PLE falls off a cliff. If it doesn't, I would chalk it up as an internal bug that is "meaningless" (and which thus might get a lower priority by MS to fix). If it does, then the issue becomes finding out WHY stuff actually gets flushed out of the buffer pool and address that.


Hey Kevin, you are correct. For this particular issue, we are not experiencing any performance degradation whatsoever before or after the PLE drop and neither MS nor our team has been able to correlate it with any other metrics, IO or otherwise. So in a sense it is a "meaningless" internal bug that in all likelihood will be a much lower priority for MS to fix.


Tommy

Post #1430967
Posted Thursday, March 14, 2013 5:00 PM


Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Thursday, March 27, 2014 8:06 PM
Points: 316, Visits: 1,119
Ah, well that's interesting. I am seeing performance degradation with the PLE drops as well as the movement of other related counters (eg disk % jumping to 100,000%... yes... one hundred thousand!) when the collapses occur

allmhuran.com - download the SSMSDeploy addin for SSMS 2008
Blog on sqlservercentral
Post #1431294
Posted Thursday, March 14, 2013 7:49 PM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Wednesday, April 09, 2014 4:45 PM
Points: 1,157, Visits: 3,247
allmhuran (3/14/2013)
Ah, well that's interesting. I am seeing performance degradation with the PLE drops as well as the movement of other related counters (eg disk % jumping to 100,000%... yes... one hundred thousand!) when the collapses occur


Interesting; perhaps the particular issue you are experiencing is unrelated to ours. Nevertheless, there are a host of collections we ran to rule out other issues, if you would like, please feel free to contact email me offline and I will be more than happy to provide the scripts, collections, etc.

I would start by dumping the ring buffer resource monitor (script below) before, during, and after the event.

SELECT CONVERT (varchar(30), GETDATE(), 121) as [RunTime],
dateadd (ms, (rbf.[timestamp] - tme.ms_ticks), GETDATE()) as [Notification_Time],
cast(record as xml).value('(//Record/ResourceMonitor/Notification)[1]', 'varchar(30)') AS [Notification_type],
cast(record as xml).value('(//Record/MemoryRecord/MemoryUtilization)[1]', 'bigint') AS [MemoryUtilization %],
cast(record as xml).value('(//Record/MemoryNode/@id)[1]', 'bigint') AS [Node Id],
cast(record as xml).value('(//Record/ResourceMonitor/IndicatorsProcess)[1]', 'int') AS [Process_Indicator],
cast(record as xml).value('(//Record/ResourceMonitor/IndicatorsSystem)[1]', 'int') AS [System_Indicator],
cast(record as xml).value('(//Record/MemoryNode/ReservedMemory)[1]', 'bigint')/1024 AS [SQL_ReservedMemory_MB],
cast(record as xml).value('(//Record/MemoryNode/CommittedMemory)[1]', 'bigint')/1024 AS [SQL_CommittedMemory_MB],
cast(record as xml).value('(//Record/MemoryNode/AWEMemory)[1]', 'bigint')/1024 AS [SQL_AWEMemory_MB],
cast(record as xml).value('(//Record/MemoryNode/PagesMemory)[1]', 'bigint') AS [PagesMemory_MB],
cast(record as xml).value('(//Record/MemoryRecord/TotalPhysicalMemory)[1]', 'bigint')/1024 AS [TotalPhysicalMemory_MB],
cast(record as xml).value('(//Record/MemoryRecord/AvailablePhysicalMemory)[1]', 'bigint')/1024 AS [AvailablePhysicalMemory_MB],
cast(record as xml).value('(//Record/MemoryRecord/TotalPageFile)[1]', 'bigint')/1024 AS [TotalPageFile_MB],
cast(record as xml).value('(//Record/MemoryRecord/AvailablePageFile)[1]', 'bigint')/1024 AS [AvailablePageFile_MB],
cast(record as xml).value('(//Record/MemoryRecord/TotalVirtualAddressSpace)[1]', 'bigint')/1024 AS [TotalVirtualAddressSpace_MB],
cast(record as xml).value('(//Record/MemoryRecord/AvailableVirtualAddressSpace)[1]', 'bigint')/1024 AS [AvailableVirtualAddressSpace_MB],
cast(record as xml).value('(//Record/@id)[1]', 'bigint') AS [Record Id],
cast(record as xml).value('(//Record/@type)[1]', 'varchar(30)') AS [Type],
cast(record as xml).value('(//Record/@time)[1]', 'bigint') AS [Record Time],
tme.ms_ticks as [Current Time],
cast(record as xml) as [Record]
from sys.dm_os_ring_buffers rbf
cross join sys.dm_os_sys_info tme
where rbf.ring_buffer_type = 'RING_BUFFER_RESOURCE_MONITOR'
order by rbf.timestamp ASC



Tommy

Post #1431325
« Prev Topic | Next Topic »

Add to briefcase 1234»»»

Permissions Expand / Collapse