RE: Page life expectancy diving to sub-10 on a 128 GB server

SSCertifiable

Points: 5937

April 24, 2013 at 6:35 pm

By way of comparison, here's what it was on SP1 without the private fix. Also three days.

As a reminder, this is a server with 40 physical cores, 4 NUMA nodes and 128 GB of RAM, with 110 GB allocated to the SQL instance. That suggests to me that if SQL has consumed all the memory allocated to it and PLE on one of the nodes then crashes to zero (which, as the graphs show, happens far less on RTM than SP1), it represents a truly massive cache flush. If the memory is spread roughly evenly across the nodes, that is about 110 / 4 = 27.5 GB, so almost 30 gig. Microsoft is suggesting that the hotfix works (except for the dll version issues) and that the remaining craziness is probably related to query activity.
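For anyone who wants to check the "almost 30 gig" figure, here is the back-of-envelope arithmetic as a small Python sketch. It is only a rough estimate: it assumes the 110 GB is split evenly across the 4 NUMA nodes and that essentially all of it is buffer pool, neither of which is exactly true in practice. The function name `per_node_flush` is just something made up for illustration.

```python
# Rough estimate only. Assumptions (not measured on the server):
#   - the instance memory is split evenly across the NUMA nodes
#   - essentially all of that memory is buffer pool
PAGE_KB = 8  # SQL Server data pages are 8 KB


def per_node_flush(instance_gb, numa_nodes):
    """Return (GB, number of 8 KB pages) held by one node under an even split."""
    node_gb = instance_gb / numa_nodes
    node_pages = int(node_gb * 1024 * 1024 / PAGE_KB)
    return node_gb, node_pages


node_gb, node_pages = per_node_flush(110, 4)
print(f"{node_gb:.1f} GB per node, about {node_pages:,} pages")
# prints: 27.5 GB per node, about 3,604,480 pages
```

So a single node's PLE dropping to zero would mean something on the order of 3.6 million pages being thrown out and read back in.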

Them's some mighty big queries if you ask me.

To try to get some quick wins, our next steps are to enable compression (an obvious choice with that much CPU power) and to set up an AlwaysOn replica to act as the source for operational SSRS reports. These are things we always planned to do, but when we saw things go bad after migrating to 2012, attention was, for obvious reasons, shifted.