• Depending on your workload, even with MSSQL 2012 SP1 CU3 in place, there may be significant gains from running with startup trace flags 8015 and 8048. Trace flag 8015 tells SQL Server to ignore NUMA: one bpool, one scheduler group. The cost: a single lazy writer and a single IO completion port instead of one of each per NUMA node, plus a lower level of memory affinity.

    Trace flag 8048 removes spinlock contention within a scheduler group during query memory allocation. By default, all members of a scheduler group (usually a NUMA node) go through one serialization point for query memory allocation. With trace flag 8015 in place, the schedulers at risk for contention are no longer just those within one NUMA node but ALL schedulers. Trace flag 8048 moves that serialization from the NUMA node/scheduler group level down to the core level, removing this bottleneck.

    I spent a lot of time trying to replicate the huge numbers of foreign pages on multiple NUMA nodes that I saw in the field across numerous 2, 4, and 8 node systems.

    I couldn't reproduce the foreign pages. But comparing test runs of a batch report workflow (thousands of reports, 120 concurrent reports) with no trace flags (and no foreign pages after max server memory was reached) against runs with TF 8015 + TF 8048 showed a reduction of approximately 25% in disk IO and approximately 10% in elapsed time with the trace flags in place. Some workflows just work better with a single bpool and a single scheduler group.
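
To see which of the configurations described above an instance is actually running with, something like the following diagnostic queries may help. This is only a sketch, not part of the original tests: it assumes SQL Server 2012 or later (where `sys.dm_os_memory_nodes` exposes `foreign_committed_kb`), sufficient permissions to read the DMVs, and that the trace flags were supplied as `-T8015` and `-T8048` startup parameters. The column aliases are illustrative.

```sql
-- 1. Report whether trace flags 8015 and 8048 are enabled globally
--    (-1 asks for the global status rather than the session status).
DBCC TRACESTATUS (8015, 8048, -1);

-- 2. Count online schedulers per parent node. With NUMA in effect there
--    should be one group per node; with TF 8015 the description above
--    suggests a single group.
SELECT parent_node_id,
       COUNT(*) AS online_schedulers
FROM sys.dm_os_schedulers
WHERE status = N'VISIBLE ONLINE'
GROUP BY parent_node_id
ORDER BY parent_node_id;

-- 3. Look at memory per memory node, including memory committed from
--    other nodes (foreign memory). The dedicated admin connection
--    usually appears as its own node in this view.
SELECT memory_node_id,
       pages_kb,
       foreign_committed_kb
FROM sys.dm_os_memory_nodes
ORDER BY memory_node_id;
```

Running the same three statements before and after adding the startup flags, once the instance has warmed up to max server memory, gives a rough side-by-side view of scheduler grouping and foreign memory for the two configurations.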