SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Trace Flag 8048, 8015, or SUMA?


Trace Flag 8048, 8015, or SUMA?

Author
Message
sql_handle
sql_handle
Valued Member
Valued Member (66 reputation)Valued Member (66 reputation)Valued Member (66 reputation)Valued Member (66 reputation)Valued Member (66 reputation)Valued Member (66 reputation)Valued Member (66 reputation)Valued Member (66 reputation)

Group: General Forum Members
Points: 66 Visits: 114
Hola!

Recently included SQL Server startup Trace Flag 8048 to resolve a serious spinlock contention issue in a SQL Server 2008 R2 system.

Interested to hear from others who have found usage cases where performance value was delivered by trace flag 8048 (promote query memory grant strategy from per-NUMA node to per-core), trace flag 8015 (SQL Server ignores physical NUMA), or SUMA (interleaved sufficiently uniform memory access).

Trace flag 8048
http://blogs.msdn.com/b/psssql/archive/2011/09/01/sql-server-2008-2008-r2-on-newer-machines-with-more-than-8-cpus-presented-per-numa-node-may-need-trace-flag-8048.aspx

Trace flag 8015
http://blogs.msdn.com/b/psssql/archive/2010/04/02/how-it-works-soft-numa-i-o-completion-thread-lazy-writer-workers-and-memory-nodes.aspx

SUMA, or interleaved memory
http://msdn.microsoft.com/en-us/library/ms178144(v=sql.105).aspx
"If you have hardware NUMA, it may be configured to use interleaved memory instead of NUMA. In that case, Windows and therefore SQL Server will not recognize it as NUMA."

Gory details of system workload, gathered metrics from troubled system, and gathered metrics from the system after intervention in posts to follow.

Peace!
tw: @sql_handle
sql_handle
sql_handle
Valued Member
Valued Member (66 reputation)Valued Member (66 reputation)Valued Member (66 reputation)Valued Member (66 reputation)Valued Member (66 reputation)Valued Member (66 reputation)Valued Member (66 reputation)Valued Member (66 reputation)

Group: General Forum Members
Points: 66 Visits: 114
About the system:
- 4 hex core Xeon E7540 @ 2.00GHz, hyperthreaded
- 128 GB RAM
- WS2008R2
- MSSQL 2008 R2 SP2
- maxdop 6

About the workload:
- 1000s of Batch scheduled/queued reports driven from 2 report application servers.
- 3 flavors of batches: daily, weekly, monthly
- All report application servers connections to SQL Server are made as a single service account
- Maximum report concurrency = 90
sql_handle
sql_handle
Valued Member
Valued Member (66 reputation)Valued Member (66 reputation)Valued Member (66 reputation)Valued Member (66 reputation)Valued Member (66 reputation)Valued Member (66 reputation)Valued Member (66 reputation)Valued Member (66 reputation)

Group: General Forum Members
Points: 66 Visits: 114
Key findings on the troubled system:
- From Perfmon, 15 second intervals
- - System remains at 95%-100% CPU busy
- - SQL Server buffer page lookups < 10000 per/second

- From wait and spinlock DMVs, 5 minute intervals
- - High CMEMTHREAD waiters and wait time
- - High SOS_SUSPEND_QUEUE spins and backoffs

Bob Dorr's CSS Engineer Blog post on trace flag 8048 indicates that systems with more than 8 cores per NUMA node can run into similar symptoms due to bottleneck in query memory grant. Trace flag 8048 will change the strategy to per-core instead of per-NUMA node.

MSSQL was restarted with -T8048 in place. The difference was immediately evident: buffer page lookup rate rose over 1 million and spike to 8 million per second. The troubled batch workload, which previously couldn't complete in 24 hours, completed in less than 4 hours. Another batch workload which was not the focus of investigation or intervention was submitted as part of validating the corrective value of trace flag 8048 (and ensuring that its unwanted side effects were minimal). This report batch previously completed in 2 hours; with trace flag 8048 in place the report batch completed in roughly 20 minutes.

Nightly ETL also encountered a benefit. ETL time dropped from roughly 60 minutes to 40 minutes.

Pulling together information from several places, I speculate that the high degree of report queuing, the concurrent report count greater than hardware thread count, and the single user account for all reports combined to put pressure on one NUMA node until worker thread pressure caused it to be disfavored for the next incoming connection request for the same user account, at which point the next NUMA node would get some number of connections near instantly. Each NUMA node would end up with a high probability of stressing the query memory grant bottleneck.

Opening more lanes for query memory grant removed the bottleneck. But, I'm not sure the cost. Bob Dorr's CSS post makes it clear that there is additional memory overhead with trace flag 8048. Is that overhead within the single-page allocator region governed by MSSQL 2008 R2 max server memory? If so, I guess the system will just have some number fewer database pages in the buffer pool cache. If not, should max server memory be lowered to accomodate?

Finally... trace flag 8048 was a 'fix', but was it the best fix? Would SQL Server ignoring physical NUMA due to trace flag 8015 have accomplished the same thing? What about setting the BIOS to interleave memory, leaving the server with SMP-imitating SUMA behavior instead of NUMA behavior?

Thanks!
@sql_handle
SQLCharger
SQLCharger
Old Hand
Old Hand (324 reputation)Old Hand (324 reputation)Old Hand (324 reputation)Old Hand (324 reputation)Old Hand (324 reputation)Old Hand (324 reputation)Old Hand (324 reputation)Old Hand (324 reputation)

Group: General Forum Members
Points: 324 Visits: 1432
I'm thinking of using this flag and I'm looking around for meaningful feedback.

Hadn't seriously considered it before because I thought it was only meant for new servers with 10 or more cores per socket.
Turns out the >8 figure includes HT cores, so servers with 6 or 8 cores/socket will be caught if HT is enabled. Ooops.Blink

Any known gotchas?

Cheers,

JohnA

MCM: SQL2008
lonnyniederstadt
lonnyniederstadt
Valued Member
Valued Member (51 reputation)Valued Member (51 reputation)Valued Member (51 reputation)Valued Member (51 reputation)Valued Member (51 reputation)Valued Member (51 reputation)Valued Member (51 reputation)Valued Member (51 reputation)

Group: General Forum Members
Points: 51 Visits: 88
We've watched trace flag 8048 go into numerous systems now, and have not observed any negative effects. We've been watching stolen memory, all waits and spinlocks in 5 minute increments, etc.

It helps that we are concerned with only two primary workflows, a batch ETL and a batch report window, both with query concurrency higher than server core count.

In our case, the best solution was to disable sql server NUMA optimization with trace flag 8015 and eliminate query memory spinlock contention with trace flag 8048. A single large database buffer pool and a single large scheduler group is much more advantageous for our workflows than smaller buffer pools and scheduler groups per NUMA node. Too easy to unevenly stress CPU and memory across the NUMA nodes, given the round robin distribution of connections, the tendency of parallel query workers to remain in the same NUMA node with the connection, and the insertion of all database blocks for a thread that aren't referenced from cache on another NUMA node into the cache on the NUMA node associated with the worker.

Our internal testing (query concurrency of 120 on 24 core server, 4 hex cores) showed 25% drop in total io and 10% drop in execution time with trace flag 8048 and 8015 vs no trace flags.
SQLCharger
SQLCharger
Old Hand
Old Hand (324 reputation)Old Hand (324 reputation)Old Hand (324 reputation)Old Hand (324 reputation)Old Hand (324 reputation)Old Hand (324 reputation)Old Hand (324 reputation)Old Hand (324 reputation)

Group: General Forum Members
Points: 324 Visits: 1432
TF8015 - never thought to use it on modern machinery.
Would it disable NUMA detection at startup and make all memory access appears as 'foreign'?

Cheers,

JohnA

MCM: SQL2008
sql_handle
sql_handle
Valued Member
Valued Member (66 reputation)Valued Member (66 reputation)Valued Member (66 reputation)Valued Member (66 reputation)Valued Member (66 reputation)Valued Member (66 reputation)Valued Member (66 reputation)Valued Member (66 reputation)

Group: General Forum Members
Points: 66 Visits: 114
I've rambled about it somewhat incoherently at sql-sasquatch.blogspot.com. (sql_handle, lonnyniederstadt, sql_sasquatch are all me - there's an idea behind my multiple identites that I'm notmanaging too well yet across multiple devices Smile Folks might not come back to my blog to read more since they haven't been able to figure out where I'm going yet - kind of stream of consciousness until I've got the spreadsheets and graphs to show. But what I am seeing in SQL Server basically aligns with what Oracle must have seen somewhere along the line before Oracle 11g. They disable database NUMA support by default on all platforms Smile

Trace flag 8015 makes SQL Server ignore the NUMA node boundaries for scheduler groups and memory management. The server effectively becomes one big NUMA node... or like one big VM without pass-through NUMA.

My world is all about optimizing a heavy ETL that has more queries than logical processors, and a batch report workload with more queries than logical processors. ETL and batched reports both hit disk hard.

With repeatable and repeated performance test loads, trace flag 8015 + 8048 led to approximately 25% reduction in disk IO and 10% reduction in elapsed time even when canceling out SAN cache effect, etc. Statistics were never updated between perf tests, plans didn't change... PLE in each NUMA node was almost balanced with NUMA enabled. Our primary workloads just work better with a single memory resource and a single scheduler group, on both SQL Server 2008R2 and SQL Server 2012.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search