RE: Performance monitoring SQL Server and vendor applications

November 23, 2015 at 9:12 pm

If the vendor failed to present the evidence described in http://blogs.msdn.com/b/sqlmeditation/archive/2012/12/12/meditation-on-sql-trace-performance-impact.aspx, the vendor is spreading FUD. Because tracing is being replaced by Extended Events, MS is unlikely to change its architecture or its wait types. I have seen significant TRACEWRITE and OLEDB waits in SQL Server 2012, so I slapped the only sysadmin who was foolish enough to trace everything but the kitchen sink via SQL Server Profiler (a client-side trace).

When tracing, it all comes down to ensuring that the event generation rate does not exceed the event consumption rate. When generation outpaces consumption, sessions in SQL Server wait while holding their locks, so locks pile up until the consumer (the client application for a client-side trace, or the trace file writes for a server-side trace) catches up. Some events (such as Locks:Acquired and Locks:Released) are not very big (XXX bytes) but are extremely chatty: they can make the event generation rate skyrocket, yet are rarely useful. Other events, such as Performance:Showplan XML Statistics Profile, can be extremely large (XXKB) and fire once per statement: they can also make the generation rate skyrocket, but they can be useful.

The problem with SQL Server Profiler (or any client-side trace) is that the rate at which events are generated is not obvious: they scroll by too quickly, and Profiler's own IO and memory consumption is often overlooked. A server-side trace lets you watch the trace file's growth (and the performance of the disk it is written to). I trace over 2,000 batches/sec to a 100 GB SSD dedicated to tracing. Even so, I am careful when a problem concerns high CPU consumption (unexpectedly high execution rates are one of my systems' more common "problems").
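To make the rate argument concrete, here is a toy Python model. It is my own sketch, not how SQL Server's trace buffers actually work: each second the server generates events, the consumer drains what it can, and the leftover accumulates as backlog. The names `simulate_trace_backlog` and `buffer_capacity`, and all the rates in the example, are hypothetical. The model also ignores the fact that real producers slow down once they stall; it only shows when stalling would begin.

```python
from dataclasses import dataclass


@dataclass
class TraceSecond:
    second: int
    backlog: int   # events generated but not yet consumed
    stalled: bool  # True once backlog exceeds the hypothetical buffer capacity


def simulate_trace_backlog(generated_per_sec: int,
                           consumed_per_sec: int,
                           buffer_capacity: int,
                           seconds: int) -> list[TraceSecond]:
    """Toy model: each second, add the generated events to the backlog and let
    the consumer drain up to consumed_per_sec of them. Producers are NOT slowed
    down here, so this only shows when (not how badly) sessions would stall."""
    backlog = 0
    history = []
    for s in range(1, seconds + 1):
        backlog += generated_per_sec
        backlog -= min(backlog, consumed_per_sec)
        history.append(TraceSecond(s, backlog, backlog > buffer_capacity))
    return history


# Hypothetical rates: 10,000 events/sec generated, only 8,000 events/sec consumed.
for row in simulate_trace_backlog(10_000, 8_000, buffer_capacity=5_000, seconds=4):
    print(row.second, row.backlog, row.stalled)
# 1 2000 False
# 2 4000 False
# 3 6000 True
# 4 8000 True
```

Whenever generation exceeds consumption, the backlog grows by the difference every second, no matter how big the buffer is; a bigger buffer only delays the stall.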

The oddest situation I ever came across was a team that decided to run a "server-side" trace to a network share. They filtered their trace so that "only" statements lasting longer than 2 minutes would be sent to the share. They overlooked the load this put on SQL Server: an event must be generated before it can be filtered. Their problem was ASYNC_NETWORK_IO waits. As their network bandwidth became constrained, more statements waited on clients to fetch results (ASYNC_NETWORK_IO). Those statements blocked other statements, which pushed more statements past 2 minutes, which in turn meant more events had to be written to the share. The twist was that their share suffered from the same lack of network bandwidth. That positive feedback loop brought their SQL Server down in a matter of minutes. They had to stop tracing to a share and (more importantly, IMHO) increase their network bandwidth. A similar problem can happen if a trace writes to the same disk (or controller) that serves database or log file IO when the salient performance bottleneck is disk IO (PAGEIOLATCH%, WRITELOG waits, etc.).
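That feedback loop can be reduced to a toy linear model. This is a hypothetical sketch of my own, not anything SQL Server or pssdiag computes: the slow statements in one interval generate trace traffic to the share, and that traffic steals bandwidth from clients, producing more slow statements in the next interval. The product of the two hypothetical parameters is the loop gain; below 1 the count levels off at baseline / (1 - gain), at or above 1 it grows without bound. All names and numbers are illustrative.

```python
def slow_statement_series(baseline_slow: float,
                          trace_mb_per_slow_stmt: float,
                          slow_stmts_per_trace_mb: float,
                          intervals: int) -> list[float]:
    """Toy linear model of tracing to a share over a congested network.

    baseline_slow           -- statements per interval that exceed the 2-minute
                               filter even without any tracing overhead
    trace_mb_per_slow_stmt  -- hypothetical trace data shipped to the share
                               per slow statement
    slow_stmts_per_trace_mb -- hypothetical extra slow statements caused per MB
                               of bandwidth the trace takes away from clients
    """
    gain = trace_mb_per_slow_stmt * slow_stmts_per_trace_mb  # loop gain
    slow = baseline_slow
    series = [slow]
    for _ in range(intervals):
        # Next interval = baseline plus the slowdown induced by shipping the
        # current interval's slow-statement events over the same network.
        slow = baseline_slow + gain * slow
        series.append(slow)
    return series


print(slow_statement_series(10, 0.5, 1.0, 3))  # [10, 15.0, 17.5, 18.75]
print(slow_statement_series(10, 1.5, 1.0, 3))  # [10, 25.0, 47.5, 81.25]
```

With a gain of 0.5 the count settles toward 20; with a gain of 1.5 it runs away, which is the shape of what happened to that team. Removing the shared bottleneck (not tracing to the share, adding bandwidth) is what drives the gain down.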

If you were to call MS with a performance problem, they would send you the internal version of pssdiag, which is available on codeplex.com. Pssdiag can harness some Extended Events, but server-side tracing still handles the lion's share of its diagnostic data collection (a perfmon log and the output of DMVs polled every 10 seconds are its "other two legs"). Pssdiag's general performance template collects batch-level, not statement-level, events. Its detailed template collects statement-level events plus Showplan Statistics Profile (the least expensive Performance Profile event). You upload that data to Microsoft. Codeplex's SQL Nexus (run on a non-production SQL Server box) uses readtrace to consume and aggregate what pssdiag collected, and presents summaries as graphs and tables (which is usually what MS sends back to you :). I would be careful about using anything but pssdiag's general performance template on a busy 16-way or larger server (SharePoint is one system that does not always behave well when detailed traces are run).

As far as "less intrusive" monitoring tools go, you will find a variety of suggestions, and I have no preference for one over another. But given this web site's sponsor, I suspect you will find a predilection to suggest redgate. In all honesty, which works best depends on your needs and your system. Take a few out for a free test drive and see which you like. You should also ask your vendor what they use, and then choose a different one. That way, there is less chance of the wool being pulled over both their eyes and yours.

And don't forget there are two sides to a performance "coin": inadequate supply versus excessive demand. It is best not to take sides; strive to determine which is cheaper or quicker to fix (within a budget).