Welcome to the second Tuesday of the month. And in the database world of SQL Server and the SQL Server community, that means it is time for TSQL2SDAY. This month the host is Jen McCown (blog / twitter), half of the MidnightDBA team, and the topic that she wants us to write about is: “Strategies for managing an enterprise”. Specifically, she wants to know “How do you manage an enterprise? Grand strategies? Tips and tricks? Techno hacks? Do tell.”
For me, this month, I will be just doing a real quick entry. I have been more focused on my 60 Days of Extended Events series and was looking for something that might tie into both really well that won’t necessarily be covered in the series, but that might work well as an “Enterprise” worthy topic.
Wait, isn’t the system_health session one of those things that is configured per Instance?
Yes it is!
The system_health session is a default Extended Events session that is running by default on every instance of SQL Server (keyword is default) since SQL Server 2008. Whether you want it to be running or not is an entirely different conversation. But by default it is running.
There is a small problem with that default though. That problem is in the 2008 and 2008 R2 flavors of SQL Server. The default behavior is that the session only dumps the events to the ring buffer. And if you are only dumping the events to the ring buffer, you can imagine this is not entirely that useful. Why? Well, the ring buffer is just a memory target and is considerably more volatile than to write the event session data out to a file. One need not try terribly hard to see why this can be frustrating (unless of course you didn’t even know it was there).
So what to do to help push this in a more enterprise friendly direction? The answer is to add a file target like was done in the 2012 (and up) flavors of SQL Server. Here is the entire system_health session as defined in u_tables.sql (the backup script of the session deployed to the install directory):
-- Extended events default session IF EXISTS ( SELECT * FROM sys.server_event_sessions WHERE name = 'system_health' ) DROP EVENT SESSION system_health ON SERVER; GO -- The predicates in this session have been carefully crafted to minimize impact of event collection -- Changing the predicate definition may impact system performance -- CREATE EVENT SESSION system_health ON SERVER ADD EVENT sqlserver.error_reported ( ACTION ( package0.callstack, sqlserver.session_id, sqlserver.sql_text, sqlserver.tsql_stack ) -- Get callstack, SPID, and query for all high severity errors ( above sev 20 ) WHERE severity >= 20 -- Get callstack, SPID, and query for OOM errors ( 17803 , 701 , 802 , 8645 , 8651 , 8657 , 8902 ) OR ( error = 17803 OR error = 701 OR error = 802 OR error = 8645 OR error = 8651 OR error = 8657 OR error = 8902 ) ), ADD EVENT sqlos.scheduler_monitor_non_yielding_ring_buffer_recorded, ADD EVENT sqlserver.xml_deadlock_report, ADD EVENT sqlos.wait_info ( ACTION ( package0.callstack, sqlserver.session_id, sqlserver.sql_text ) WHERE ( duration > 15000 AND ( ( wait_type > 31 -- Waits for latches and important wait resources (not locks ) that have exceeded 15 seconds. AND ( ( wait_type > 47 AND wait_type < 54 ) OR wait_type < 38 OR ( wait_type > 63 AND wait_type < 70 ) OR ( wait_type > 96 AND wait_type < 100 ) OR ( wait_type = 107 ) OR ( wait_type = 113 ) OR ( wait_type > 174 AND wait_type < 179 ) OR ( wait_type = 186 ) OR ( wait_type = 207 ) OR ( wait_type = 269 ) OR ( wait_type = 283 ) OR ( wait_type = 284 ) ) ) OR ( duration > 30000 -- Waits for locks that have exceeded 30 secs. AND wait_type < 22 ) ) ) ), ADD EVENT sqlos.wait_info_external ( ACTION ( package0.callstack, sqlserver.session_id, sqlserver.sql_text ) WHERE ( duration > 5000 AND ( ( -- Login related preemptive waits that have exceeded 5 seconds. ( wait_type > 365 AND wait_type < 372 ) OR ( wait_type > 372 AND wait_type < 377 ) OR ( wait_type > 377 AND wait_type < 383 ) OR ( wait_type > 420 AND wait_type < 424 ) OR ( wait_type > 426 AND wait_type < 432 ) OR ( wait_type > 432 AND wait_type < 435 ) ) OR ( duration > 45000 -- Preemptive OS waits that have exceeded 45 seconds. AND ( ( wait_type > 382 AND wait_type < 386 ) OR ( wait_type > 423 AND wait_type < 427 ) OR ( wait_type > 434 AND wait_type < 437 ) OR ( wait_type > 442 AND wait_type < 451 ) OR ( wait_type > 451 AND wait_type < 473 ) OR ( wait_type > 484 AND wait_type < 499 ) OR wait_type = 365 OR wait_type = 372 OR wait_type = 377 OR wait_type = 387 OR wait_type = 432 OR wait_type = 502 ) ) ) ) ) /* --add this line to the system_health session --on 2008 and R2 - ensure to change the path appropriately */ ADD TARGET package0.asynchronous_file_target ( SET filename = N'C:\Database\XE\system_health.xel' ), ADD TARGET package0.ring_buffer -- Store events in the ring buffer target ( SET max_memory = 4096 ) WITH ( STARTUP_STATE = ON ); GO
Now, with all of the session data going out to disk, you can also schedule a scraper to copy the files to a central log folder on the network. Unfortunately, placing the files directly on a UNC share (via mapped drive or via UNC naming) does not work in 2008 or R2. I have a few more configurations to run on that still, but it doesn’t look good.
At least by dumping the session data to an event file, you are closer to an enterprise worthy solution. Just remember to do it!
One last thing. After you alter the system_health session, make sure you start it again.
ALTER EVENT SESSION system_health ON SERVER STATE = START