SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 

One Easy Strategy for the Database Enterprise

TSQL2sDay

 

Welcome to the second Tuesday of the month. And in the database world of SQL Server and the SQL Server community, that means it is time for TSQL2SDAY. This month the host is Jen McCown (blog / twitter), half of the MidnightDBA team, and the topic that she wants us to write about is: “Strategies for managing an enterprise”. Specifically, she wants to know “How do you manage an enterprise? Grand strategies? Tips and tricks? Techno hacks? Do tell.”

For me, this month, I will be just doing a real quick entry. I have been more focused on my 60 Days of Extended Events series and was looking for something that might tie into both really well that won’t necessarily be covered in the series, but that might work well as an “Enterprise” worthy topic.

ussenterpriseSo, what I decided to land on was the system_health session.

Enterprise

Wait, isn’t the system_health session one of those things that is configured per Instance?

Yes it is!

The system_health session is a default Extended Events session that is running by default on every instance of SQL Server (keyword is default) since SQL Server 2008. Whether you want it to be running or not is an entirely different conversation. But by default it is running.

There is a small problem with that default though. That problem is in the 2008 and 2008 R2 flavors of SQL Server. The default behavior is that the session only dumps the events to the ring buffer. And if you are only dumping the events to the ring buffer, you can imagine this is not entirely that useful. Why? Well, the ring buffer is just a memory target and is considerably more volatile than to write the event session data out to a file. One need not try terribly hard to see why this can be frustrating (unless of course you didn’t even know it was there).

So what to do to help push this in a more enterprise friendly direction? The answer is to add a file target like was done in the 2012 (and up) flavors of SQL Server. Here is the entire system_health session as defined in u_tables.sql (the backup script of the session deployed to the install directory):

-- Extended events default session
IF EXISTS ( SELECT *
				FROM sys.server_event_sessions
				WHERE name = 'system_health' )
	DROP EVENT SESSION system_health ON SERVER;
GO
-- The predicates in this session have been carefully crafted to minimize impact of event collection
-- Changing the predicate definition may impact system performance
--
CREATE EVENT SESSION system_health ON SERVER
ADD EVENT sqlserver.error_reported (
	ACTION ( package0.callstack, sqlserver.session_id, sqlserver.sql_text,
	sqlserver.tsql_stack )
	-- Get callstack, SPID, and query for all high severity errors ( above sev 20 )
	WHERE severity >= 20
	-- Get callstack, SPID, and query for OOM errors ( 17803 , 701 , 802 , 8645 , 8651 , 8657 , 8902 )
	OR ( error = 17803
			OR error = 701
			OR error = 802
			OR error = 8645
			OR error = 8651
			OR error = 8657
			OR error = 8902
		) ),
ADD EVENT sqlos.scheduler_monitor_non_yielding_ring_buffer_recorded,
ADD EVENT sqlserver.xml_deadlock_report,
ADD EVENT sqlos.wait_info (
	ACTION ( package0.callstack, sqlserver.session_id, sqlserver.sql_text )
	WHERE 
	( duration > 15000
		AND ( ( wait_type > 31	-- Waits for latches and important wait resources (not locks ) that have exceeded 15 seconds. 
				AND ( ( wait_type > 47
						AND wait_type < 54
						)
						OR wait_type < 38
						OR ( wait_type > 63
								AND wait_type < 70
							)
						OR ( wait_type > 96
								AND wait_type < 100
							)
						OR ( wait_type = 107 )
						OR ( wait_type = 113 )
						OR ( wait_type > 174
								AND wait_type < 179
							)
						OR ( wait_type = 186 )
						OR ( wait_type = 207 )
						OR ( wait_type = 269 )
						OR ( wait_type = 283 )
						OR ( wait_type = 284 )
					)
				)
				OR ( duration > 30000		-- Waits for locks that have exceeded 30 secs.
						AND wait_type < 22
					)
			)
	) ),
ADD EVENT sqlos.wait_info_external (
	ACTION ( package0.callstack, sqlserver.session_id, sqlserver.sql_text )
	WHERE 
	( duration > 5000
		AND ( (	-- Login related preemptive waits that have exceeded 5 seconds.
				( wait_type > 365
					AND wait_type < 372
				)
				OR ( wait_type > 372
						AND wait_type < 377
					)
				OR ( wait_type > 377
						AND wait_type < 383
					)
				OR ( wait_type > 420
						AND wait_type < 424
					)
				OR ( wait_type > 426
						AND wait_type < 432
					)
				OR ( wait_type > 432
						AND wait_type < 435
					)
				)
				OR ( duration > 45000 	-- Preemptive OS waits that have exceeded 45 seconds. 
						AND ( ( wait_type > 382
								AND wait_type < 386
								)
								OR ( wait_type > 423
										AND wait_type < 427
									)
								OR ( wait_type > 434
										AND wait_type < 437
									)
								OR ( wait_type > 442
										AND wait_type < 451
									)
								OR ( wait_type > 451
										AND wait_type < 473
									)
								OR ( wait_type > 484
										AND wait_type < 499
									)
								OR wait_type = 365
								OR wait_type = 372
								OR wait_type = 377
								OR wait_type = 387
								OR wait_type = 432
								OR wait_type = 502
							)
					)
			)
	) )
/*
--add this line to the system_health session 
--on 2008 and R2 - ensure to change the path appropriately
*/
ADD TARGET package0.asynchronous_file_target (  SET filename = N'C:\Database\XE\system_health.xel' ),
ADD TARGET package0.ring_buffer		-- Store events in the ring buffer target
	( SET max_memory = 4096 )
WITH ( STARTUP_STATE = ON );
GO

Now, with all of the session data going out to disk, you can also schedule a scraper to copy the files to a central log folder on the network. Unfortunately, placing the files directly on a UNC share (via mapped drive or via UNC naming) does not work in 2008 or R2. I have a few more configurations to run on that still, but it doesn’t look good.

At least by dumping the session data to an event file, you are closer to an enterprise worthy solution. Just remember to do it!

One last thing. After you alter the system_health session, make sure you start it again.

ALTER EVENT SESSION system_health ON SERVER
STATE = START

 

Comments

Leave a comment on the original post [jasonbrimhall.info, opens in a new window]

Loading comments...