Blog Post

Database Recovery Monitoring with XE

,

On of the greatest benefits of Extended Events (xe) is how the tool simplifies some of the otherwise more difficult tasks.

Recently, I wrote a rewrite of my database recovery progress report script. That script touches on both the error log and some DMVs along with some fuzzy logic to join the data sets together. That script may not be the most complex script out there, but it is more more complex than using the power of XE.

Database recovery (crash recovery) is a nerve wrenching situation under the wrong conditions. It can be as bad as a root canal and just as necessary to endure that pain at times. When the business is waiting on you waiting on the server to finish crash recovery, you feel nervous at best. If you can be of some use and provide some information back to the business, that anxiety dissipates and the business becomes more calm as well. While the previous script can help you get that information easily enough, I want to introduce the easiest method to capture that information currently available.

If you are interested in a history lesson first, here are the first couple of versions of the aforementioned script (here and here).

Discovery First

As always, I like to explore the event repository to see if there is an event that may be applicable to my situation. This can be done via TSQL script or from the XE Gui. I will almost always break out my scripts to figure out if an event is there or not.

DECLARE @TermDescription VARCHAR(64) = 'Database_recovery'
, @ColumnDesc VARCHAR(64) = NULL;
SELECT c.OBJECT_NAME AS EventName
,p.name AS PackageName
,o.description AS EventDescription
,ch.Channel
FROM sys.dm_xe_objects o
INNER JOIN sys.dm_xe_object_columns c 
ON o.name = c.OBJECT_NAME
and o.package_guid = c.object_package_guid
INNER JOIN sys.dm_xe_packages p
ON o.package_guid = p.guid
INNER JOIN (SELECT c.object_name AS EventName,c.object_package_guid AS PkgGuid, v.map_value AS Channel
FROM sys.dm_xe_object_columns c
INNER JOIN sys.dm_xe_map_values v
ON c.type_name = v.name
AND c.column_value = CAST(v.map_key AS NVARCHAR)
WHERE c.name = 'channel') ch
ON ch.EventName = c.object_name
AND ch.PkgGuid = c.object_package_guid
WHERE o.object_type='event'
AND c.name = 'channel'
AND (c.object_name LIKE '%' + @TermDescription + '%'
OR c.description LIKE '%' + ISNULL(@ColumnDesc,@TermDescription) + '%'
OR o.description LIKE '%' + @TermDescription + '%')
ORDER BY o.package_guid;
GO

This query will yield any events that match my description. In this case, I am looking for events related to “database_recovery”. This search will yield four relevant events we can use to track our database recovery progress. Those events are shown in the following image (with the event names being circled in green).

If I explore the events a little more, I will eventually come across an attribute in the database_recovery_progress_report event that leads to a map. This map is called recovery_phase. For me, that is an interesting attribute/map and makes me want to look at it further.

Things are coming together a little bit now. We all know (or should know) that there are the analysis, redo and undo phases to crash recovery. This aligns with that knowledge and throws in a couple more phases of the recovery process.

So, now we know there are four relevant events for us to use and that one of the events will tell us specifically which phase of recovery is currently processing. We have enough information that an event session can now be built.

USE master;
GO
-- Create the Event Session
IF EXISTS
(
SELECT
*
FROM sys.server_event_sessions
WHERE name = 'DBRecoveryProgress'
)
DROP EVENT SESSION DBRecoveryProgress ON SERVER;
GO
EXECUTE xp_create_subdir 'C:DatabaseXE';
GO
CREATE EVENT SESSION [DBRecoveryProgress]
ON SERVER
ADD EVENT sqlserver.database_recovery_progress_report
(ACTION
 (
 /* yes this may be overkill on the actions - it is for discovery*/ sqlserver.database_id
   , sqlserver.session_id
   , sqlserver.database_name
   , sqlserver.tsql_stack
   , package0.event_sequence
   , sqlserver.sql_text
   , sqlserver.nt_username
   , sqlserver.username
   , sqlserver.session_nt_username
   , sqlserver.client_app_name
   , sqlserver.context_info
   , sqlserver.client_connection_id
   , sqlserver.query_hash
   , sqlserver.query_plan_hash
   , package0.callstack
 )
)
  , ADD EVENT sqlserver.database_recovery_trace
(ACTION
 (
 sqlserver.database_id
   , sqlserver.session_id
   , sqlserver.database_name
   , sqlserver.tsql_stack
   , package0.event_sequence
   , sqlserver.sql_text
   , sqlserver.nt_username
   , sqlserver.username
   , sqlserver.session_nt_username
   , sqlserver.client_app_name
   , sqlserver.context_info
   , sqlserver.client_connection_id
   , sqlserver.query_hash
   , sqlserver.query_plan_hash
   , package0.callstack
 )
)
  , ADD EVENT sqlserver.database_recovery_times
(ACTION
 (
 sqlserver.database_id
   , sqlserver.session_id
   , sqlserver.database_name
   , sqlserver.tsql_stack
   , package0.event_sequence
   , sqlserver.sql_text
   , sqlserver.nt_username
   , sqlserver.username
   , sqlserver.session_nt_username
   , sqlserver.client_app_name
   , sqlserver.context_info
   , sqlserver.client_connection_id
   , sqlserver.query_hash
   , sqlserver.query_plan_hash
   , package0.callstack
 )
)
  , ADD EVENT sqlserver.database_recovery_lsn_report
(ACTION
 (
 sqlserver.database_id
   , sqlserver.session_id
   , sqlserver.database_name
   , sqlserver.tsql_stack
   , package0.event_sequence
   , sqlserver.sql_text
   , sqlserver.nt_username
   , sqlserver.username
   , sqlserver.session_nt_username
   , sqlserver.client_app_name
   , sqlserver.context_info
   , sqlserver.client_connection_id
   , sqlserver.query_hash
   , sqlserver.query_plan_hash
   , package0.callstack
 )
)
ADD TARGET package0.event_file
(SET filename = 'C:DatabaseXEDBRecoveryProgress.xel', max_file_size = 5, max_rollover_files = 4)
WITH
(
MAX_MEMORY = 4MB
  , EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS
  , TRACK_CAUSALITY = ON
  , MAX_DISPATCH_LATENCY = 5 SECONDS
  , STARTUP_STATE = ON
);
GO
ALTER EVENT SESSION DBRecoveryProgress ON SERVER STATE = START;
GO

You may notice that I have thrown a lot of actions including the kitchen sink at this event session. Some of that is for consistency across sessions and some of it is simply for exploratory wants (and not needs). Feel free to add/remove actions form this list as you explore the use of this session in your environment.

Here is what that session produces on my test server with a simple stop/start of the SQL Server instance.

In the preceding image, I have the different events circled in red. I have also added the event_sequence action so I can see the relationship between these events as the progress from one to the next. If you note the items circled in green (and connected by green arrow), you will see a couple of different things such as the trace message, the database name, the database id, and the recovery time remaining). Circled in blue are the “destress” items that let us know that the recovery is 100% complete.

Wrap

SQL Server recovery is a safeguard to protect the data in the event of an unexpected failure. The recovery process is necessary and has several phases to roll through in order to bring the database back online.

Just because you need to have patience during the crash recovery process does not mean you have to work hard to get a status of the process or that you need to stress throughout the process. This XE event session will take a lot of work and stress out of the process. I would recommend having this lightweight session running on the server with the startup state set to enabled. This will make your job easier and definitely can make you look like a rockstar DBA.

This article has demonstrated the power of Extended Events, for a lot more reading on the topic, here is a list of over 100 articles.

Original post (opens in new tab)
View comments in original post (opens in new tab)

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating