Automating Extended Events data collection

It’s not difficult to set up an extended event session, but reviewing and saving the data can be complex. In this article, Edward Pollack demonstrates how to collect extended event data.

Extended Events are an excellent way to collect data about a SQL Server that provides a vast array of events that can be used for performance monitoring, troubleshooting, or auditing a server.

While using Extended Events is not overly complex, building a reliable system to collect, parse, and store events over time without any data loss can be challenging.

This article dives into one method for collecting and retaining all event data for a specific set of events. A system like this is highly customizable and provides a solid starting point for Extended Events development. This should be viewed as a collection of puzzle pieces; individual pieces can be adjusted as needed to produce a monitoring solution that fits the needs of a given situation, even if it is vastly different from what is demonstrated here.

Recap

The previous article walked through the creation, configuration, and use of Extended Events with code and customization provided for the reader. The next steps are to turn a temporary process into a permanent one.

Creating the process requires permanent database objects to manage XML indexing and store event XML and the resulting event data. For data to be useful for monitoring, alerting, and analytics, it needs a permanent place to reside. How this data is managed after collection is up to the user. It can be retained forever, for a shorter time frame (if forever is too long), and/or centralized to a location where event data from many servers is aggregated. Centralization may be beneficial regardless of the source details if the central location is a server built for analytics.

Often in projects like this, alerting and monitoring occur close to the source data. In contrast, analytics occur elsewhere to ensure a separation between these processes, their cost in computing resources, and their function.

Note that the stored procedure created in this article uses xp_cmdshell. PowerShell may be easily substituted for this system stored procedure if needed. For the purposes of demonstration, though, having all code in a single script made these concepts easier to understand than by combining the script with PowerShell.

Reading Event Data into permanent tables

Having a permanent target table to store the XML and the results of the XML parsing process is an important step. Some organizations may choose to move the files to another server for processing or store data in another system. For the demonstration, tables will be created in SQL Server to store both the XML and the values extracted from it. Depending on its size, a columnstore index may be beneficial for the table containing the resulting data. If tens of millions of rows are to be stored, use a columnstore index; otherwise, page compression should suffice.

The following code creates two tables: One for the XML and one for the metrics that result from its shredding. It also creates an XML collection as a prerequisite that will allow for indexing the XML column:

The tables are structured to allow the data to be organized chronologically. dbo.extended_events_xml is a temporary repository for XML retrieved from Extended Events files and will not retain data longer than is needed to process it. dbo.extended_events_data will contain the fields shredded from the XML and is intended to be a more permanent home for Extended Events data.

What remains are two steps to manage event data:

  1. Read XML data from the Extended Events file and write it to dbo.extended_events_xml.
  2. Shred that XML and insert the results into dbo.extended_events_data.

This example reads the file created in the first article. Reading the XML takes the function sys.fn_xe_file_target_read_file and inserts the results into dbo.extended_events_xml:

Note that the three NULLs in the function can be replaced with parameters that adjust how to read the file. For our work, the entire file will be read from start to finish. Therefore, the additional parameters may be left NULL. The following confirms that work was done:

The contents of the table can be confirmed by selecting from the XML table:

 

Clicking on any of the XML results displays the raw XML:

The results show some of the fields available in the XML, such as query duration, logical reads, and CPU time. The next step is to parse that XML into those data elements and insert them into dbo.extended_events_data, where they can then be used for analysis:

This parses the XML column and pulls out each component of interest to this demo. Additional data elements available in the XML are not used here (result, spills, statement, etc.) but could be if needed. Note that not every XML field is populated in every event, therefore NULL (or defaults) may be needed for some columns in the output table. The results can be confirmed by selecting from dbo.extended_events_data:

The results show an easy-to-use dataset that has been derived from the event XML. Data will likely be found that may not be useful for a given application, such as:

  • Events from system databases
  • Events with no reads/writes/CPU/rows returned
  • Events from specific logins/hosts
  • Events with zero duration

Feel free to add filters to the Extended Events session to remove any scenarios that are not useful for a given application. Unneeded event data will consume extra computing resources throughout this process, and therefore it is beneficial to trim the data set to only what is needed and nothing more.

Automating the data dollection

With each of the building blocks defined, the fun part begins! A process can be built to automate everything introduced thus far, reducing the need for manual intervention, cleanup, or maintenance of Extended Events or the resulting data. The prerequisites for this process are the creation of the following objects, and all introduced previously in this article:

  • The XML schema collection, called dbo.extended_events_xml_schema_collection
  • The XML table created above, called dbo.extended_events_xml
  • The event data table created above, called dbo.extended_events_data

There are many ways to approach this challenge, and no one way is the “right” way. It is up to the consumer of this code to decide how to implement it and adjust it to best fit their database environment.

The simplest way to do this is to encapsulate each of the following T-SQL code blocks into their own chunks of dynamic SQL:

  1. Create Extended Events session
  2. Start Extended Events session
  3. Stop Extended Events session
  4. Drop Extended Events session
  5. Read Extended Events log files
  6. Delete old Extended Events log files

This process allows for code reuse, improved maintainability, and the ability to modify code more easily. The guts of this process will be simplified into a short sequence of dynamic SQL or stored procedure calls, allowing that logic to be adjusted quickly and with minimal effort and, more importantly, minimal risk. (The complete stored procedure can be downloaded here.)

A stored procedure will be created that accepts three parameters:

The first two parameters specify a generic Extended Events session name and a place to store event data. This will be modified later on with a suffix to ensure that a new session can quickly be created while events from the old session are read.

The last parameter is used to clean up all Extended Events sessions and data for the session specified in the other parameters. When set to 1, it:

  1. Stops the Extended Events session, if started.
  2. Drops the Extended Events session, if it exists.
  3. Deletes event log files for Extended Events session matching the prefix of the session name provided in @extended_events_session_name

This parameter is useful for testing, cleanup, or when the use of this process is complete.

To streamline this code, as many steps as possible are encapsulated into variables or blocks of T-SQL. The first of these steps retrieves details about the current Extended Events session, if one exists:

This code retrieves 3 details about Extended Events:

  • Does an Extended Events session with the specified name exist?
  • Is the Extended Events session started?
  • What is the currently running Extended Events session name, if one exists?

This information simplifies code later on as it allows a variable to be checked, rather than system views. It also allows for reuse of this info whenever needed.

Extended Events created by this process will automatically append a sequential suffix to the Extended Events session name that is passed into the stored procedure. This ensures uniqueness and the ability to read data from old sessions after the new session has started. Code is needed to try and collect the suffix and will do so like this:

This populates two variables that will contain the previous session suffix, as well as the next session suffix. If no session exists, then the new session will be suffixed with “1”.

The next step reuses code from earlier in this article. The only difference is that variables such as file name and session name are spliced into dynamic SQL based on the parameters supplied to the stored procedure. For example, the following command will create the Extended Events session:

Now that this code is stored in @extended_events_session_create_command, it can be executed using sp_executesql any time. This process will be followed for all subsequent processes, and the code will not be repeated here as it has already been introduced. The entirety of this stored procedure is attached to this article for testing, customization, and use.

After this work is complete, the following commands will exist:

  • @extended_events_session_create_command
  • @extended_events_session_start_command
  • @extended_events_session_stop_command
  • @extended_events_session_file_delete_command
  • @extended_events_session_file_delete_all_command***
  • @extended_events_session_drop_command
  • @extended_events_read_data_command

Note that @extended_events_session_file_delete_all_command is used when no previous sessions exist and removes any files with the Extended Events session name provided by @extended_events_session_name.

The last component of this process consists of the following logic:

  1. If @stop_and_drop_all_extended_events_sessions_and_files = 1
    1. Does @is_event_session_started = 1?
      1. If yes, then stop the current session.
    2. If @does_event_session_exist = 1
      1. Then drop the current session.
    3. Delete all log files for the session name provided.
  2. If @stop_and_drop_all_extended_events_sessions_and_files = 0
    1. If @does_event_session_exist = 0
      1. Then Delete all log files for the session name provided.
      2. Create the new session.
      3. Start the new session.
    2. If @does_event_session_exist = 1
      1. If @is_event_session_started = 1
        1. Stop the existing session.
      2. Create the new session.
      3. Start the new session.
      4. Read data from the previous session.
      5. Drop the previous session.
      6. Delete log files from the previous session.

If the parameter is passed in to stop and drop all sessions, then the logic above checks for the state of the previous session before stopping it, dropping it, and deleting files.

Otherwise, the process checks to see if a session already exists and if it is started, and then walks through the steps needed to stop it, create a new session, and read data from the previous session.

Why is this logic so involved? The goal is to minimize lost events. If the old session were read before the new session was started, then any events that occur in between would never be captured by Extended Events and would be lost to us.

The table dbo.extended_events_xml is truncated after each use. Once events are read, there is no need to retain the raw XML anymore.

Customize! Customize! Customize!

This process is expected to be modified to meet the needs of a given organization. With near certainty, the collected events will be adjusted to capture different events, actions, and possibly other bits of metadata collected along the way (such as the database server name).

The XML table can be reused as-is without any modifications.

The data table will need to be adjusted to account for different data elements being parsed from the XML. This table does not need to include columns for XML elements that will be ignored but should have columns for any elements that are to be retained.

Deadlock example

If there was a need to automate event collection for deadlocks, the following could be used as the Extended Events session create command:

To start, there would be value in dumping the Extended Events data into the XML table and examining it to determine which columns should be maintained and which may not be needed. There are likely some columns that are not needed and many that will be the same as those used thus far in this article. Others, such as the deadlock graph, would be essential.

Note that @next_extended_events_session_name was previously calculated above. The code used to populate this (and other dependant variables) can be reused.

Automating Extended Events data collection

The stored procedure presented (and attached) in this article represents a process that can be executed regularly and will automatically manage an Extended Events session.

By making each step of the process a modular component, the ability to test, customize, and maintain this code is far simpler than it would be otherwise. With this structure, it is possible to completely change how an Extended Events session is read without affecting the other building blocks of the process.

The learning curve to using Extended Events via T-SQL is not trivial, but focusing on macro commands rather than individual queries helps make it easier to dive into and become familiar with the process.

As with any homegrown process, customization is essential to getting the most out of it. Microsoft has documented Extended Events quite well:
Quickstart: Extended events in SQL Server – SQL Server | Microsoft Docs

Many writers have shared their tips and tricks for how to capture various events and view the resulting event data. Combining that knowledge with automation makes this into a process that is easy to deploy, manage, and scale, regardless of whether it is needed on one server or one hundred.