Distributed Availability Group Health: T-SQL and Zabbix

After creating a distributed availability group using the steps described in my article, SQL Server 2022 Clusterless Distributed Availability Group, how can we verify if the synchronization between data centers is healthy?

In SQL Server Management Studio, when you right-click a distributed availability group, you will notice that there is no dashboard to confirm whether SQL Server is properly synchronizing data across data centers:

SSMS Distributed AG options

Even if you check the dashboards of the primary and secondary data centers separately, you will not see if there is a problem in-between.

To solve this, we can run a query on the primary node of the primary data center:

CREATE TABLE #Info (
  AG_Name SYSNAME, is_distributed BIT, is_chained BIT, replica_server_name VARCHAR(256), [Database] SYSNAME,
  role_desc VARCHAR(10), synchronization_health TINYINT, synchronization_health_desc VARCHAR(20), log_send_queue_size BIGINT, log_send_rate BIGINT,
  redo_queue_size BIGINT, redo_rate BIGINT, suspend_reason_desc VARCHAR(30), last_sent_time DATETIME, last_hardened_time DATETIME,
  last_redone_time DATETIME, last_commit_time DATETIME, secondary_lag_seconds BIGINT);
INSERT INTO #Info
SELECT ag.name AG_Name, ag.is_distributed, 0 'is_chained', ar.replica_server_name AS [AG], dbs.name AS [Database],
       ars.role_desc, drs.synchronization_health, drs.synchronization_health_desc, drs.log_send_queue_size, drs.log_send_rate,
       drs.redo_queue_size, drs.redo_rate, drs.suspend_reason_desc, drs.last_sent_time, drs.last_hardened_time,
       drs.last_redone_time, drs.last_commit_time, drs.secondary_lag_seconds
  FROM sys.databases dbs
 INNER JOIN sys.dm_hadr_database_replica_states drs ON dbs.database_id = drs.database_id
 INNER JOIN sys.availability_groups ag ON drs.group_id = ag.group_id
 INNER JOIN sys.dm_hadr_availability_replica_states ars ON ars.replica_id = drs.replica_id
 INNER JOIN sys.availability_replicas ar ON ar.replica_id = ars.replica_id
UNION ALL
SELECT ag.name AG_Name, 1 'is_distributed', 1 'is_chained', ar.replica_server_name AS [AG], dbs.name AS [Database],
       ars.role_desc, drs.synchronization_health, drs.synchronization_health_desc, drs.log_send_queue_size, drs.log_send_rate,
       drs.redo_queue_size, drs.redo_rate, drs.suspend_reason_desc, drs.last_sent_time, drs.last_hardened_time,
       drs.last_redone_time, drs.last_commit_time, drs.secondary_lag_seconds
  FROM [VSRVSQLON].master.sys.databases dbs
 INNER JOIN [VSRVSQLON].master.sys.dm_hadr_database_replica_states drs ON dbs.database_id = drs.database_id
 INNER JOIN [VSRVSQLON].master.sys.availability_groups ag ON drs.group_id = ag.group_id
 INNER JOIN [VSRVSQLON].master.sys.dm_hadr_availability_replica_states ars ON ars.replica_id = drs.replica_id
 INNER JOIN [VSRVSQLON].master.sys.availability_replicas ar ON ar.replica_id = ars.replica_id;
UPDATE i
   SET i.last_sent_time = i2.last_sent_time, i.secondary_lag_seconds = i2.secondary_lag_seconds, i.synchronization_health = i2.synchronization_health
  FROM #Info i
 INNER JOIN #Info i2 ON i2.[Database] = i.[Database] AND i2.is_distributed = 1 AND i2.is_chained = 0 AND i.is_distributed = 1 AND i.role_desc='PRIMARY';
DELETE
  FROM #Info
 WHERE is_distributed = 1 AND is_chained = 0;
ALTER TABLE #Info DROP COLUMN is_chained;
SELECT AG_Name, is_distributed, replica_server_name, [Database], role_desc,
       synchronization_health, synchronization_health_desc, log_send_queue_size, log_send_rate, redo_queue_size,
       redo_rate, suspend_reason_desc, last_sent_time, last_hardened_time, last_redone_time,
       last_commit_time, secondary_lag_seconds
  FROM #Info;

This script requires a linked server to the primary node of the secondary data center. This linked server needs the security to “Be made using the login’s current security context”. Notice how we copy the distributed AG information (returned by the primary data center) to the corresponding record (primary node of the secondary data center); this ensures we only return one row per server.

Here is an example of the output for one database:

T-SQL output

As the number of databases grows, spotting synchronization issues manually becomes impossible; that’s why monitoring software like Zabbix is so valuable:

Zabbix AG Distributed dashboard

With it you can create a dashboard that shows the synchronization health of every database across all servers, and it can report immediately any disconnection. This also helps non-DBAs, and even non-technical people, to monitor things easily.

You can either download the prebuilt template from my GitHub repository and adjust it, or you can follow the detailed 10-step guide below to build your own.

Step 1: Create a folder for custom SQL queries

If you installed Zabbix under “C:\Program Files\Zabbix Agent 2\”, create the folder: C:\Program Files\Zabbix Agent 2\Custom Queries\MSSQL

Then add a file named “AGDistributed.sql” and paste the SQL command shown earlier.

Step 2: Configure the Zabbix MSSQL plugin

In “C:\Program Files\Zabbix Agent 2\zabbix_agent2.d\”, locate “mssql.conf” and add:

Plugins.MSSQL.CustomQueriesDir=C:\Program Files\Zabbix Agent 2\Custom Queries\MSSQL

Step 3: Restart the Zabbix agent and test

Restart the agent after making configuration changes and after modifying any SQL file:

net stop "Zabbix Agent 2"

First test connectivity to the database replacing the tags with their real values, unquoted (check in my previous article Monitor SQL Server using Zabbix, the part related to Macros):

& "C:\Program Files\Zabbix Agent 2\zabbix_agent2.exe" -t mssql.ping[{$MSSQL.URI}, {$MSSQL.USER}, {$MSSQL.PASSWORD}]

Then assess the custom query (omitting the “.sql” extension):

& "C:\Program Files\Zabbix Agent 2\zabbix_agent2.exe" -t mssql.custom.query[{$MSSQL.URI}, {$MSSQL.USER}, {$MSSQL.PASSWORD}, AGDistributed]

Finally, start the service:

net start "Zabbix Agent 2"

Step 4: Create a new template in Zabbix

In Zabbix, click Data Collection --> Templates. Click Create template (top-right corner):

Create template

Enter the following details, then click Add to save:

Template name: AG Distributed
Template groups: Templates/Database

New template

This template will hold all the items, rules, and dashboards.

Step 5: Add a template item

In the template list, locate AG Distributed, then click Items:

Template - Items

Click Create item (top-right corner) and fill in the details, then click Add:

Name: AG Distributed
Type: Zabbix agent (active)
Key: mssql.custom.query[{$MSSQL.URI},{$MSSQL.USER},{$MSSQL.PASSWORD},AGDistributed]
Type of information: Text
Update interval: 1m (one minute, increase later if needed)

Template new item

This item will capture the query results and make them available in JSON format.

Step 6: Associate the template with a host

Go to Monitoring --> Hosts, select the host where the SQL Server instance runs, under Configuration choose Host:

Host configuration

In the Templates, type “AG Distributed” or use the Select button to find it, then click Update:

Add template to host

Go to Monitoring --> Hosts, select the host where the SQL Server instance runs, under View, choose Latest data. In the Name field, search for “AG Distributed”. Hover over the Last value column; you should see the JSON output from the SQL script:

Item latest data

If you see an Info icon in the last column, there is a problem with the data collection.

Step 7: Add a discovery rule

Discovery rules let Zabbix automatically generate monitoring items for every availability group, SQL instance, and database.

In the template list, locate AG Distributed, then click Discovery:

Template - Discovery

Click Create discovery rule and fill in the details:

Name: AG Distributed
Type: Dependent item
Key: ag.distributed
Master item: select “AG Distributed” (the item created in step 5)

Create discovery rule

Switch to the LLD macros tab (LLD stands for Low Level Discovery). Add three macros that uniquely identify each record:

{#AGNAME}: Availability group name
{#REPLICASERVERNAME}: SQL Server instance name
{#DATABASE}: Database name

Low level discovery

Example: for a database named DB1 in two 2-node AGs (SQLPRODAG and SQLPREMAG), you will have:

SQLPRODAG CLOUD1 DB1
SQLPRODAG CLOUDDR2 DB1
SQLPREMAG ONPREMISE1 DB1
SQLPREMAG ONPREMISEDR2 DB1

LLD macros are keys created by Zabbix when the data comes. They must be uppercase, without spaces, underscores, or quotes. Choose any names that make sense to you – you will reference them in later objects.

The JSONPath is a query that returns a value; they must match exactly the name in the JSON (the names of the columns defined in our SQL query): they are case sensitive and do not allow spaces or quotes.

After saving, go to Monitoring --> Hosts, select the host where the SQL Server instance runs, and under Configuration choose Discovery. Confirm the new discovery rule, and any time you update this or to any underlying object, check the Info column for errors:

Created discovery rule

Step 8: Add an item prototype

In the template list, locate AG Distributed, then click Discovery. On the discovery rule, click Item prototypes:

Template - Item prototypes

Click Create item prototype and enter the details:

Name: Synchronization health of {#AGNAME} {#REPLICASERVERNAME} {#DATABASE}
Type: Dependent item
Key: avoid using commas, enter synchronization_health[{#AGNAME}-{#REPLICASERVERNAME}-{#DATABASE}]
Type of information: Numeric (unsigned)
Master item: select “AG Distributed”

Create item prototype

Go to the Preprocessing tab, and Add a Preprocessing step:

Name: JSONPath
Parameters: $[?(@.AG_Name == "{#AGNAME}" && @.replica_server_name == "{#REPLICASERVERNAME}" && @.Database == "{#DATABASE}")].synchronization_health.first()

Item preprocessing

This preprocessing searches in the JSON the three keys defined earlier in the LLD macros, then gets the field synchronization_health, and dynamically creates items; for example: Synchronization health of SQLPRODAG CLOUD1 DB1.

Search under Monitoring --> Hosts --> Items the name “synchronization health of”, and confirm no issues on Info on the discovered items:

Discovered items

Go to Monitoring --> Hosts, select the host where the SQL Server instance runs, under View choose Latest data. Search the name “synchronization health of” to see the values received, and confirm no issues on Info:

Discovered items data

Step 9: Add a trigger prototype

In the template list, locate AG Distributed, then click Discovery. On the discovery rule, click Trigger prototypes:

Click Create trigger prototype and enter the details:

Name: Synchronization health of {#AGNAME} {#REPLICASERVERNAME} {#DATABASE}
Severity: Disaster
Expression: click Add and enter the details:
- Item: click Select prototype and select Synchronization health of {#AGNAME} {#REPLICASERVERNAME} {#DATABASE}
- Condition: <> (not equal)
- Value: 2 (healthy; see the official documentation)

New trigger prototype

Tigger prototype condition

Once added, you will see alerts under Current problems or in the host's problems:

Current problems

If you have email alerts configured for Disaster, you will also receive one.

Step 10: Create the dashboard

Finally, let us visualize everything. In the template list, locate AG Distributed, then click Dashboards:

Template dashboards

Click Create dashboard (top-right), enter a name (i.e., “AG Distributed”) and click Apply. On the canvas, add your first widget in the upper left corner:

Type: Graph
Name: enter the data center and server name, e.g., SQLPRODAG CLOUD1
Data set #1 item patterns: enter Synchronization health of SQLPRODAG CLOUD1 * replacing the actual data center and server name; do not forget the asterisk
Draw: points
In the Problems tab, enable show problems

Dashboard widget

Add additional widgets as needed. Each widget displays up to fifty items, so create multiple depending on the number of availability groups, nodes, and databases. When done, click Save changes (top-right). You can now go to Monitoring --> Hosts, select the host where the SQL Server instance runs. Under View, choose Dashboards and select AG Distributed to view the synchronization health.

That completes the setup. With this dashboard, you will have a clear, near-real-time view of distributed availability group health. It makes detecting and resolving synchronization issues far easier, and it is simple enough for both DBAs and non-technical users to interpret.

Distributed Availability Group Health: T-SQL and Zabbix

Step 1: Create a folder for custom SQL queries

Step 2: Configure the Zabbix MSSQL plugin

Step 3: Restart the Zabbix agent and test

Step 4: Create a new template in Zabbix

Step 5: Add a template item

Step 6: Associate the template with a host

Step 7: Add a discovery rule

Step 8: Add an item prototype

Step 9: Add a trigger prototype

Step 10: Create the dashboard

Rate

Share

Categories

Share

Rate

Distributed Availability Group Health: T-SQL and Zabbix

Step 1: Create a folder for custom SQL queries

Step 2: Configure the Zabbix MSSQL plugin

Step 3: Restart the Zabbix agent and test

Step 4: Create a new template in Zabbix

Step 5: Add a template item

Step 6: Associate the template with a host

Step 7: Add a discovery rule

Step 8: Add an item prototype

Step 9: Add a trigger prototype

Step 10: Create the dashboard

Rate

Share

Categories

Share

Rate

Related content

Migrate from Linux to Windows with Availability Groups

SQL Server 2022 Clusterless Distributed Availability Group

Check Your Backup Compliance Against a Data Management Provider

Restore SQL Server database using data management provider REST API

Use Polybase to consume Oracle data from SQL Server on Linux 2019