SQL Availability Group lease expired with no warning

  • Good evening gents,

    Today was an interesting day. I got in this morning and was informed that for a brief period of time the AG was not responding to connection attempts.

    After investigating, I am stumped as to why it happened. Here is what I have discovered:

    My Availability Group is set to FailureConditionLevel 3.

    WSFC reported an error at 9:31:14 stating that the cluster resource of type 'SQL Server Availability Group' failed. According to the WSFC logs, the hadrag monitor returned a failure.

    The AlwaysOn_health event_file for each replica reported the change to the RESOLVING_NORMAL state. The primary replica then initiated the process to reclaim the AG lease.

    The XE SQLDIAG log shows that the availability group was healthy as of 9:29 AM, with a 'clean' status for every component. At 9:31:15, however, the AG lease failed to renew, apparently because it was invalid. The pair had been running fine for weeks until now.

    I have several questions I was hoping that someone with more experience could help with:

    1. Why did the lease fail to renew? I can't find any record of lease renewals in either the AlwaysOn_health log or the SQLDIAG log. Everything seems to be running fine now, but I'm worried about future failures.

    2. What is the recommended practice for FailureConditionLevel? Does FailureConditionLevel affect the output of sp_server_diagnostics, or does it only control how that output is acted on?

    3. How is the 'warning' state calculated in sp_server_diagnostics? Several times throughout the day my query_processing result goes from warning back to clean, with no indication of why.

    Thanks in advance for any light you can shed on this issue!
