AG troubleshooting

  • Today we had an issue: the availability group did not fail over automatically, and we got this error:

    The WSFC cluster control API returned error code 1722. The WSFC service may not be running or may be inaccessible in its current state. For information about this error code, see "System Error Codes" in the Windows Development documentation.

    This is the error we observed in the SQL Server logs.

    May I know what the reason for this is and how to troubleshoot it?
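
Editorial sketch: the number in that message is a Windows system error code, and 1722 is `RPC_S_SERVER_UNAVAILABLE` ("The RPC server is unavailable."), which generally points at a connectivity or service-availability problem rather than at SQL Server itself. The snippet below is a minimal illustration, assuming the message format quoted above; the helper names `extract_error_code` and `describe_error` and the small lookup table are hypothetical and cover only a few well-known codes.

```python
import re

# A few well-known Windows system error codes (deliberately not exhaustive).
WIN32_ERRORS = {
    5: ("ERROR_ACCESS_DENIED", "Access is denied."),
    53: ("ERROR_BAD_NETPATH", "The network path was not found."),
    1722: ("RPC_S_SERVER_UNAVAILABLE", "The RPC server is unavailable."),
}

# Assumes the wording seen in this thread: "... returned error code <number>."
ERROR_CODE_RE = re.compile(r"returned error code (\d+)")


def extract_error_code(message):
    """Return the numeric error code found in a log message, or None."""
    match = ERROR_CODE_RE.search(message)
    return int(match.group(1)) if match else None


def describe_error(code):
    """Return a readable description for a known code, or a fallback string."""
    if code in WIN32_ERRORS:
        name, text = WIN32_ERRORS[code]
        return f"{code} ({name}): {text}"
    return f"{code}: not in this lookup table, check the System Error Codes list"


if __name__ == "__main__":
    msg = ("The WSFC cluster control API returned error code 1722. "
           "The WSFC service may not be running or may be inaccessible "
           "in its current state.")
    print(describe_error(extract_error_code(msg)))
```

Knowing that the code means "RPC server unavailable" narrows the search to whether the nodes could reach each other and whether the cluster service was up at the time of the error.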

  • I think this is usually a Listener type issue. But AlwaysOn is COMPLEX and there is no telling what is messed up on your system. Did you test failover before this event happened and was it successful?

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • No. When we checked the logs, we saw that the failover did not happen because of missed heartbeats.

  • Only a few things can cause heartbeats to be missed, I would think. Check and recheck your configuration. Obviously the simplest thing is direct crossover cables on dedicated NICs. Have you tried that?

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • I am not an expert on Windows. Please suggest how to proceed.

  • Can you supply more info? Have a look at the cluster events and also the system event logs.

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • These are the log entries we found:

    Cluster has missed two consecutive heartbeats for the local endpoint :~3343~ connected to remote endpoint :~3343~.

    Cluster has received notification that Adapter Microsoft Hyper-V Network Adapter is now in a disconnected state.

    After that, the availability group went into a failed state.

    Automatic failover did not happen because of this.

  • Can you supply more detail on the environment configuration?

    Have you run a cluster validation? What does it show?

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • No, I have not run the cluster validation report. We also found the message below in the logs:

    Cluster has lost the UDP connection from local endpoint :~3343~ connected to remote endpoint :~3343~.
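
Editorial sketch: the three cluster messages quoted in this thread (missed heartbeats, a disconnected network adapter, a lost UDP connection) all describe trouble on the network path between the nodes. When sifting a longer log export, it can help to tag and count such lines. The snippet below is a rough illustration, assuming the log is available as plain text lines worded like the ones above; the labels and the function names `classify` and `summarize` are hypothetical.

```python
import re

# Patterns based only on the message wording quoted in this thread.
PATTERNS = [
    ("heartbeat_missed",
     re.compile(r"missed (\w+) consecutive heartbeats", re.IGNORECASE)),
    ("adapter_disconnected",
     re.compile(r"Adapter (.+?) is now in a disconnected state", re.IGNORECASE)),
    ("udp_connection_lost",
     re.compile(r"lost the UDP connection", re.IGNORECASE)),
]


def classify(line):
    """Return (label, detail) for a recognized line, or None otherwise."""
    for label, pattern in PATTERNS:
        match = pattern.search(line)
        if match:
            # Only some patterns capture a detail (count or adapter name).
            detail = match.group(1) if pattern.groups else None
            return label, detail
    return None


def summarize(lines):
    """Count how many lines fall under each label."""
    counts = {}
    for line in lines:
        hit = classify(line)
        if hit:
            counts[hit[0]] = counts.get(hit[0], 0) + 1
    return counts


if __name__ == "__main__":
    sample = [
        "Cluster has missed two consecutive heartbeats for the local endpoint.",
        "Cluster has received notification that Adapter Microsoft Hyper-V "
        "Network Adapter is now in a disconnected state.",
        "Cluster has lost the UDP connection from local endpoint.",
        "Some unrelated informational message.",
    ]
    for line in sample:
        print(classify(line))
    print(summarize(sample))
```

If the adapter-disconnected entries come first and the heartbeat and UDP entries follow, that ordering suggests the virtual network adapter dropping is the event to investigate before anything on the SQL Server side.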

  • Having Hyper-V in the mix just adds to the things that can be misconfigured or go sideways.

    Can I ask why Always On was used without someone at the company who is experienced in troubleshooting it, as well as Windows Server, clustering and Hyper-V?

    Also, I believe you said "No" to my question about whether failover was ever successfully tested. That's clearly not a good thing.

    Has anything been changed on the boxes recently, or since the initial configuration?

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service
