Clustered Primary AG won't fail back to original node

  • I wondered if anyone had any suggestions or ideas on why the clustered primary copy in our availability group will only move from its initial node to the other clustered node and then won't move back again. There are three nodes in the cluster (let's call them Svr1, Svr2 and Svr3). The SQL cluster is on Svr1 and Svr2 (active / passive). The AG is between this cluster (as primary) and Svr3 (readable secondary).

    Performing a manual SQL cluster failover from Svr1 to Svr2, the AG (as seen in Failover Cluster Manager) automatically chooses the new active node (Svr2) as the "host" for the AG. In other words, it works as expected and as documented.

    Following this...

    Performing a manual SQL cluster failover from Svr2 to Svr1, the AG (as seen in Failover Cluster Manager) stays on Svr2 and stops synchronising. The primary copy is left in "Not Synchronizing / Recovery Pending" because the DBCC isn't initiated until the AG comes online, which it never does because it's stuck on the now-passive node, Svr2. I can manually force the AG (with some tinkering) to move back to Svr1 using Failover Cluster Manager, but every Microsoft article I read says not to do this.

    Anyone got any ideas what I've missed in the configuration? It seems odd that it works one way automatically but not the other way.

  • FNS (11/18/2016)


    I wondered if anyone had any suggestions or ideas on why the clustered primary copy in our availability group will only move from its initial node to the other clustered node and then won't move back again. There are three nodes in the cluster (let's call them Svr1, Svr2 and Svr3). The SQL cluster is on Svr1 and Svr2 (active / passive). The AG is between this cluster (as primary) and Svr3 (readable secondary).

    Hang on, have you got a 3 node WSFC on which all 3 servers sit, or something else? Have you got a 2 node FCI in the mix as well (or instead)?

    Performing a manual SQL cluster failover from Svr1 to Svr2

    Are you performing an AlwaysOn Availability Group failover or failing over an FCI? If the latter, why are you performing the failover at the cluster level rather than at the Availability Group level?

  • Yes, it's a 3 node WSFC.

    Only Svr1 and Svr2 are used for the SQL cluster (active / passive). We need to be able to fail back and forth between these at a cluster level.

    Svr3 is used to house the readable secondary of the AG with the primary copy sitting on whichever is the active node of the SQL cluster.

    No, we're not failing over the AG. We have no requirement (at this stage) to fail over the AG. The primary copy will be on the active node of the SQL cluster (Svr1 or Svr2) and the readable secondary on Svr3.
