More AlwaysOn Availability Woes :)

  • Evening All,

    Thanks to all the help from here earlier, I have a fairly successful 3-Node/Replica AlwaysOn Cluster working. However, I do have a couple more questions:

    I have 3 Replicas R1, R2 and R3, with a single Availability Group (G1) that contains 3 databases (D1, D2 and D3).

    They are spread over 2 subnets: R1 is in Subnet1, and R2 and R3 are in Subnet2.

    When R1 owns G1:

    -> Shutting down R1 makes G1 move to R2. Fine.

    However, when R1 is switched off, AND R2 owns G1:

    -> Shutting down R2 doesn't then bring R3 online; I have to perform that action manually.
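
    For reference, the manual step I end up running is roughly this (a sketch using the group name G1 from above, run on the replica that should become primary):

        -- Planned manual failover; the target secondary must be synchronized:
        ALTER AVAILABILITY GROUP G1 FAILOVER;

        -- Forced variant for genuine DR only, since it can lose data:
        -- ALTER AVAILABILITY GROUP G1 FORCE_FAILOVER_ALLOW_DATA_LOSS;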

    My first question is, is this correct? Or have I configured it wrong?

    My second question relates to the directions in which I can fail the group back and forth.

    I was under the impression that I could move it in any direction, i.e.:

    -> From R1 to R2, then from R2 back to R1.

    -> I CAN go from R1 directly to R3, but then have to go back to R1 by going to R2 first.

    Again, is this correct, or have I got some bits configured wrong?

    Cheers for all your help,

    Alex

  • R2 will not go to R3 unless R3 is set for Automatic Failover. Check your settings on the group to see which replicas are set to automatic; those must also be synchronous. As long as a replica is set to synchronous, you can fail over and back without issues. If one is set to async and you fail over to it, you will potentially have data loss, and synchronization will no longer work properly on the other replicas.
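
    One way to check is a query along these lines against the availability group catalog views (a sketch; run it on the primary):

        -- List each replica's availability mode and failover mode per AG
        SELECT ag.name AS ag_name,
               ar.replica_server_name,
               ar.availability_mode_desc,
               ar.failover_mode_desc
        FROM sys.availability_replicas AS ar
        JOIN sys.availability_groups AS ag
            ON ag.group_id = ar.group_id;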

  • Hi Jeff

    When I look at the properties on the AG, both are Synchronous and both set to Auto.

    But that isn't happening in the last worst-case scenario.

    Is there anything else I should check?

    Cheers

    Alex

  • Wait a sec, I just realized you have not discussed your use of quorum. If you actually take down both of those nodes, you will NEVER get failover unless you have quorum set up properly. The surviving node will see that the majority of the cluster's nodes are gone, assume that it is the one that has been cut off, and shut down the cluster service. This is to ensure that you don't accidentally end up with two active systems due to networking issues. You have to have a majority of the votes available for the resource to stay up. If you shut down 2 of 3 nodes and don't have a quorum configuration that accounts for that, the whole cluster will shut down, as there is no majority.
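
    You can see the quorum model and the vote distribution from SQL Server itself with something like this (a sketch against the cluster DMVs):

        -- Quorum model and current quorum state of the underlying WSFC
        SELECT cluster_name, quorum_type_desc, quorum_state_desc
        FROM sys.dm_hadr_cluster;

        -- Each cluster member (node or witness) and its vote count
        SELECT member_name, member_type_desc, member_state_desc, number_of_quorum_votes
        FROM sys.dm_hadr_cluster_members;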

  • Aha! Makes sense.

    Is there a way around this then? A file share witness? Or increasing the votes on the 3rd node?

    As it stands: 3 nodes, each with 1 vote.

    Alex

  • Either of those could work. I guess the question to ask is what you expect the real circumstances to be. Do you think you will have cases where only 1 of 3 nodes is online, and do you need automatic failover in that case? For us that would be DR, and we would not use automatic failover for it (if we are down to node 3, we have lost the main data center). Help me understand why you would want the system to stay online if the majority of nodes went down; if you do need that, then some combination of what you describe would be needed.

    Actually, no Jeff, I think the group moving between the first two replicas is fine and adequate. As you say, if we were to lose 2 replicas within a similar window then we have other, bigger problems anyway and would be in a DR situation, so a manual failover at that point is no real hardship.

    However, this is a test run for a much bigger deployment that will span 3 data centers, so I wanted to practice the configuration now, with automatic failover to each data center, whilst I have the opportunity to be left alone with these 3 new servers.

    So it's partly a test, and partly for my own edification, as I've done zero work with quorum configuration outside of the (almost) default Node Majority.

    Cheers

    Alex

  • Well, upon re-looking at my AG configuration, I think I put up some bad information earlier.

    The group is configured as follows:

    R1 -> Synchronous -> Automatic Failover

    R2 -> Synchronous -> Automatic Failover

    R3 -> Synchronous -> Manual Failover

    So now I am not clear whether it would even be possible for R3 to be promoted to primary automatically. Initially I thought the setting means that when R3 is the primary and it fails, it THEN won't automatically fail over to another replica. But am I reading it wrong? Is it really saying that R3 won't automatically become the primary if R1 and R2 are lost? And that's even before we get into the details of quorum voting!

    I've managed to confuse myself. This is nothing new, incidentally 😉

    Alex

  • Yes, it means the AG won't fail over to R3 currently. If you change it to automatic failover, it should.
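
    The change would be something like this, run on the primary (a sketch using the names from earlier):

        -- Switch R3 from manual to automatic failover
        ALTER AVAILABILITY GROUP G1
        MODIFY REPLICA ON 'R3'
        WITH (FAILOVER_MODE = AUTOMATIC);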

  • But I can only have two automatic failovers, right?

    I have R1 set to Auto Failover (which in my mind should fail over to R2),

    and R2 set to Auto Failover (which in my mind should fail over to R3),

    and R3 set to Manual Failover (I am not sure where that would go anyway!?).

    Confused!

    Cheers

  • In 2014 you can have 3 (including the current primary), I think. Try it (if it's not in Production).

  • The error message says '2 Automatic failovers only'

    And thus the confusion. It must be semantics, but to me '2 auto failovers' means R1->R2 and R2->R3, which is only 2.

    I am not expecting R3 to automatically fail over to somewhere else, just to automatically start accepting connections.

  • R1 is one auto failover. R2 is a second. That's two. It's not the failover paths that are counted; it's the number of replicas set to automatic.

  • "Automatically start accepting connections" implies failover. A replica must be the primary to receive (read-write) connections; if it is not the primary, it will not accept them. And again, if it did, the AG could well break: imagine the replica being separated from the network, deciding to start acting as primary on its own, and then rejoining.

  • You're right, you can only fail over to two nodes automatically.

    No, a secondary can "receive connections" if it's set up to be readable, but obviously they will be read-only. (There are licensing implications for making your secondaries readable.)
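
    If you do want a readable secondary, the setting is along these lines, run on the primary (a sketch using R3 as the example replica):

        -- Allow read-only connections when this replica is acting as a secondary
        ALTER AVAILABILITY GROUP G1
        MODIFY REPLICA ON 'R3'
        WITH (SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY));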
