Availability Group - How do I maintain connections if one IP of a listener goes offline in a multi-subnet configuration?

  • I'm not even sure if this is possible, but I thought I would throw it out there to see if anyone has had a way around this.

    I currently have a SQL Server 2016 multi-subnet Availability Group set up across 2 separate data centers:

    DC1 Server1 IP = 10.10.10.1 (Primary Replica)
    DC1 Server2 IP = 10.10.10.2 (Secondary Replica, SYNC commit, Auto Failover)
    IP in listener = 10.10.10.5

    DC2 Server1 IP = 10.20.10.1 (Secondary Replica, SYNC commit, Manual Failover)
    DC2 Server2 IP = 10.20.10.2 (Secondary Replica, ASYNC commit, Manual Failover)
    IP in listener = 10.20.10.5

    We set it up this way to allow an automatic failover to another replica in the same DC, if needed.  We also have the option to fail over to the second DC without data loss.  WSFC quorum is set up so that the 2 servers in the active DC each have a vote, and one server in the passive DC has a vote.  This lets us patch servers in the passive DC without running into quorum issues.
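
    To make the vote arithmetic above concrete, here is a rough sketch. It assumes plain node-majority quorum (no witness, no dynamic quorum adjustment), and it assumes the voting node in DC2 is DC2 Server1, which is only a guess for illustration. The node labels and the has_quorum helper are hypothetical names, not anything from WSFC itself.

```python
# Hypothetical vote assignment based on the setup described above.
# Which DC2 server holds the vote is an assumption made for illustration.
VOTES = {
    "DC1-Server1": 1,
    "DC1-Server2": 1,
    "DC2-Server1": 1,  # assumed to be the voting node in the passive DC
    "DC2-Server2": 0,
}


def has_quorum(reachable, votes=VOTES):
    """Return True if the reachable nodes hold a strict majority of all votes."""
    total = sum(votes.values())
    held = sum(votes[node] for node in reachable)
    return held * 2 > total


dc1 = ["DC1-Server1", "DC1-Server2"]
dc2 = ["DC2-Server1", "DC2-Server2"]

# P2P link drops: each data center only sees its own nodes.
print("DC1 keeps quorum:", has_quorum(dc1))
print("DC2 keeps quorum:", has_quorum(dc2))

# Link is up, but one DC1 server is offline.
print("One DC1 node down:", has_quorum(["DC1-Server1"] + dc2))
```

    Under these assumptions the active DC holds 2 of 3 votes on its own, so it keeps quorum when the passive DC is being patched or is unreachable, while the passive DC alone does not.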

    Lastly, there is a single P2P connection between the 2 data centers (please work with me here as I know we should get a second P2P, but the purse strings are a little tight right now from management).  As long as nothing ever goes down anywhere, life is good (funny how this works...).

    Given the single P2P connection, the listener IP of the second data center is not reachable if the P2P drops.  This causes all databases to go to RESOLVING... and then back to PRIMARY.  Yes, I understand that this usually happens really quickly.  However, when it happens, we typically see timeouts within our application, dropped connections, and similar problems.  Given the nature of our business, this has become unacceptable to management, and they are looking for a solution.

    Finally, on to my question...

    Is there any way to pause / configure / "trick" the listener so that when an IP address (in this instance, the IP of the passive DC) becomes unavailable, the listener maintains its configuration instead of reconfiguring immediately and causing the RESOLVING... to PRIMARY flip?  It isn't the active IP being used at the time, so I would think that there could be something?

    Thanks for any advice or other words of wisdom that can be passed along.
    Pete

  • You can set the session timeout to greater than 10 seconds in the AG properties.
    https://social.msdn.microsoft.com/Forums/sqlserver/en-US/de428459-f8c1-4e66-b712-8367cf455107/changing-the-session-timeout-in-an-availability-group

    Which quorum settings are you using?

    Alex S
  • Thanks for your reply, Alex!

    To answer your question...  Both servers in the active data center have a vote, and 1 server in the passive data center has a vote.

    As a follow-up to your reply, we currently have our session timeouts set to 90 seconds for the replicas.  However, the moment the P2P connection drops, the listener IP address of the passive data center becomes unreachable.  The AG then immediately goes to RESOLVING... and then back to PRIMARY.  This is the "blip" that I am trying to overcome, if possible.

    Thanks for any additional information you can provide.
