AlwaysOn AG failover works from the same VLAN, fails from other VLANs

  • Initially, the AlwaysOn Availability Group works from everywhere, with clients of various sorts pointed at the listener.

    After a manual failover, only clients on the same VLAN as the SQL cluster (2 nodes, both on the same VLAN) can keep a connection to SQL Server; for them the connection moves correctly with the primary and all is good. From the perspective of SQL, everything looks wonderful and the failovers are flawless.

    From any VLAN other than the one the SQL cluster is on, all clients fail to connect (applications, ODBC connections). They eventually time out with "server does not exist or ..."

    I have seen lots of ideas about what may cause similar symptoms, but have never seen a case so clearly defined by VLAN boundaries.

    So there has to be a simple, network-related cause and a simple answer. Can anyone help?

    Thanks -
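
One way to pin down the symptom described above is to run the same TCP probe against the listener from a client on each VLAN, before and after a failover. The following is a minimal sketch, not the poster's method: the function name `can_reach` is hypothetical, and port 1433 is only the SQL Server default (the listener in this thread may use a different port).

```python
import socket


def can_reach(host: str, port: int = 1433, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Port 1433 is an assumption (the SQL Server default); pass the real
    listener port if it differs.
    """
    try:
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and name resolution failures.
        return False
```

Calling `can_reach("aglistener.example.local")` (a placeholder listener name) from one client per VLAN gives a quick yes/no per network segment, independent of any ODBC or application settings.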

  • A VLAN represents a single broadcast domain within a network and communication between broadcast domains requires routing to be configured.

    SQL Server (to the best of my knowledge) uses broadcasts to locate other servers on the local network but because routers don't/can't forward broadcasts your local hosts can't see any other servers on any other VLANs.

    That, at first sight, looks like the problem you are having.

  • In reply to Michael Gerholdt (12/20/2016):

    The listener should be configured with an IP address from each VLAN, in an OR dependency configuration.

    There is also a cluster resource setting that stops the network name from trying to register all IPs when coming online.

    Could you confirm the current configuration of the AG listener?

  • In reply to Michael Gerholdt (12/20/2016):

    Do the applications have the MULTISUBNETFAILOVER=TRUE option set in their connection strings?
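
For readers unsure what that option looks like in practice, here is a small sketch that assembles an ODBC-style connection string with the keyword included. The helper `build_conn_str`, the driver name, and the listener and database names in the usage note are all illustrative assumptions, not values from this thread; the Microsoft ODBC driver spells the keyword `MultiSubnetFailover=Yes`.

```python
def build_conn_str(listener: str, database: str, multi_subnet: bool = True,
                   driver: str = "ODBC Driver 17 for SQL Server") -> str:
    """Build an ODBC connection string aimed at an AG listener.

    The driver name is an assumption; use whichever SQL Server ODBC
    driver is actually installed on the client.
    """
    parts = [
        f"DRIVER={{{driver}}}",      # renders as DRIVER={ODBC Driver ...}
        f"SERVER={listener}",
        f"DATABASE={database}",
        "Trusted_Connection=yes",    # Windows authentication
    ]
    if multi_subnet:
        # Asks the driver to try all listener IPs in parallel on connect.
        parts.append("MultiSubnetFailover=Yes")
    return ";".join(parts) + ";"
```

For example, `build_conn_str("aglistener.example.local", "MyDb")` yields a string that could be handed to an ODBC client library. The option mainly matters when the listener has IP addresses in more than one subnet.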

  • Thanks to all for your responses.

    We did focus on the MultiSubnetFailover value. Problems there can produce similar symptoms, but our cluster doesn't span multiple subnets, so changing those values in the cluster configuration and in the clients did not solve the problem.

    In our case, the cause turned out to be Group Policy enforcing a "disable gratuitous ARPs" setting on all of our servers.

    Our solution was to put the cluster servers in an AD group and deny that group the "disable gratuitous ARPs" policy, and to remove the registry settings GP had already applied.

    Immediate cure.
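
A plausible reading of why the failure followed VLAN boundaries so cleanly: clients in other VLANs reach the listener IP through a router, and without a gratuitous ARP from the new primary that router may keep sending traffic to the old node's MAC address until its ARP entry ages out. The sketch below only classifies a client as on-subnet or routed; it assumes one IP subnet per VLAN, and the function name and addresses are made up for illustration.

```python
import ipaddress


def path_to_listener(client_ip: str, listener_ip: str, prefix_len: int) -> str:
    """Return "direct" if the client shares the listener's subnet, else "routed".

    Assumes each VLAN maps to exactly one IP subnet, which the thread
    does not state explicitly.
    """
    # strict=False lets us pass a host address and derive its network.
    listener_net = ipaddress.ip_network(f"{listener_ip}/{prefix_len}", strict=False)
    if ipaddress.ip_address(client_ip) in listener_net:
        return "direct"   # client resolves the listener's MAC by its own ARP
    return "routed"       # traffic depends on the gateway's ARP cache


# Hypothetical addresses: listener 10.0.1.10/24, clients in two VLANs.
print(path_to_listener("10.0.1.50", "10.0.1.10", 24))
print(path_to_listener("10.0.2.50", "10.0.1.10", 24))
```

Clients that come back as "routed" are the ones that would depend on the gateway learning the listener's new MAC address after a failover.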
