AlwaysON Multi Subnet Cluster, AG Listener

  • Hello

    I am not sure if this is a SQL issue or a Windows Cluster issue, but I am trying to rule out SQL.

    When I fail an availability group over between subnets, the old DNS record stays in place. The Availability Group listener ends up with two A records in DNS, one for each IP. This causes the app to time out at times, since DNS can return either of the two IPs.

    Anyone ever run into this before?

    Qsac
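
One way to confirm the symptom described above is to list every IPv4 address that DNS hands back for the listener name. The following is a minimal sketch, not a definitive diagnostic: the function name is hypothetical, and the `resolver` parameter exists only so the lookup can be swapped out for testing.

```python
import socket


def listener_ipv4_addresses(hostname, port=1433, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IPv4 addresses DNS returns for hostname.

    `resolver` defaults to socket.getaddrinfo; it is injectable (an assumption
    made for this sketch) so the function can be exercised without a network.
    """
    infos = resolver(hostname, port, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr),
    # and for IPv4 the sockaddr is an (ip, port) tuple.
    return sorted({info[4][0] for info in infos})
```

Calling `listener_ipv4_addresses("aglistener.example.com")` (a placeholder name) and getting two addresses back, one per subnet, would match the behaviour reported in this post.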

  • .

  • Hi Steve,

    Have you solved the issue?

  • No, I have not. Please let me know if you have any insight.

    Steve

  • I think I figured this out. It is by design.

    The app team has to add some new parameters to the connection string. Here is the MS link for more info.

    http://msdn.microsoft.com/en-us/library/hh205662.aspx

    Thanks to all for the support.

    Steve
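
The connection-string change mentioned above is the `MultiSubnetFailover` keyword. As a rough illustration, here is a sketch that assembles an ODBC-style connection string with that keyword switched on. The helper name, the listener and database names, and the choice of driver are all assumptions made for the example; .NET SqlClient spells the same option `MultiSubnetFailover=True`.

```python
def build_connection_string(listener, database, multi_subnet_failover=True,
                            driver="ODBC Driver 17 for SQL Server"):
    """Assemble an ODBC connection string for an availability group listener.

    All argument values are supplied by the caller; nothing here is taken
    from the original thread except the MultiSubnetFailover keyword itself.
    """
    parts = [
        ("Driver", "{" + driver + "}"),
        ("Server", f"tcp:{listener},1433"),
        ("Database", database),
        ("Trusted_Connection", "Yes"),
    ]
    if multi_subnet_failover:
        # Asks the driver to try the listener's registered IPs in parallel
        # instead of one at a time.
        parts.append(("MultiSubnetFailover", "Yes"))
    return ";".join(f"{key}={value}" for key, value in parts) + ";"
```

For example, `build_connection_string("aglistener.example.com", "SalesDb")` yields a string ending in `MultiSubnetFailover=Yes;`. Older client libraries that do not recognise the keyword cannot use this approach.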

  • Brilliant!

    Here's another good link for completeness:

    http://msdn.microsoft.com/en-us/library/ff878716.aspx

    Thank you Steve

  • I have had to do A LOT of research on this subject due to implementing a multi-subnet cluster here at my company. (For my situation, we're using Windows Server 2008 R2 SP1.) Your problem could be one of two things:

    By default, when you create the Availability Group Listener, the RegisterAllProvidersIP setting will be set to 1. That means that both subnets' IPs will be registered as A records under the listener's DNS name, even though only one of them is online at any time. If you are using older database clients that don't support the MultiSubnetFailover option, these clients will arbitrarily try to connect to only one of those IPs. With two subnets, roughly 50% of your connection attempts will time out.

    If you have RegisterAllProvidersIP set to 0, then only the online subnet's IP will be registered with DNS. This removes the 50% timeout problem above. However, keep in mind that you now need to consider your DNS replication settings and the HostRecordTTL property. When a failover occurs, the new active node will send the update to the domain controller that it talks to. Since it's in a different subnet, there's a good chance that this is a different domain controller than the one used by the previous, now "offline" node. Depending on which DNS server you're looking at, you may or may not see the update immediately. Once DNS replication runs, all DNS records should match. That said, the local DNS cache on your client relies on the HostRecordTTL property to know when to go out and get a fresh copy. The default for this value is 1200 seconds (20 minutes), which means that you could wait up to 20 minutes before pointing to the new subnet's IP. Consider lowering this value to 5 minutes (300 seconds) or less if you don't mind the extra network traffic to request DNS records.

    The HostRecordTTL and RegisterAllProvidersIP settings are set on the Availability Group Listener cluster resource, not through DNS, AD, or the cluster itself.

    Feel free to send me a message if you need any more info!

    -Brandon
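
The trade-off described in this post can be put into rough numbers. The sketch below is a simplified model, not a measurement: it assumes a legacy client picks one registered IP uniformly at random, and that the worst-case wait with a single registered IP is the DNS replication delay plus one full HostRecordTTL. Both function names and the sample addresses are hypothetical.

```python
def legacy_client_failure_rate(registered_ips, online_ips):
    """Fraction of attempts that land on an offline IP.

    Assumes (for illustration only) that a client without MultiSubnetFailover
    picks one registered address uniformly at random.
    """
    if not registered_ips:
        raise ValueError("at least one registered IP is required")
    offline = [ip for ip in registered_ips if ip not in online_ips]
    return len(offline) / len(registered_ips)


def worst_case_stale_seconds(host_record_ttl, dns_replication_delay=0):
    """Upper bound on how long a client may keep using the old IP.

    Simplified model: the client re-caches the old record just before DNS
    replication reaches its server, then holds it for one full TTL.
    """
    return dns_replication_delay + host_record_ttl


# RegisterAllProvidersIP = 1: both IPs registered, only one online.
both_registered = legacy_client_failure_rate(["10.1.0.50", "10.2.0.50"], {"10.2.0.50"})
# RegisterAllProvidersIP = 0: only the online IP is registered.
one_registered = legacy_client_failure_rate(["10.2.0.50"], {"10.2.0.50"})
```

Under these assumptions `both_registered` is 0.5 and `one_registered` is 0.0, while `worst_case_stale_seconds(1200)` gives the 20-minute figure quoted above for the default TTL.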

  • Hi Brandon,

    I'm running into the exact same issue. I'd like to offer a multi-subnet AlwaysOn AG environment to all my applications with no additional configuration on their end, similar to how Failover Cluster Instances in a traditional SQL cluster provide HA transparently to the application.

    What settings did you settle on? I don't like the idea of 5 minute failovers but 50% timeouts wouldn't work at all :/

    -Drew

  • Hi Drew-

    I would suggest a 2-5 minute HostRecordTTL and a RegisterAllProvidersIP set to 0 in your scenario.
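
These two settings are normally changed with the failover clustering PowerShell cmdlets `Get-ClusterResource` and `Set-ClusterParameter`. To keep the examples in this thread in one language, the sketch below only renders those command lines as text so they can be reviewed before anyone runs them. The helper name and the resource name in the usage note are hypothetical, and the 300-second default simply reflects the upper end of the 2-5 minute suggestion above.

```python
def listener_tuning_commands(network_name_resource,
                             host_record_ttl_seconds=300,
                             register_all_providers_ip=0):
    """Render PowerShell command lines for the two listener settings.

    The commands are returned as strings and are not executed here.
    """
    if register_all_providers_ip not in (0, 1):
        raise ValueError("RegisterAllProvidersIP must be 0 or 1")
    if host_record_ttl_seconds <= 0:
        raise ValueError("HostRecordTTL must be a positive number of seconds")
    target = f'Get-ClusterResource "{network_name_resource}"'
    return [
        f"{target} | Set-ClusterParameter RegisterAllProvidersIP {register_all_providers_ip}",
        f"{target} | Set-ClusterParameter HostRecordTTL {host_record_ttl_seconds}",
    ]
```

For a listener whose network name resource is called `AG1_Listener` (a placeholder), `listener_tuning_commands("AG1_Listener")` returns the two command lines. As far as I know, the new values only take effect after the network name resource has been taken offline and brought back online, so plan for a brief interruption.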

  • btuck (5/6/2013)


    Brandon- I am facing exactly the same issue with different subnets. We are not making any changes to the connection strings. We updated the listener's HostRecordTTL and RegisterAllProvidersIP settings, but when a failover happens DNS is not getting updated and we cannot connect to the listener. Any suggestions?

  • muthyala_51 (2/26/2015)


    Brandon- I am facing exactly the same issue with different subnets. We are not making any changes to the connection strings. We updated the listener's HostRecordTTL and RegisterAllProvidersIP settings, but when a failover happens DNS is not getting updated and we cannot connect to the listener. Any suggestions?

    If the DNS zones are AD-integrated, then you'll need to account for the replication topology pushing the updates around the DCs.

    Presumably you tested all this during the POC phase?

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

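To see whether replication lag is the culprit, one option is to ask each DNS server for the listener name after a failover and compare the answers. The sketch below covers only the comparison step. It assumes the answers have already been collected by some other means (for example, by running `nslookup` against each server), and the function name and sample data are hypothetical.

```python
def servers_with_stale_records(answers_by_server, current_ip):
    """Return the DNS servers whose answer is not exactly the current listener IP.

    `answers_by_server` maps a DNS server name to the list of IPs it returned
    for the listener. This check is meant for RegisterAllProvidersIP = 0,
    where exactly one IP should be registered.
    """
    return sorted(
        server
        for server, ips in answers_by_server.items()
        if set(ips) != {current_ip}
    )


# Illustrative data only: dc2 has not yet received the update.
sample = {"dc1": ["10.2.0.50"], "dc2": ["10.1.0.50"]}
stale = servers_with_stale_records(sample, "10.2.0.50")
```

With the sample data, `stale` contains only `dc2`. An empty result after failover would suggest that replication has caught up and any remaining delay comes from client-side caching.
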
  • Presumably you tested all this during the POC phase?

    POC phase?

  • POC = Proof Of Concept

  • Perry Whittle (2/26/2015)


    POC = Proof Of Concept

    Thanks Perry. We are still in testing phase only. I will check with our network admin on the things you pointed out.

  • muthyala_51 (2/26/2015)


    We are still in testing phase only.

    This is the time to iron these things out and see which way works best for your scenario.
