I have had to do A LOT of research on this subject while implementing a multi-subnet cluster here at my company. (In our case, we’re using Windows Server 2008 R2 SP1.) Your problem could be one of two things:
By default, when you create the Availability Group Listener, the RegisterAllProvidersIP setting will be set to 1. That means that the IPs for both subnets will be registered in DNS as A records under your listener’s name, even though only one of them is online at any given time. If you are using older database clients that don’t support the MultiSubnetFailover option, these clients will arbitrarily try to connect to only one of those IPs. With two subnets, that means roughly 50% of your connection attempts will land on the offline IP and time out.
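Here’s a rough way to see where the 50% figure comes from. This is only a sketch: the function name and the IP addresses are made up for illustration, and it assumes the client picks one of the registered IPs uniformly at random and never retries another one.

```python
from fractions import Fraction


def legacy_timeout_fraction(registered_ips, online_ips):
    """Expected fraction of connection attempts that time out.

    Assumption: the client picks exactly one registered IP, uniformly at
    random, and does not fall back to another IP.
    """
    offline = [ip for ip in registered_ips if ip not in online_ips]
    return Fraction(len(offline), len(registered_ips))


# Hypothetical listener IPs, one per subnet; only the first subnet is online.
listener_ips = ["10.1.0.50", "10.2.0.50"]
print(legacy_timeout_fraction(listener_ips, {"10.1.0.50"}))  # 1/2
```

Under the same assumptions, a third subnet would push the figure to 2/3, so the problem gets worse as you add subnets.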
If you have RegisterAllProvidersIP set to 0, then only the online subnet’s IP will be registered with DNS. This removes the 50% timeout problem above. However, keep in mind that you now need to consider your DNS replication settings and the HostRecordTTL property. When a failover occurs, the newly active node will send the DNS update to the domain controller that it talks to. Since it’s in a different subnet, there’s a good chance that this is a different domain controller than the one used by the previous, now “offline” node. Depending on which DNS server you’re looking at, you may or may not see the update immediately. Once DNS replication runs, all DNS records should match. That said, the local DNS cache on your client relies on the HostRecordTTL value to know when to go out and get a fresh copy. The default for this value is 1200 seconds (20 minutes), which means that you could wait up to 20 minutes, on top of any replication delay, before your client points to the new subnet’s IP. Consider lowering this value to 300 seconds (5 minutes) or less if you don’t mind the extra network traffic to request DNS records.
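To get a feel for the numbers, here’s a small back-of-the-envelope sketch. The function name and the 180-second replication delay are my own assumptions, not anything from the cluster itself. It uses a simplified model where the client might re-query just before the update reaches its DNS server and then cache the old IP for one more full TTL.

```python
def worst_case_stale_seconds(host_record_ttl_s, dns_replication_delay_s=0):
    """Rough upper bound on how long a client may keep using the old IP.

    Simplified model: the client can refresh its cache just before the
    update reaches its DNS server, get the old record again, and then
    hold it for one more full TTL.
    """
    return dns_replication_delay_s + host_record_ttl_s


# 180 s is a hypothetical replication delay; measure your own environment.
for ttl in (1200, 300, 60):
    minutes = worst_case_stale_seconds(ttl, dns_replication_delay_s=180) / 60
    print(f"HostRecordTTL={ttl:>4}s -> up to {minutes:.0f} min on the old IP")
```

With that assumed 3-minute replication delay, this works out to 23, 8, and 4 minutes respectively, which is why lowering HostRecordTTL matters much more than the replication delay once the TTL is large.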
The HostRecordTTL and RegisterAllProvidersIP settings are configured on the Availability Group Listener cluster resource itself, not through DNS, AD, or the cluster as a whole.
Feel free to send me a message if you need any more info!