moving from tx replication to ao/ag possibly without fci....

  • hello,

    we are due to upgrade our production cluster within the next 6 months and have been advised by our hosting team that we should consider moving to ao/ag rather than using standard win/sql cluster with replication. we currently have a two-node cluster, sitting over a shared SAN, hosting a single instance, with about 3.5TB of data, and using tx replication to provide an offsite real-time (ish) copy of data for reporting. due to the sensitive nature of our services, even a quick cluster failover causes the majority of the connections to die so was thinking that we can drop the fci and just use ag's? we are also starting to hit hard I/O limits of the shared SAN so....

    my setup thinking is to split the main instance into two....we have a read heavy and a write heavy db and going to put these on separate servers. each one will hold a synchronous replicate of the other and all other ancillary databases will be spread. there will be an asynchronous replicate(s) offsite, connected by vpn, to provide a reporting database or two.

    i believe the ag listener will handle shifting the connection endpoint to the relevant primary replicate database wherever that may sit so we dont have to worry about that....and the fact that all connections will be dropped is a reality for us now so doesnt introduce any new problems. the application handles connection drops better than timeouts anyway.

    would a third node be a good idea? would figuring out what goes where in the case of a hardware failure be too complicated?
    the offsite replicates will be in a different domain, no trust, PCI Level 1 security - is this an issue?

    thoughts....pitfalls....basic wrongness??

    ta 🙂

  • Hi Samuel,

    "quick cluster failover causes the majority of the connections to die so was thinking that we can drop the fci and just use ag's?" - Yes. I would go for AGs

    "would a third node be a good idea?" I think it is a good idea. The following can be your AG setup

    Node1 = Primary replica
    Node2 = Secondary replica | Synchronous (for high availability)
    Node3 = Asynchronous | for Disaster Recover | Reporting

    I assume that your primary and secondary replica is on the same data center. If yes, and if anything goes wrong in the primary DC, then you can failover to Node3 (DR) which is on a different DC.

    You might also want to run your heavy reporting queries on your Node3 instead of Node2.

    If you are going to use SQL Server 2016, then it is possible to configure AG on a different domain.

    https://blogs.technet.microsoft.com/dataplatforminsider/2015/12/15/enhanced-always-on-availability-groups-in-sql-server-2016/

  • Secondary and third nodes can be useful for offloading reporting/read only , but be careful  with the
    - Licensing Cost for additional instance
    - Latency added to transaction commit for synch setups
    - Re develop a lot of maintenance script to suite AlwaysOn configuration.
  • samuel.bartick - Friday, September 1, 2017 5:33 AM

    so was thinking that we can drop the fci and just use ag's?

    Err, get it wrong and you're likely to have just as many issues with AGs. The failover can be fairly quick, but really depends on the redo threads and the availability mode (synch or asynch). Also note that client connections are still disconnected during an AG failover as the cluster resource moves from one node to another

    samuel.bartick - Friday, September 1, 2017 5:33 AM


    i believe the ag listener will handle shifting the connection endpoint to the relevant primary replicate database wherever that may sit so we dont have to worry about that...

    No, the AG listener is just the shared endpoint for the Client Access Point.
    It's the Windows Server Failover Cluster subsystem that handles cluster resources in conjunction with the database engine.
    Automatic Failover occurs only to a defined pair of replicas

    samuel.bartick - Friday, September 1, 2017 5:33 AM


    would a third node be a good idea?

    What would you use it for?
    More importantly, what quorum model have you employed for the current 2 node WSFC?

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply