Always On synchronous

  • In the past I've used database mirroring in high-safety mode and always fallen foul of the mirror going down (service pack, whatever) and the primary then not being able to commit transactions.

    We're moving to SQL Server 2019 with a 3-node availability group - two of the nodes are synchronous, and the third is asynchronous and used for reporting.

    I'm worried about the synchronous pair - what do I need to look for when we set it up?

    MVDBA


  • Even better... does anyone know a company that coaches in HA technology (UK based)?

    MVDBA

  • I've not seen a company that specialises in HA; each vendor will give you their own spiel on what their product can do, but I find HA comes down to what the business needs.


    Are you splitting the sync and async parts across data centres by any chance? That could cause an issue if the sync pair dies, because of the lack of quorum. I know you can force the AG to come online, but that to me just seems dirty.
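
    For reference, the force itself is a one-liner on the SQL side - a minimal sketch only, where AG1 is a placeholder name, and if the cluster itself has lost quorum you'd have to force that online first:

        -- Run on the secondary replica you want to bring online.
        -- FORCE_FAILOVER_ALLOW_DATA_LOSS is exactly what it says: any
        -- transactions not yet received by this replica are lost.
        ALTER AVAILABILITY GROUP [AG1] FORCE_FAILOVER_ALLOW_DATA_LOSS;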


    As for the sync pair, obviously you want the latency between them as low as you possibly can, as close to each other network-wise, without introducing single points of failure. Should anything cause a sync secondary to become unavailable, the docs say the mode switches to async, then when everything is back it reverts to sync.

    https://docs.microsoft.com/en-gb/sql/database-engine/availability-groups/windows/availability-modes-always-on-availability-groups?view=sql-server-ver15

    Note

    If primary's session-timeout period is exceeded by a secondary replica, the primary replica temporarily shifts into asynchronous-commit mode for that secondary replica. When the secondary replica reconnects with the primary replica, they resume synchronous-commit mode.

    I've yet to see it not do the above during an unexpected reboot or a patch window; others may contribute their experiences with it.
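
    If you want to sanity-check that behaviour yourself, here's a minimal sketch - the AG name AG1 and the replica name SQLNODE3 are placeholders for whatever your setup ends up being called:

        -- availability_mode_desc is the *configured* commit mode;
        -- synchronization_health_desc shows whether the replica is actually keeping up.
        SELECT ar.replica_server_name,
               ar.availability_mode_desc,
               ars.role_desc,
               ars.synchronization_health_desc
        FROM sys.availability_replicas AS ar
        JOIN sys.dm_hadr_availability_replica_states AS ars
            ON ars.replica_id = ar.replica_id;

        -- Switching the reporting replica to asynchronous commit (run on the primary).
        ALTER AVAILABILITY GROUP [AG1]
        MODIFY REPLICA ON N'SQLNODE3'
        WITH (AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT);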


    The biggest issues I've had with AGs are database corruption and memory dumps combined with the health-check level being set high - well, actually left at the default, with nobody bothering to see what was causing failovers every day. The health checks were failing because of corruption in SharePoint databases; each corruption error caused a SQL dump, the AG saw the DBCC CHECKDB error and triggered a failover. Rinse and repeat until we got control over the setup and sorted it. Secondary to that, it was set to take COPY_ONLY log backups, so nothing had been marking the log as reusable - logs in the TB sizes, only to be backed up at ever-increasing size because they were never truncated.
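
    If you want to see what you're running with, this is roughly where to look - a sketch only, with AG1, MyDatabase and the backup path as placeholders; pick a failure condition level that matches your own tolerance rather than copying mine:

        -- Current health-check settings (failure_condition_level 3 is the default).
        SELECT name, failure_condition_level, health_check_timeout
        FROM sys.availability_groups;

        -- Make the health check less trigger-happy so a single dumped error
        -- doesn't cause a failover (AG1 is a placeholder).
        ALTER AVAILABILITY GROUP [AG1]
        SET (FAILURE_CONDITION_LEVEL = 1, HEALTH_CHECK_TIMEOUT = 60000);

        -- A normal (non COPY_ONLY) log backup is what marks the log as reusable.
        BACKUP LOG [MyDatabase]
        TO DISK = N'\\backupserver\sqlbackups\MyDatabase_log.trn';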


  • The sync pair are within 2 meters of each other; the async is 400 meters away.

    (We have plenty of other DR policies in place - don't worry.) But the CHECKDB issue is a nice one for me to look into.

    Thanks

    MVDBA

  • By the way - all of it is on site. No data centre; we have our own GreenLake/Nimble SAN and a new beefy server for consolidating licences and removing linked servers.

    MVDBA

  • If it's all staying local to one site then the quorum issue doesn't come into play. Only when you start using it for DR purposes across sites and have an odd number of nodes does it rear its head with potential issues.

    As I say, the only real issues I had with the AGs were outside the control of the AG anyway; it seems they have done a fairly decent job on it. Obviously I wouldn't call it "Always On" - "Nearly Always On" is a better term, as you will still have some downtime for failovers etc., but they are a lot smoother than traditional FCIs.

  • I'm being thick here (that happens a lot), but why does an odd number of nodes not help? I'm more of a query plan guy and I've been dumped into quick thinking in order to stop us "royally making a mess of it" - given that it's already £500,000 on the SAN and £175,000 on our new hardware, I need to get my "ducks in a row".

    By the way, before anyone says it - stupid for buying hardware before having a plan... I wasn't consulted. Welcome to being a DBA.

    MVDBA

  • It's about establishing quorum for the cluster. Generally you need an odd number of votes to establish quorum and ensure things can operate correctly.

    A 2-node FCI would have a quorum disk, a file share witness or an Azure storage account as the thing that establishes quorum. Should one item fail you still have quorum, as 2 of 3 is a majority; if two items fail you're as dead as a dodo, as 1 of 3 is not a majority.

    AGs are somewhat reliant on the same failover-clustering concepts (unless you do clusterless AGs), so the same voting rules apply.


    Now for a 3-node AG setup which is within the same site, "Building1", you're fine with the setup, as you can lose one of the nodes and everything is fine. Lose two nodes and you're dead - same as you would be in a traditional cluster. That's all well and good as you're only providing local HA in "Building1".

    If you want to do DR where you have 2 nodes in "Building1" and 1 node in "Building2" (B2 being your DR site), should the proverbial hit Building1 and you have to invoke DR, you've lost 2 nodes; 1 of 3 is not a majority, so the cluster and its resources are down. You need to force failover to bring it online (as I say, forcing it always seems dirty to me).

    So you need to find a way to have a majority vote should the worst happen. Add a second node in DR if the money allows, or up the vote of the node in DR from 1 to 2. Then add in an external witness - typically Azure storage if you can, which saves having a third building to host a file share witness.

    That way you have 2 votes in "Building1", 2 votes in "Building2" and 1 vote in Azure. So if "Building1" were to die, 3 of 5 is a majority and the service stays operational. Lose "Building2" - again, 3 of 5 is a majority. Lose Azure - 4 of 5 is a majority.
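
    You can see how those votes are actually distributed without leaving SQL Server - a small monitoring sketch; these DMVs only return rows once the instance is part of a Windows cluster:

        -- Cluster-wide quorum model and state.
        SELECT cluster_name, quorum_type_desc, quorum_state_desc
        FROM sys.dm_hadr_cluster;

        -- One row per cluster member (nodes plus any disk/file share witness),
        -- including how many quorum votes each one currently holds.
        SELECT member_name, member_type_desc, member_state_desc, number_of_quorum_votes
        FROM sys.dm_hadr_cluster_members;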


    But if you're not using the AG to provide disaster recovery, and only doing it for local HA, then the setup you have now is correct.

  • So this is really cool info, and so much better explained than MSDN. I can't go up to 4 nodes, so would you recommend I drop down to 2 rather than 3? The 3rd node was planned for reporting.

    When you ask whether our AG is there for DR, I'm a tad confused … for me, DR is how I can get a backup online or fail over a system as fast as possible.

    I think the answer is local HA combined with our backups and that horrible garbage called "TAPE".

    MVDBA

  • It all comes down to need. Again, with 2 nodes you need to find a way to add a 3rd vote in somewhere - Azure storage or a file share witness - as you have no quorum should a node go down. You need 3 votes as a minimum, done in some way, shape or form.


    AGs are a way to do HA and DR together, with multi-subnet failover clusters or distributed AGs (DAGs).

    Take an Azure setup for instance: you have 3 nodes in UK-SOUTH, so you get local HA, but if UK-SOUTH were to die you've lost everything. You'd frantically create a new AG setup in UK-WEST and restore everything from "tape" (shudder), hopefully within your agreed RTO. If this is an analogy for your setup, then you're good the way you are.

    With 2 nodes in UK-SOUTH and 1 node in UK-WEST, you have local HA between UK-SOUTH's nodes and added DR with UK-WEST, so the business can continue while UK-SOUTH is down - failed over in a matter of moments, meeting your RTO and RPO. If this is the analogy, then it needs a few tweaks to make it work, like modifying votes.
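
    For completeness, the cross-region piece can also be done as a distributed AG sitting on top of two normal AGs - a rough sketch only, where every name, listener URL and port below is a placeholder rather than anything from your environment:

        -- Run on the primary of the first AG. Each AG keeps its own local HA;
        -- the distributed AG ships data asynchronously between the two sites.
        CREATE AVAILABILITY GROUP [DistAG]
        WITH (DISTRIBUTED)
        AVAILABILITY GROUP ON
            'AG_UKSOUTH' WITH
            (
                LISTENER_URL = 'tcp://aguksouth-listener.mydomain.local:5022',
                AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
                FAILOVER_MODE = MANUAL,
                SEEDING_MODE = AUTOMATIC
            ),
            'AG_UKWEST' WITH
            (
                LISTENER_URL = 'tcp://agukwest-listener.mydomain.local:5022',
                AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
                FAILOVER_MODE = MANUAL,
                SEEDING_MODE = AUTOMATIC
            );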


  • Trust me, I'm already shuddering - I'm asking because I want to get it right 🙂

    MVDBA

  • An odd number of nodes isn't that big of an issue - as said before, you can adjust the votes on resources individually.

    Ant-Green wrote:

    As for the sync pair, obviously you want the latency between them as low as you possibly can, as close to each other network-wise, without introducing single points of failure.

    This, however, is IMHO the most critical part. People look at the bandwidth in use and decide "a 1 Gb/s network will do", while it's not about the bandwidth at all. It's essentially about how many network-gear hops exist between the listener address on the front end and both sync nodes on the "backend" (cluster) network, and how much latency each hop introduces. For example, your network performance will be different if you have to pass through multiple switches on your replication network; each switch port you pass through adds X ns of delay, every single time, so it already makes a difference whether you need to go through a switch at all - and if you do, is it 1, 2 or x? What's the latency on that switch?

    The second most critical part I'd consider is the lowest-latency drives possible, dedicated to the transaction log.
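
    A quick way to put numbers on both of those from inside SQL Server - a monitoring sketch, nothing here is specific to any particular kit:

        -- Time sessions have spent waiting for the sync secondary to harden the log.
        SELECT wait_type,
               waiting_tasks_count,
               wait_time_ms,
               wait_time_ms / NULLIF(waiting_tasks_count, 0) AS avg_wait_ms
        FROM sys.dm_os_wait_stats
        WHERE wait_type = N'HADR_SYNC_COMMIT';

        -- Per-database queues on the primary: how far behind each secondary is.
        SELECT database_id,
               log_send_queue_size,  -- KB of log not yet sent to the secondary
               redo_queue_size       -- KB of log not yet redone on the secondary
        FROM sys.dm_hadr_database_replica_states;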

    Now I understand why you would love to see that post I promised you 🙂 No worries, it's in the works; all the hardware is finally in one place, I just need to find some time to do the setup and benchmarking - the hope is I'll be able to do part of it this coming Friday.

  • I would recommend adding either a disk or a file share witness to the 3-node cluster, giving (by default) 4 votes. Then I would set the node weight of the 3rd node (reporting) to zero, effectively taking it out of quorum. Basically, you now have a 2-node cluster and quorum is based on the 2 actual nodes and your witness.

    With that setup you can take down reporting at any time and not affect the cluster. You can also take down reporting - and lose another server - and still be operational with a single server and your witness (disk/file share).

    Now - with that said...what is the purpose of setting up an AG for HA only in the same building?  Do you have separate SANs for each node or is all storage coming from the same SAN?  If the same SAN then your redundancy isn't really redundant...lose the SAN and you lose all servers in the cluster anyways.

    So...why not just utilize a standard 2-node FCI cluster and add a 3rd node (setting the node weight to 0 and adding a disk/file share witness) in an AG just for reporting? Works exactly the same...except you don't have to duplicate the data across 3 different sets of volumes on the SAN. The only downside (that I am aware of) would be a slightly longer failover time...if it even takes longer.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • I was just having similar thoughts. You could go 2-node FCI hardware-wise if you want, with dual-expander JBOD shelves, and just replicate to the 3rd node. I would most likely use those dual expanders on one host (more bandwidth to the disks), replicate via 100G networks and use a file share witness. All the money saved on the SAN would go right away into Optane/PCIe SSDs. And then we shall see how fast a SAN really is compared to some Intel Optane storage as a dedicated log drive.

    Oh and consider at least doing yourself one favour: Put the TempDB files on local PCIe storage and not the SAN.
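
    If it helps, moving TempDB is just a metadata change plus a service restart - a minimal sketch, where D:\TempDB is a placeholder for the local PCIe volume and tempdev/templog are the default logical file names (check yours in tempdb.sys.database_files):

        -- Repoint the TempDB files at local storage; takes effect on the next restart.
        ALTER DATABASE tempdb
        MODIFY FILE (NAME = tempdev, FILENAME = N'D:\TempDB\tempdb.mdf');

        ALTER DATABASE tempdb
        MODIFY FILE (NAME = templog, FILENAME = N'D:\TempDB\templog.ldf');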
