Designing and Architecture

  • Hi guys, I am new to architectural field and have some very common queries. I am moving my db servers to new infrastructure. The new infrastructure will contain 2 prod db servers and 2 Dr servers. I want to know what should be the best option to get HA as well as DR out of all the options available. Consider the application criteria for availability as near as possible to 100%.

  • You need a lot more information to come up with a full HA and DR plan other than "I have four servers, now what?"

    Are you on SQL Server 2016 or 2014 or an earlier version? For the general part of HA, and to a lesser extent DR, I'd look to using Availbility Groups if I were on that version of SQL Server. However that's really only fully functional on 2014 or better (and if you're building this out from scratch, move to 2016).

    100% is impossible. 99.999% generally costs millions. You're, at best, looking at four nines 99.99%. That's 52.56 minutes a year for down time and maintenance. That means you have to configure everything nearly perfectly and have minimal issues.

    A bunch of other questions present themselves, what is your RTO, Recovery Time Objective, the amount of time it will take to recover in the event of a catastrophic failure? What is your RPO, Recovery Point Objective, the amount of data measured generally in time, you're prepared to lose? These are business discussions you have to have in order to drive the technical discussion.

    You've mentioned servers, but where are they? You can't have a DR plan with all the servers in one room. They need to be in different locations. Preferably somewhat far apart locations. This adds latency which can affect both RPO and RTO in the event you're failing over from your primary location to your secondary location.

    How about backups? Thought that through yet? You need that as part of your DR plan. Be sure to set up testing for them.

    In short, this is an enormous topic and you're not going to get adequate advice from a forum of this nature, especially based on the amount of information you've supplied. I would strongly suggest you get a consultant who specializes in this sort of thing to help. I know a bunch of experts in this area if you'd like some recommendations.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thanks for the insight Grant. We have two new prod servers in one location and two DR servers in another location. We have rto of 2 hours and rpo is point of failure. I know it takes lots of information and discussion to finalize such things. However I just needed a high level understanding like how the architectural view should look like (using clustering). If I can any such design patterns somewhere in diagramming, I would be thankful.

  • sqlenthu 89358 - Monday, July 10, 2017 2:18 PM

    Thanks for the insight Grant. We have two new prod servers in one location and two DR servers in another location. We have rto of 2 hours and rpo is point of failure.

    No where near enough information to even start a DR design. No idea of databases, interdependencies, application servers, network setup, etc, etc.

    0 data loss?  That's not going to come cheap. You need redundant network connections between the data centers, high speed networks, synchronous 'DR' of some form, log backups that go offsite immediately, etc. It's not easy at that point.
    You need to look at other business restrictions, skills available in the admin team, budgets, and a lot of other things.

    If they need 0 data loss in the case of a disaster, any disaster, then they probably should get in a top-level specialist to help them get the right setup. I recommend Denny Cherry. He's not cheap, but their requirement is not going to be cheap and with DR the cost of getting it wrong is often the largest cost.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • sqlenthu 89358 - Monday, July 10, 2017 2:18 PM

    Thanks for the insight Grant. We have two new prod servers in one location and two DR servers in another location. We have rto of 2 hours and rpo is point of failure. I know it takes lots of information and discussion to finalize such things. However I just needed a high level understanding like how the architectural view should look like (using clustering). If I can any such design patterns somewhere in diagramming, I would be thankful.

    I'm pretty much in line with Gail. The fact is, you're not going to hit zero with the servers you have listed. It's extremely complex. I also heartily recommend Denny Cherry. He knows his stuff. An alternative would be David Klee. I'm not sure if he costs more or less than Denny, but he'll get the job done.

    I'll give you an approximation on what you could do. You haven't said what version of SQL Server we're working with. Hopefully 2014 or better, preferably 2016. In that case, a bare-bones set up (and this is a vague, holding a finger in the wind to gage temperature, pressure, wind direction & speed and what color socks I'm wearing, so please treat it as such) would be to have your local servers configured in an Availability Group. Don't go for failover clusters. The technology isn't loved any more by Microsoft. All the love goes to AGs. You can have one server active, the other passive (which, I'll bet money, your business immediately nixes, most do. They don't like servers sitting around doing nothing) with the AG in synchronous connection. That gives you a pretty good chance for HA locally with as close to zero data loss as you're likely to get. Then, you can have the secondary servers in asynchronous connection through the AG. I'm not sure how you'd deal with the second server in the DR site. Also, this set up means the potential exists for data loss. You can't do synchronous over distances, latency will kill your transactions and you'll be spending all your money on disks for storing the logs and you'll lose the data anyway.

    None of that addresses backups, logs, offsite storage, testing, etc. This is not a small undertaking.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey - Monday, July 10, 2017 7:10 PM

    Hopefully 2014 or better, preferably 2016. In that case, a bare-bones set up (and this is a vague, holding a finger in the wind to gage temperature, pressure, wind direction & speed and what color socks I'm wearing, so please treat it as such) would be to have your local servers configured in an Availability Group. Don't go for failover clusters. The technology isn't loved any more by Microsoft. All the love goes to AGs. You can have one server active, the other passive (which, I'll bet money, your business immediately nixes, most do. They don't like servers sitting around doing nothing) with the AG in synchronous connection. That gives you a pretty good chance for HA locally with as close to zero data loss as you're likely to get. Then, you can have the secondary servers in asynchronous connection through the AG.

    And then you ensure that you have log backups, running very often, going to a 3rd location (preferably) or at minimum to the DR site and then to a 3rd location. Your database backups go there as well. These should be encrypted and versioned (you can get old versions of the backups back)  This is so that dropped tables, dropped databases or ransomware don't take the entire infrastructure out.

    You also need a way to do automatic failover of the application, so that you can completely swing to the second data center. Also consider redundant domain server and DNS server, redundant email if the app depends on it.

    Finally, consider the people. In the case of a major disaster, staff may be unavailable or more concerned with their families than the business, so you may need a distributed administration team, if you don't have one already.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply