Creating DR strategy from scratch for 4 production instances of SQL Server

  • Peculiar enough, all 4 instances are different versions (all Enterprise editions):  2008 R2, 2012, 2016, and 2017. All in full recovery models.

    Currently there is only 1 nightly backup of each database on each server is created and stored on network location. Someone is testing random backup file restores of each backup twice a month on a play server, to make  sure backups are good.

    I, as a new DBA (coming from primarily Development background, both Oracle and SQL Server), have been tasked with coming up with a new, robust and correct, Disaster Recovery solution that would involve the least $ spent (with or without a 3-rd party software), the least resources (we have only 2 folks on SQL team, the company is small; and the databases do not exceed 500 G size (most of them are under 100 G).  2008 R2 and 2016 and 2017 servers can allow losing 1 day of data, but 2008 R2 and 20012 servers are allowed to lose as minimum data and transactions as possible, with ability to do point-in-time recovery.

    What should i do? Where do I even start with?

     

    Your recommendations/steps/sharing your experiences/ are greatly appreciated!

     

    Likes to play Chess

  • Well, for the one day loss, you can, if you want, leave the current backups in place.

    As to the rest... woof! Huge task. I'll give you some things to do, but I know I'm not going to get you to 100% just based on a forum post.

    First, define two things with the business, the Recovery Point Objective (RPO) and the Recovery Time Objective (RTO). The RPO represents the amount of data they're willing to lose. I know the mandate is currently "minimum", but we need to define that number. 5 minutes, one hour? Whatever it is dictates a lot of our other decisions. The RTO is the amount of time it takes us to recover. That's not simply running the restore of a backup because it also includes, if necessary, rebuilding the server, moving the backups to a secondary location and then restoring them, whatever it takes to get the given database back online. This also dictates a lot of what you do.

    First thing to learn, the backup types. You're going to have to switch some of your databases from simple recovery (which they must be in now, because you didn't mention log backups) to full recovery. Then, you need to set up log backups. How frequently you backup the logs depends on your RPO. Then, your RTO can dictate that rather than run 24 hours of log backups, you have a differential backup partway through the day.

    Next, disasters can involve a wide area. You can't just backup your database locally and you're done. You need to have an offsite location for the backups too. This will allow you to recover to a second location if needed. From the sounds of things, everything is hosted locally currently. I'd suggest adding the cloud as your secondary location for backup storage. Then, the cloud can also be your secondary server location if you're experiencing a large scale disaster. Spin up VMs, restore your database (and the apps of course), and your business is running again.

    There's tons more to all this. We haven't even talked about high availability (related to, but not necessarily a part of disaster recovery). You also need to coordinate with the application, servers, storage, all the things that make up your IT infrastructure have to survive the disaster. That makes it bigger than just backups.

    I hope that's enough to move you down the path a ways.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • +1 to what Grant wrote. It's the best advice I could give.

  • Grant, thank you so much for the overview! I really appreciate it.

    Question: establishing 2nd (cloud) location.  Which paid service of this kind would you recommend?

    or, if not cloud, then what would be another option? a 3rd party service of some kind? or should TAPE backups be considered as well? (or are they already 20-th century, and should not really be considered at all in favor of cloud?)

     

    Thanks!

    Voldemar

    Likes to play Chess

  • For backup storage, the cloud is convenient. Look at Amazon S3 and Azure Blob. Decide which works better. Do you have accounts in the organization already somewhere? Then maybe choose that. You can backup to the cloud directly, or use file sync software to copy backups. However, make sure you test the time to move files. If it's large, you might prefer a second copy somewhere that's easier to move files to/from. These are just files, so decide what works best.

    Make sure you secure your cloud (or other) file storage. No public access.

    Tape is fine, again, test time to copy to/from. Also, use some offsite service like Iron Mountain to move tapes in and out of the local building, in case something happens.

     

  • I'm with Steve, Amazon or Azure blob storage. I lean pretty heavily towards Azure, but I nailed my flag to that mast a long time ago. Also, Steve has already answered the question on taps as well as I can.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Just make damned sure that you know who can access your stuff on the cloud.  A lot of people don't setup security right.  There's been more than one instance of fairly large companies that have hung their dirty underwear on the front porch for all to see and they did it totally by accident.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply