SQL Backup Strategy Solution

  • What are people using for their entire SQL backup strategy solution?

    We use Ola Hallengren’s maintenance solution to back up all of our SQL databases.  We write those backups to a physical server with multiple drives.  Our network folks then use Rubrik to take a VSS snapshot of each drive of that backup server every night.  When we run Ola’s backup scripts, we delete files older than 23 hours on the physical server.  If we need a file after it has been removed from the backup server, we retrieve it from one of the VSS snapshots Rubrik took.  Rubrik then applies a retention schedule to the files captured in those snapshots.
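    For reference, the 23-hour cleanup described above maps directly to the `@CleanupTime` parameter (in hours) of Ola Hallengren's `DatabaseBackup` procedure. A minimal sketch of such a nightly job, with a hypothetical backup directory, might look like:

```sql
-- Nightly full backup of all user databases using Ola Hallengren's
-- DatabaseBackup procedure. @CleanupTime deletes backup files in the
-- target directory older than 23 hours (the directory path here is
-- hypothetical).
EXECUTE dbo.DatabaseBackup
    @Databases   = 'USER_DATABASES',
    @Directory   = 'X:\SQLBackups',   -- hypothetical backup drive
    @BackupType  = 'FULL',
    @Verify      = 'Y',
    @Compress    = 'Y',
    @CleanupTime = 23;                -- hours, matching the 23-hour window above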

    We recently discovered that Rubrik is failing to back up some files during its VSS snapshots.  The scenario is as follows: Rubrik starts a VSS snapshot of one of the drives on the physical backup server at 4 a.m.  Between the time the snapshot starts and the time it finishes, tlog files are deleted, because our backup jobs (full and tlog) remove any file older than 23 hours.  As a result, tlog files that were present when the snapshot started may no longer be there when it finishes.
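    One way to picture the window: any cleanup pass that runs between the snapshot's start and its end can remove files the snapshot has already enumerated.  Assuming the snapshot reliably completes within a known window (say, by 6 a.m.), one hedged mitigation sketch is simply to retain files a few hours past that window so the cleanup pass can never race the in-flight snapshot; the margin below is an assumption, not a recommendation from the thread:

```sql
-- Hedged sketch: extend @CleanupTime so the cleanup pass that runs
-- during the 4 a.m. snapshot never deletes files the snapshot may
-- still be copying. The 26-hour value is an assumed margin covering
-- the snapshot window; the directory path is hypothetical.
EXECUTE dbo.DatabaseBackup
    @Databases   = 'USER_DATABASES',
    @Directory   = 'X:\SQLBackups',   -- hypothetical
    @BackupType  = 'LOG',
    @CleanupTime = 26;                -- assumed: 23 hours + snapshot window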

    This situation has underscored the need for us to re-evaluate our SQL backup strategy.  We would appreciate feedback on what your company uses and what your SQL backup solution looks like.

  • Our solution is a bit different from yours, but our RTO and RPO drive it in a different direction than yours do.

    We used to use maintenance plans, but we have switched to T-SQL coded in-house for our backup solution.  We investigated Ola's scripts, but they were too lengthy and complex for our small team to support, and since we have only one third-party backup solution in place, it didn't make sense to adopt a script like Ola's that handles multiple backup products.  By writing our own scripts, we can tweak and tune things as needed.  We went with a multi-stored-procedure approach: one procedure for backups, one for index maintenance, one for statistics maintenance, and so on, plus a final "master" procedure that calls the others based on parameters.  Having the smaller scripts makes debugging almost trivial, since each stored procedure can be run stand-alone and it is easy to test and fix any problems that come up.
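    A sketch of the "master plus small procedures" layout described above might look like the following; all object and parameter names here are invented for illustration, since the poster's actual code isn't shown:

```sql
-- Hypothetical sketch of a multi-procedure maintenance layout:
-- one small procedure per task, plus a master that dispatches on
-- parameters. All names are invented for illustration.
CREATE PROCEDURE dbo.Maintenance_Master
    @RunBackups    bit = 1,
    @RunIndexMaint bit = 0,
    @RunStatsMaint bit = 0
AS
BEGIN
    SET NOCOUNT ON;

    IF @RunBackups = 1
        EXECUTE dbo.Maintenance_Backup;      -- each sub-procedure can also
                                             -- be run stand-alone for testing
    IF @RunIndexMaint = 1
        EXECUTE dbo.Maintenance_IndexMaint;

    IF @RunStatsMaint = 1
        EXECUTE dbo.Maintenance_StatsMaint;
END;
```

    The advantage of this shape, as the poster notes, is that each sub-procedure is independently runnable and debuggable, while the master gives Agent jobs a single entry point.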

    Next, all of our backups go to a disk on the SAN.  The disk is hosted on one of our SQL Servers, so one of our three servers has direct access to it while the other two go through the third to write their backups.  We have that disk set up in DxEnterprise to fail over to the other two servers if the primary one goes offline.  This way, a backup of any SQL instance should never fail due to disk issues unless the backup is in progress when the failover occurs.

    We keep the backups on disk for two weeks, so in the event we need to do a restore, we have data going back two weeks without needing to pull it from a long-term backup solution.  The backups are copied nightly to tapes that are moved off-site weekly.


    The reason we keep the backup files on disk for two weeks is that restoring from tape is slow, and it also gives us multiple tapes to grab a backup from in the event a tape goes bad or the tape backup reported success but actually failed.  This results in duplicate data on tapes, but redundancy is a very big "nice to have" for me.
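    On catching backups that "reported success but actually failed": one standard check, sketched here with a hypothetical file path, is to verify each backup file on disk before it is copied to tape:

```sql
-- Verify that a backup file is readable and its checksums are intact
-- before it goes to tape. Catches files that were written but are
-- corrupt or truncated (the path here is hypothetical).
RESTORE VERIFYONLY
FROM DISK = 'X:\SQLBackups\MyDb_FULL.bak'
WITH CHECKSUM;
```

    Note that `RESTORE VERIFYONLY` checks readability and checksums but is not a full restore; periodically test-restoring a backup remains the only complete proof that it works.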

    If I were you, I'd try to keep the data on disk longer even if it means wasted space; I'd rather waste a few hundred GB of disk than have a backup get "lost".
