Developing a backup plan

Without a good backup and recovery strategy in place, a DBA should make sure their resume is always up to date. In this article, Greg Larsen discusses what should be considered when developing a backup plan.

The most important task for a DBA is being able to recover a database if it becomes corrupted. A database can become corrupted for many reasons. The most common cause is a programming error, but hardware failures can also corrupt a database. Regardless of how a database becomes corrupted, a DBA needs a solid backup strategy to be able to restore it with minimal data loss. In this article, I will discuss how to identify the backup requirements for a database, and then how to use those requirements to develop a backup strategy.

Why develop a backup plan?

You might be wondering why you need to develop a backup plan. Can’t a DBA just implement a daily backup of each database and call it good? Well, that might work, but it doesn’t consider how an application uses a database. If you have a database that is only updated by a nightly batch process, then a daily backup taken right after the nightly update might be all that you need. But what if the database is updated all day long by an online internet application? With only one backup a day, you could lose up to a day’s worth of online transactions if the database failed right before the next daily backup. Losing a day’s worth of transactions would most likely be unacceptable. Therefore, to keep data loss to a minimum when restoring a database, identify its backup and recovery requirements first, before building a backup solution for it.

Identifying backup and recovery requirements

Each database may have different backup and recovery requirements. When discussing backup and recovery requirements for a database, there are two different types of requirements to consider. The first requirement is how much data can be lost in the event of a database becoming corrupted. Knowing how much data can be lost will determine the types of database backups you need to take, and how often you take those backups. This requirement is commonly called the recovery point objective (RPO).

The second requirement to consider is how long recovering a corrupted database is allowed to take. This requirement is commonly called the recovery time objective (RTO). The RTO identifies how long the database can be down while the DBA is recovering it. When defining the RTO, consider more than just how long it takes to restore the database. Other tasks take time and need to be included, such as identifying which backup files are needed, locating those files, building the restore script or process, and communicating with customers.
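To make the two requirements concrete, here is a minimal Python sketch that checks a backup schedule against them. It assumes, as a simplification, that the worst-case data loss equals the interval between backups (a failure just before the next backup runs), and that the RTO must cover every recovery task listed above. The task names and durations in `EXAMPLE_STEPS` are hypothetical placeholders, not measurements.

```python
from datetime import timedelta

def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    # Simplification: if the database fails just before the next backup,
    # everything changed since the previous backup is lost.
    return backup_interval

def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    return worst_case_data_loss(backup_interval) <= rpo

def estimated_recovery_time(steps: dict) -> timedelta:
    # The RTO has to cover every recovery task, not just the restore itself.
    return sum(steps.values(), timedelta())

def meets_rto(steps: dict, rto: timedelta) -> bool:
    return estimated_recovery_time(steps) <= rto

# Hypothetical durations for each recovery task.
EXAMPLE_STEPS = {
    "identify backup files": timedelta(minutes=15),
    "locate backup files": timedelta(minutes=10),
    "build restore script": timedelta(minutes=20),
    "run restore": timedelta(hours=2),
    "communicate with customers": timedelta(minutes=15),
}

print(meets_rpo(timedelta(days=1), timedelta(hours=1)))  # False: daily backups vs. a one-hour RPO
print(estimated_recovery_time(EXAMPLE_STEPS))            # 3:00:00
```

With these placeholder numbers, the restore itself accounts for only two of the three hours, which is why the other tasks belong in the RTO.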

A DBA should not identify the RTO and RPO in a vacuum. The DBA should consult each application owner to set the RTO and RPO requirements for each database the application uses. The customers are the ones who should drive the RTO and RPO requirements, with help from the DBA, of course. Once the DBA and the customer have agreed on the appropriate RTO and RPO, the DBA can design the backups needed to meet these requirements.

Types of backups to consider

There are a number of different backup types you could consider. See my previous article for all the different backup types. Of those different backup types, there are three types of backups that support most backup and recovery strategies. Those types are Full, Differential, and Transaction log.

The full backup, as the name suggests, copies the entire database to a backup device. The backup contains all the used data pages in the database. A full backup can be used to restore the entire database to the point in time when the backup completed. I say “completed” because any updates run against the database while the backup is running are also included in the backup. Therefore, when you restore from a full backup, you are restoring the database to the point in time when the backup completed.

A differential backup copies all the changes made since the last full backup to a backup device. Differential backups are useful for large databases where only a small amount of data has changed since the full backup. In that case, a differential backup runs faster and takes up less space on the backup device than a full backup. A differential backup can’t be used to restore a database by itself. It is used together with a full backup to restore a database to the point in time when the differential backup completed: the full backup is restored first, followed by the differential backup.

The last type of backup is the transaction log backup. A transaction log backup copies the transactions recorded in the transaction log since the previous log backup to a backup device. It also truncates the inactive portion of the transaction log, so that space can be reused and the log does not grow out of control. Like a differential backup, a transaction log backup can’t be used by itself to restore a database. It is used together with a full backup, and possibly a differential backup, to restore a database to a specific point in time. The advantage of transaction log backups is that you can tell the restore process to stop at any point in time covered by a log backup. By using this stop feature (the STOPAT option of RESTORE LOG), you can restore a database right up to the moment before it became corrupted. Transaction log backups are typically taken frequently, so there might be many of them between each full or differential backup. They are beneficial whenever there is a requirement for minimal data loss in the event of a database becoming corrupted.
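How these three backup types fit together during a restore can be sketched in Python. This is a simplified model, not how SQL Server itself tracks backups (it uses log sequence numbers and the backup history stored in `msdb`): each backup is reduced to a kind and a completion time, the log backups are assumed to form an unbroken chain, and the `Backup` class and `restore_chain` function are hypothetical names for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

@dataclass(frozen=True)
class Backup:
    kind: str           # "full", "diff", or "log"
    finished: datetime  # when the backup completed

def restore_chain(backups: List[Backup], target: datetime) -> Optional[List[Backup]]:
    """Return the backups to restore, in order, to reach `target`, or None if impossible."""
    ordered = sorted(backups, key=lambda b: b.finished)
    fulls = [b for b in ordered if b.kind == "full" and b.finished <= target]
    if not fulls:
        return None  # neither a differential nor a log backup can be restored on its own
    full = fulls[-1]
    # A differential holds every change since the last full backup, so only
    # the newest one taken after `full` (and not after the target) is needed.
    diffs = [b for b in ordered
             if b.kind == "diff" and full.finished < b.finished <= target]
    chain = [full] + diffs[-1:]
    base = chain[-1]
    if base.finished == target:
        return chain
    # Apply log backups in order until one reaches the target time.
    for b in ordered:
        if b.kind == "log" and b.finished > base.finished:
            chain.append(b)
            if b.finished >= target:
                return chain
    return None  # the target is later than the newest log backup
```

The last log backup in the returned chain is the one that would be restored with STOPAT set to the target time.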

Developing a backup strategy for a database

When determining a backup plan for a database, you need to know how much data can be lost and how long recovering the database can take. This is where the RPO and RTO come in: they determine which types of database backups should be taken. In the sections below, I will outline different database usage situations and then discuss how one or more of the backup types could be used to restore the database while meeting the application owner’s RTO and RPO requirements.

Scenario #1 – Batch updates only

When I say “batch updates only”, I am referring to a database that is updated only by a batch process, meaning it is not updated online or by ad hoc update processes. One example is a database that receives updates in a flat-file format from a third-party source on a schedule. Those updates are applied to the database using a well-defined update process, which typically runs on a schedule that coincides with when the flat file is received. In this kind of situation, the customer would define an RPO something like this: “In the event of a corrupted database, the database needs to be restored to its state immediately after the last batch update process.” The RTO might be defined something like this: “The restore process needs to be completed within X number of hours.”

When a database is updated only by a scheduled batch process, all you need is a full backup taken right after each update. This lets you recover to the point in time right after the database was last updated, so the full backup alone meets the RPO. Since restoring a full backup takes roughly as long as taking it, the RTO needs to be at least as long as the restore itself, plus a little more time for organizing and communicating the restore operation.

Scenario #2 – Batch updates only, with a short backup window

This scenario is similar to the last one, but here there is very little time to take a backup after the batch processing completes. The time it takes to back up a database is roughly proportional to the amount of data being backed up, so when the database is very large, the backup window might be too short for a full backup every time the database is updated. If there isn’t time for a full backup and the amount of data updated is small, then a differential backup is a good choice for meeting the RPO and RTO requirements. A differential backup copies only the changes made since the last full backup, so it can run much faster than a full backup. Keep in mind that to restore a differential backup, you must first restore the full backup. In this situation, a full backup needs to be taken periodically, with differential backups taken in between. A common schedule is to take the full backup when there is a large batch window, such as on a Sunday when no batch processing runs, and to take differential backups on the days when the batch window is short.
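The weekly schedule described above can be sketched as follows. Sunday as the full-backup day and the function names are assumptions for illustration. The second function shows why this schedule keeps restores simple: because each differential contains all changes since the last full backup, restoring any day’s backup needs at most two files.

```python
from datetime import date, timedelta

# Assumed full-backup day; date.weekday() numbers Monday as 0 and Sunday as 6.
FULL_BACKUP_WEEKDAY = 6

def backup_type_for(day: date) -> str:
    # Full backup on the long-window day, differential on every other day.
    return "full" if day.weekday() == FULL_BACKUP_WEEKDAY else "diff"

def files_to_restore(day: date) -> list:
    """Backups needed to restore to the backup taken on `day`."""
    days_since_full = (day.weekday() - FULL_BACKUP_WEEKDAY) % 7
    last_full = day - timedelta(days=days_since_full)
    if last_full == day:
        return [("full", last_full)]
    # Only the most recent differential is needed, never the earlier ones.
    return [("full", last_full), ("diff", day)]
```

For a Wednesday restore, this returns the preceding Sunday’s full backup followed by Wednesday’s differential.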

Scenario #3 – Ad hoc batch updates only

Some databases are not updated on a schedule; instead, they are updated occasionally by an ad hoc batch update process that is kicked off manually. There are a couple of ways to handle backups for databases in this category. The first is simply to run full database backups on a routine schedule. The second is to trigger a backup as the last step of the ad hoc batch update process.

A routine scheduled full backup is not ideal because the backup may or may not run soon after the ad hoc batch update process. During the time between the ad hoc update and the next scheduled full backup, the database is vulnerable to data loss should it become corrupted. To minimize the time between the ad hoc update and the database backup, it is better to add a backup command to the end of the ad hoc update process. That way, a backup is taken right after the ad hoc process, which minimizes the window in which data could be lost. Additionally, by adding a backup command to the ad hoc update process, you potentially take fewer backups than a routine schedule would, which reduces processing time and backup device space.
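Adding the backup to the end of the ad hoc process can be sketched like this. The update steps and the backup are passed in as callables, so the sketch does not depend on any particular database driver. `backup_statement` only builds the text of a basic T-SQL `BACKUP DATABASE ... TO DISK` command; the database name and path you pass it are up to you, and executing the statement would require a database connection, which is not shown.

```python
from typing import Callable, List

def backup_statement(database: str, path: str) -> str:
    # Builds a basic T-SQL full backup command; names are not validated or escaped.
    return f"BACKUP DATABASE [{database}] TO DISK = N'{path}';"

def run_adhoc_update(steps: List[Callable[[], None]],
                     backup: Callable[[], None]) -> None:
    """Run the ad hoc update steps in order, then take a backup as the last step."""
    for step in steps:
        step()  # if any step raises, the function stops here and no backup is taken
    backup()  # runs only after every update step has succeeded
```

Skipping the backup when an update step fails is a design choice in this sketch; you might instead prefer to investigate, rerun the update, and then back up.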

Scenario #4 – Online updates during business hours

In this scenario, the database receives updates from online transactions, but these transactions run only during business hours, say 8 AM to 5 PM. Outside of regular business hours, the database is not updated. In this situation, you might consider a combination of two types of backups: full and transaction log backups. The full backup would run off-hours, meaning after 5 PM and before 8 AM. Transaction log backups would run during business hours to back up the online transactions shortly after they are made. Review the RPO to determine how often to run transaction log backups: the shorter the RPO, the more often you need to take them. For example, if a customer says they can lose no more than an hour’s worth of transactions, then you need to run a transaction log backup every hour between 8 AM and 5 PM.
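Here is a minimal sketch of turning the RPO into a log backup schedule for this business-hours case. It assumes the off-hours full backup already covers everything up to 8 AM, and it places a final log backup at 5 PM to capture the last transactions of the day. `log_backup_times` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

def log_backup_times(start: datetime, end: datetime, rpo: timedelta) -> list:
    """Times to run log backups so no gap between backups exceeds the RPO."""
    times = []
    t = start + rpo  # the full backup covers everything up to `start`
    while t < end:
        times.append(t)
        t += rpo
    times.append(end)  # final log backup at close of business
    return times

day = datetime(2024, 1, 8)
schedule = log_backup_times(day.replace(hour=8), day.replace(hour=17), timedelta(hours=1))
print([t.strftime("%H:%M") for t in schedule])
```

With a one-hour RPO this yields nine log backups, at 9 AM through 5 PM; with a two-hour RPO it yields 10 AM, noon, 2 PM, 4 PM, and 5 PM.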

Scenario #5 – Online updates 24×7

Some databases are accessed every day all day. This is very similar to Scenario #4, but in this case, the database is accessed and updated online 24×7. To handle backup and recovery in this situation, you would take a combination of full and differential backups along with transaction log backups.

With a database that is updated 24×7, you want to run the full and differential backups when the database has the fewest online updates, which minimizes the performance impact of the backups. There are far too many different database situations for me to tell you exactly how often a full or differential backup should be taken. I would recommend taking a full or differential backup daily if possible. By doing it daily, fewer transaction log backup files will be involved in your recovery process.

Transaction log backups are used to minimize data loss. As in Scenario #4, the frequency of transaction log backups is determined by the RPO. Assuming the customer can lose one hour’s worth of transactions, transaction log backups would need to run hourly, all day, every day, to cover the 24×7 online processing.
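The advice to take a full or differential backup daily can be checked with a little arithmetic. This sketch estimates the most backup files a 24×7 restore could involve, assuming hourly log backups and a failure just before the next full or differential backup. `max_files_in_restore` is a hypothetical helper, and its result is a rough upper bound rather than an exact count for any particular restore.

```python
import math
from datetime import timedelta

def max_files_in_restore(base_files: int,
                         base_interval: timedelta,
                         log_interval: timedelta) -> int:
    """Rough upper bound on files restored: the base backup(s) plus
    every log backup taken before the next full or differential backup."""
    log_files = math.ceil(base_interval / log_interval)
    return base_files + log_files

hourly = timedelta(hours=1)
print(max_files_in_restore(1, timedelta(days=1), hourly))  # daily full: 25
print(max_files_in_restore(2, timedelta(days=1), hourly))  # weekly full + daily differential: 26
print(max_files_in_restore(1, timedelta(days=7), hourly))  # weekly full only: 169
```

The jump from 25 or 26 files to 169 is why a daily full or differential backup keeps the recovery process manageable.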

Developing a backup plan

It is important for a DBA to work with the application owners to identify the backup and recovery requirements for their databases. The application owners determine how much data they can lose (RPO), and how long the database can be down while it is being recovered (RTO). Once the RTO and RPO requirements are defined, the DBA can then develop a backup plan that aligns with these requirements.