SQLServerCentral Editorial

Backups Aren't Backups Until a Restore Is Made


One of the interesting things I saw in the recent GitLab outage and data loss was the fact that none of their backups were available. They use PostgreSQL, and I'm not familiar with the ways the modern PostgreSQL engine handles backups or the options it offers, so I'm not knocking either GitLab or PostgreSQL. It's possible they had fewer options than we have in SQL Server, with our full, differential, log, and filegroup backups, all of which can be taken during live database activity.
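For reference, here is a minimal sketch of those four backup types in T-SQL. The database name, filegroup name, and file paths are hypothetical, for illustration only.

-- Hypothetical database name and paths; all of these run against a live database.

-- Full backup: the complete database.
BACKUP DATABASE SalesDB
TO DISK = N'D:\Backups\SalesDB_full.bak'
WITH INIT, CHECKSUM;

-- Differential backup: only the extents changed since the last full backup.
BACKUP DATABASE SalesDB
TO DISK = N'D:\Backups\SalesDB_diff.bak'
WITH DIFFERENTIAL, INIT, CHECKSUM;

-- Transaction log backup: enables point-in-time recovery (FULL recovery model).
BACKUP LOG SalesDB
TO DISK = N'D:\Backups\SalesDB_log.trn'
WITH INIT, CHECKSUM;

-- Filegroup backup: just one filegroup of a larger database.
BACKUP DATABASE SalesDB
FILEGROUP = N'Archive'
TO DISK = N'D:\Backups\SalesDB_archive_fg.bak'
WITH INIT, CHECKSUM;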

There was a live stream and a Google Doc open during the incident, showing the response by their employees (and plenty of Hacker News comments). Kudos to GitLab for their bravery and transparency in showcasing their mistakes and choices. I've been in similar situations, and the war room can be chaotic and stressful. There has been no shortage of times when someone made a mistake under pressure and we scrambled to recover from the damage. I've made those mistakes and understand how they happen when you're desperate and tired. This is one reason I've usually insisted that when an incident is declared, I immediately send at least one person home to rest. I never know what time I'll need to get them back.

In reading the notes, there are a number of issues:

1. One of the respondents doesn't know where the once-a-day backups are stored.
2. The location they check has files only a few bytes in size, so the backups might not be working.
3. There are no disk snapshots in their Azure space for the database servers, though the NFS servers get them.
4. The snapshot process is incomplete: once snapshots are made, some data is removed from production and will be lost in this recovery.
5. The backups to S3 don't work.

All of this results in a backup that is six hours old being restored. For people who commit code often, that could be a lot of data. Hopefully there weren't too many customer merges and branch deletions in that window.

A backup doesn't matter. A restore matters. It doesn't matter what backup process you have; if you don't test it, you don't know whether you can recover. In fact, with databases (really, any system), you need to test the restores regularly because the backup process can fail. I learned this early in my career when one of our admins realized his fancy tape changer, which let him change tapes only once a week, was broken. The drive had stopped writing and he never noticed.
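One way to make that test concrete in SQL Server is to restore the latest backup to a scratch database on a test server and run an integrity check against it. A sketch, assuming hypothetical file paths and logical file names:

-- Hypothetical names and paths; adjust the MOVE clauses to your logical file names.
RESTORE DATABASE SalesDB_RestoreTest
FROM DISK = N'D:\Backups\SalesDB_full.bak'
WITH MOVE N'SalesDB'     TO N'E:\RestoreTest\SalesDB_RestoreTest.mdf',
     MOVE N'SalesDB_log' TO N'E:\RestoreTest\SalesDB_RestoreTest.ldf',
     REPLACE, STATS = 10;

-- A restore that completes isn't proof enough; check the pages as well.
DBCC CHECKDB (N'SalesDB_RestoreTest') WITH NO_INFOMSGS, ALL_ERRORMSGS;

A RESTORE VERIFYONLY is quicker, but only a real restore followed by a consistency check tells you the backup can actually bring the data back.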

It's important not only to monitor that the backup process runs, but also to ensure the backup files exist where we expect them to. If they go to a remote location, you need monitoring there as well. It's also important to restore backups regularly. Ideally you'd test every one, but at the very least get into a regular rotation of testing once a week to ensure your process is working.
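A simple check of that first part is to query the backup history in msdb. This sketch assumes a 24-hour threshold for full backups; note that history only proves the backup command ran, so you still need a separate check that the files themselves exist at the target location (and at any remote copy).

-- Hypothetical 24-hour threshold: flag databases whose most recent full
-- backup in msdb history is missing or older than a day.
SELECT d.name AS database_name,
       MAX(bs.backup_finish_date) AS last_full_backup
FROM sys.databases AS d
LEFT JOIN msdb.dbo.backupset AS bs
       ON bs.database_name = d.name
      AND bs.type = 'D'               -- 'D' = full database backup
WHERE d.name <> 'tempdb'
GROUP BY d.name
HAVING MAX(bs.backup_finish_date) IS NULL
    OR MAX(bs.backup_finish_date) < DATEADD(HOUR, -24, GETDATE());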

If you don't, then you risk not only data loss, as GitLab experienced, but an RGE. That's a resume-generating event, and it's something none of us would like to experience.
