Do You Verify Your Database Backups?

  • Well you can always rotate those backups so that each is restored at least once per period (probably best defined by being better than any external requirement for verifying backups - someone mentioned every quarter I believe).

    All of our production databases are restored nightly to a test machine - it happens to be in a different location so bandwidth is more an issue than disk space. Apart from verifying the restore it also allows me an environment to regression test and, on occasion, to investigate issues without impacting live data. Obviously there is a security element to all this and the test server is secured to the same standards as the production servers, all backups are encrypted, and it is covered by SAS70 audits.

    The only failures I've experienced so far (in a couple of years) have been ftp related ... the biggest benefit by far is the peace of mind in knowing that even under the worst circumstances I'll only lose 24 hours of data.

  • I'm really glad to see this topic discussed. It's something I've lately been reading on -- or trying to, anyway. Almost every article on backups will say "test your restores", but almost no one explains the options or gets into details.

    I always verify my backups, but haven't had a place to do regular test-restores. That will be changing soon, so I'm working on a plan.

    For those of you who do test-restores regularly: do you just restore to the most recent possible, or do you also do tests of point-in-time restores?

  • I even have a script to figure out which log to replay to get to any point in time!

    You should practice it every once in a while though.

    And in addition run checkb regularly. never trust the Hardware!


    * Noel

  • Yes...I do. I perform at least one full restore per week.

  • Clive,

    Is this required? Do you log/record this as a part of your job?

  • Hi Steve,

    Part of it is required. We may need to restore a client database as part of an upgrade process or we may need a fresh copy of a central database for dev/test/qa purposes. These are the required ones.

    However, I still perform my own validation of the database backups. I just make time on a weekly basis to perform these restores for sanity purposes.

  • When I had made a backup schedule for our 85 databases in the network I had put all verification options 'on' in our external backup software program (that time we were using DataBK backup). I remember vividly how I got into a discussion over using 'precious CPU time on our production servers' for something trivial as verifying the backups (which in his view had never failed when he was DBA).

    So I turned it off on all 85 backups; because in the end all I can do as DBA is warn my employer of the dangers; in the end it is HIS decision on what kind of risks he is willing to take in order to sacrifice server productivity over restore quality.

  • OK... Three years later, and not much has changed. We are still preaching the value of testing our backups, and complaining about the scarcity of resources (time, space, [virtual] machine, etc.) available to test backups.

    Has anyone found a readable guide to determining the criteria to use in deciding how frequently you need to test those backups, and then working those criteria into a formula that will provide you a 95% / 98% / 99% likelihood of NOT experiencing a restore failure if your backup test samples work 100%?

    I thought I had found such an article recently, but then wiped out the recall on my firefox browser and have not been able to rediscover it. :crying:

    Steve

  • bas de zwart (9/30/2009)


    When I had made a backup schedule for our 85 databases in the network I had put all verification options 'on' in our external backup software program (that time we were using DataBK backup). I remember vividly how I got into a discussion over using 'precious CPU time on our production servers' for something trivial as verifying the backups (which in his view had never failed when he was DBA).

    So I turned it off on all 85 backups; because in the end all I can do as DBA is warn my employer of the dangers; in the end it is HIS decision on what kind of risks he is willing to take in order to sacrifice server productivity over restore quality.

    It's silly, isn't it? These are the same people that embrace storing XML in the database and repeatedly shredding it.

    As the DBA, I'm pretty sure that I'm the one that knows what's best when it comes to protecting the data. I'm also pretty sure that I'll be the one hung out to dry if we ever have a failure and the restore from backups fails. I've only had to remind one boss that it's not his decision nor his hiney at stake and that we were going to do it the right way.

    As a side bar, I'm tickled and incredibly fortunate that I'm currently working for a company where the Infrastructure Team believes in making sure that backups, DR, and regular restore tests (some occur nightly) are the number one priority.

    I don't have a formula to help you determine the chances of successful restores but I'd say your boss just lowered the percentage.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Well, in my case, I get to BE the policy wonk - I'm looking for a defensible basis (found some 'fancy' math papers talking about Markovian processes but my employer won't accept that as a valid justification. Among the questions we have to grapple with are:

    1. the relationship between backups and records retention (i.e., do we really care if a record was deleted from a database 3-4 months ago and will we need to retrieve it from an old backup?)

    2. the relationship between backups and resource utilization (which has been discussed only qualitatively, never with a discussion of actual resources used), which ignores what is the most expensive cost of backups - the monetary loss in terms of time needed to perform a restore.

    3. the level of intervention required to review and verify that (a) backups worked (b) they are usable and (c) the content is worthwhile (i.e., the database is not already corrupt).

    I hate when I don't remember the 'lucky' google search string that found the original article. If I find it, I will post it here.

    Incidentally, everything I see on determining backup frequency and testing frequency sounds like my ex-wife's old t-shirt showing two math professors showing a three step process:

    1. complex mathematical expressions.

    2. a miracle occurs

    3. complex mathematical expressions.

    the caption quotes one professor saying to the other: 'I think you need to be more explicit in step 2.'

    Steve

  • Steve, I think this[/url] is the link you refer to.

    We use the verify option on all our backups here, but we also have an automated process for test restoring our backups. Once you've done all the work to set that up, you may as well test as many restores as you can, so here's what we do:

    (1) Choose 10 databases at random from the last set of database backups. Only databases that were successfully backed up are considered, and if a database has already been test restored in the last 24 hours, it is not considered. We also have a list of databases that we force to be chosen every day.

    (2) Restore the database but don't recover

    (3) Restore any and all log backups between the backup and now

    (4) Recover the database

    (5) Run DBCC CHECKDB against the database at the same time as restoring the next databases

    (6) Alert the team if the restore was not successful or DBCC picked out any issues

    (7) When all 10 databases have been restored and checked, pick another 10 and start again

    (8) All activity is logged to tables so that we can find out what went wrong or provide evidence to auditors (and ourselves) that backups have been tested

    John

  • John,

    Thank you! YES!!!! Who'd a thunk the key search phrase would be 'statistical sampling'

    Steve

Viewing 12 posts - 46 through 56 (of 56 total)

You must be logged in to reply to this topic. Login to reply