The Backup Myth

  • Comments posted to this topic are about the item The Backup Myth

  • Having been in IT for 20+ years, I have been the one on several jobs telling the client 'hey, you know those backups you've been doing? Well, they're no good' when it comes to the crunch. I have learnt through bitter experience that full backups are the only way. It does not matter whether we are talking about Windows, SQL Server or Exchange: full backups are the only way. Forget clever agents, forget incremental or differential backups (differentials being slightly better than incrementals), because with those you are relying on more than a single source of data to get the thing running again.

    Don't rely on a single technology, e.g. tape; do backups to file too. Always use the built-in backup procedures, e.g. ntbackup or the backup built into SQL Server, and don't get fancy. Don't do what I have seen some other companies do and back up SQL Server to a file on the same server; always back up to another location (a different server, maybe a different site).

    Testing backups through restores is the only way to know that your backup solution will get you back up and running in the event of an error. It is funny, when you test a backup solution by restoring the data, how many times you say 'I forgot I needed x to get this to work'. Service packs are something a lot of people forget about: it is important to note what service pack level a program is at when doing a restore, especially in the event of total loss of hardware.
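    As a rough illustration of that advice, a minimal T-SQL sketch. The [Sales] database name and the \\BackupServer\SQLBackups share are placeholders; it assumes the share sits on a different machine and the SQL Server service account can write to it.

    -- Full backup straight to a UNC path on another server, overwriting any
    -- previous backup sets in the file (WITH INIT) and writing page checksums
    -- so corruption in the backup itself gets flagged as early as possible.
    BACKUP DATABASE [Sales]
    TO DISK = N'\\BackupServer\SQLBackups\Sales_Full.bak'
    WITH INIT, CHECKSUM, STATS = 10;

    As the post says, though, only an actual restore proves the file is any good.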

  • And here, more than anywhere else, is where I believe DBAs really earn their money. There's no substitute for experience, and very specific experience at that. It's all very well having a backup and recovery process based on a theory that looks perfect, but until it's proven to work in practice, it's useless. And how do you know it's proven to work in practice? Test restores in "clinical" conditions? Nah. Test restores are definitely required, but it's only a DBA who's been in the hot seat before who can fully appreciate in just how many different areas problems can manifest themselves, and what effect they'll have on performing a recovery in practice.

    Since we have several DBAs here where I work, we try to second-guess providence by having one DBA think up the scenario and another have to recover based on those circumstances. Quite often, we also use a routine upgrade to provide us with the raw material; for instance, decommissioning one physical server by upgrading to a new box gives us a spare system that was, until a few days before, our live production environment. That allows us to break bits and give a DBA practice in recovering a system that very closely resembles what they'd have to work with for real.

    Mind you, it's still no surprise that even practice like this doesn't stop the unexpected happening when the f***-up fairy really does come for a visit.

    Semper in excretia, suus solum profundum variat

  • It did irk me to see backups as a myth because we regularly test our backups by restoring for development and as part of training. That's one point that is often missed though. Even if your backups are intact, do your people know how to run them? We run restores regularly, but take turns doing them so that everyone remembers how they're done.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • We recently had a problem at a prominent client. Many (near) duplicate records were deleted from a critical table. The client IT manager could not use his "agent backup" to restore the backup to a different database on the original server. He had to restore to the same database name on a different server. The backup worked though.

    I have often said that the way to test backups (SQL or otherwise) is to:

    1. Perform a complete backup of A.

    2. Use that backup to restore to B.

    3. Compare B to A.

    In disaster recovery, A is gone anyway.
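    Something like this, as a minimal T-SQL sketch. The database names, file paths and logical file names are placeholders, it assumes enough free space for the second copy, and it restores B next to A on the same instance so the two can be compared:

    -- RESTORE FILELISTONLY FROM DISK = N'C:\Backups\A_Full.bak' shows the
    -- logical file names needed for the MOVE clauses below.

    -- Restore the backup of A as a brand new database, A_Restored (our B),
    -- which also proves you can restore to somewhere other than the original.
    RESTORE DATABASE [A_Restored]
    FROM DISK = N'C:\Backups\A_Full.bak'
    WITH MOVE N'A'     TO N'C:\Data\A_Restored.mdf',
         MOVE N'A_log' TO N'C:\Data\A_Restored_log.ldf',
         RECOVERY;

    -- Crude comparison of B to A: row counts on the critical table should match.
    SELECT (SELECT COUNT(*) FROM A.dbo.CriticalTable)          AS rows_in_A,
           (SELECT COUNT(*) FROM A_Restored.dbo.CriticalTable) AS rows_in_B;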

    ATB
    Charles Kincaid

  • Charles Kincaid (1/9/2008)


    We recently had a problem at a prominent client. Many (near) duplicate records were deleted from a critical table. The client IT manager could not use his "agent backup" to restore the backup to a different database on the original server. He had to restore to the same database name on a different server. The backup worked though.

    I have often said that the way to test backups (SQL or otherwise) is to:

    1. Perform a complete backup of A.

    2. Use that backup to restore to B.

    3. Compare B to A.

    In disaster recovery, A is gone anyway.

    So the recommendation is to go from A to B to see.

    :Whistling:

    Sorry, I'll get my coat....

    Semper in excretia, suus solum profundum variat

  • Me, too. I'll grab my resume, err coat and meet you downstairs 🙂

    It pays to test your restores regularly, but I'm not sure I'd test every one. The SQL backup to disk works so well that I don't worry about the process or the files, but I do periodically check them to be sure they're not getting corrupt or that there's another issue with the process.

    It's a good reason not to use backup sets or to back up to the same file every time as well. If one gets corrupted, you could have a whole stream corrupted.
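    For the periodic file check, something along these lines is one option - a lightweight sanity check, not a substitute for a real restore, and the path is a placeholder:

    -- More than one row back from HEADERONLY means backups are being appended
    -- to the same file - the "whole stream" risk mentioned above.
    RESTORE HEADERONLY
    FROM DISK = N'\\BackupServer\SQLBackups\Sales_Full.bak';

    -- Re-read the whole file and recheck its page checksums (only meaningful
    -- if the backup was taken WITH CHECKSUM).
    RESTORE VERIFYONLY
    FROM DISK = N'\\BackupServer\SQLBackups\Sales_Full.bak'
    WITH CHECKSUM;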

  • The other thing I do is ship the backup files across the internet (encrypted, of course) so that they are in another physical location. I back up the whole database to the same file each night (renaming the night before's), then use XDelta http://xdelta.org to find the bit-level differences between last night's backup file and tonight's. I then send the resulting patch file across the internet (normally 50-150 MB for a 20 GB database per day) and rebuild the backup file at the other location. Works perfectly... Offsite backups are very important these days, unfortunately. The 20 GB file is processed in less than an hour to create the patch file.

  • Why restrict yourself to the easiest part of the process? By testing the restore of a single user database, you've bypassed all sorts of fun and challenging things that will likely go to hell in a disaster. Like restoring your AD, then restoring the system databases, then getting all of these databases to run, and run well.

    You need to regularly do a full-fledged DR exercise. Starting with the "the data center blew up" exercise, and going up from there. At a previous employer, the CIO hired a consulting company to come up with 200 or so permutations of various "wrinkles", or rather additional issues we might need to be able to "survive". Things like - corrupted MSDB; missing backup (one or two sets); irretrievable AD restore, so revert to using the AD exports, etc.... or - the mainframe gateway's token ring card is gone, so the connectivity must be reestablished with new protocols over ethernet. In addition to all of the various "this user database's table x has been modified in x way".
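    To make one of those wrinkles concrete, a rough T-SQL sketch of the corrupted-msdb case - the paths are placeholders, and SQL Server Agent has to be stopped first because it keeps msdb in use:

    -- Put msdb back from its own full backup.
    RESTORE DATABASE [msdb]
    FROM DISK = N'\\BackupServer\SQLBackups\msdb_Full.bak'
    WITH REPLACE, RECOVERY;

    -- master is the nastier one: the instance must be started in single-user
    -- mode first (e.g. net start MSSQLSERVER /m), and it shuts itself down
    -- again as soon as the restore finishes.
    -- RESTORE DATABASE [master]
    --    FROM DISK = N'\\BackupServer\SQLBackups\master_Full.bak'
    --    WITH REPLACE;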

    What's more, time yourself. You HAVE to be able to do this fast, and preferably on unfamiliar hardware, under adverse circumstances.

    Restores are important, but assuming conditions will be good when you need them is a surefire way to make sure you HAVE a problem. Try multiple permutations of various problems.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Nice one, Steve.

    The attitude to DR is scary sometimes - one manager asked "why do we need all that - it's never failed before?" (grab brick wall, start banging head...).

    Personally I prefer to NEVER restore OVER something - restore next to, check, then replace.

    However this is where I came across a beaut..."you need 2TB disk space to restore the DB? But we never catered for that!".

    So make sure you have a "play area" - doesn't have to be redundant disk, just pretty fast striped e.g. SATA...can even power down when not in use (do our bit for global warming).

    And a last beaut - everything is set up to e-mail on failure (since e-mailing on success would inundate you), but then the system hangs completely or the e-mail itself fails... I just noticed an old client whose backup hung about 2 months back - it didn't fail, so it's been sitting there for 2 months and no one has noticed!

    The only thing I can suggest is the "heartbeat" approach - have an asynchronous job that checks each day that, for example, the backup files actually exist.
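    One way to do that heartbeat from the database side (checking msdb's backup history rather than the files themselves - an assumption on my part, as is the one-day threshold) is a query like this:

    -- Flag any database whose most recent full backup is missing or more than
    -- a day old. 'D' = full database backup in msdb.dbo.backupset.
    SELECT d.name,
           MAX(b.backup_finish_date) AS last_full_backup
    FROM sys.databases AS d
    LEFT JOIN msdb.dbo.backupset AS b
           ON b.database_name = d.name
          AND b.type = 'D'
    WHERE d.name <> 'tempdb'
    GROUP BY d.name
    HAVING MAX(b.backup_finish_date) IS NULL
        OR MAX(b.backup_finish_date) < DATEADD(DAY, -1, GETDATE());

    Of course, if the whole server hangs this won't run either, which is why the checking job should be asynchronous and separate, as suggested above.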

    Summarise ALL your notifications if possible into a single SUCCESS e-mail...at least then you're not inundated.

    And make your ERROR notifications stand out (from success or other) - then use e.g. Outlook rules to move all ERROR e-mails to a particular sub-folder.

    But nothing beats running through a test restore END TO END - I mean actually overwrite the live DB & all (and see whether your application still works after "oops, DB key changed, data inaccessible - d'oh!"). Sure it's hard - that's why they pay you so much (and if not, well you get what you pay for, right?).


    Regards
    Andy Davies

  • I think what's coming through loud and clear here is that there are plenty of us who've been there before, so have a good idea of the gotchas and of how well (or not) we work under those conditions. We're all well aware the key is planning based on that experience, too.

    The biggest problem, though, is that in some ways we're preaching to the converted. Re-reading my first post in this thread, it was a poor effort in trying to summarise this for any up-and-coming DBAs. This area is the one, if I had to choose just one, that I'd love to be able to distill into a "smart" pill and dispense to any new DB recruit, because it's both the most important lesson to learn and, perhaps, the most difficult. It's not rules-based, it's not absolute and it's not standard across companies. Much of it is hidden, intuitive and not subject to easy explanation.

    Good to see, though, that everyone here sounds like they're thinking rather than just following standard recommendations, and that, I believe, is the key.

    Semper in excretia, suus solum profundum variat

  • You'd like to think we're preaching to the converted, but what I've found over the years is that many people who read stuff here don't respond, especially if they're new.

    As we gain more experience, we sometimes forget that so many others are starting out on their journey each day and need good examples and knowledge about what the rest of us have been through. We almost need to keep preaching the same ideas over and over so they penetrate a wide audience.

    And we need to say it many ways. It's why I love hearing your collective opinions. Often they state my original idea better than I do!

  • I will start a thread on 'Real world best practice'. I will start with all the gotchas I can think of; hopefully everyone else will chip in so we can come up with a single doc/thread with everything a novice would need to know about backup/restore and disaster recovery. Give me a bit of time to start it off.
