• Nice one, Steve.

    The attitude to DR is scary sometimes - one manager asked "why do we need all that - it's never failed before?" (grab brick wall, start banging head...).

    Personally I prefer to NEVER restore OVER something - restore next to, check, then replace.

    However, this is where I came across a beaut: "You need 2TB of disk space to restore the DB? But we never catered for that!"

    So make sure you have a "play area" - it doesn't have to be redundant disk, just reasonably fast striped disk, e.g. SATA... you can even power it down when not in use (do our bit for global warming).

    And a last beaut - everything is set up to e-mail on failure (e-mailing on success would inundate you), but what if the system hangs totally, or the e-mail itself fails? I just noticed an old client whose backup hung about two months back - it didn't fail, so it's been sitting there for two months and no one has noticed!

    The only thing I can suggest is the "heartbeat" approach - have an asynchronous job that independently tests for, e.g., fresh backup files each day.
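    A minimal sketch of that heartbeat in Python: a separately scheduled job that looks at the backup folder each morning and shouts if no fresh .bak file has turned up. The path and the 24-hour window are assumptions - adjust to your environment and hook the "ERROR" branch into whatever alerting you trust.

    ```python
    import os
    import time

    BACKUP_DIR = r"D:\Backups"          # hypothetical backup location - change to suit
    MAX_AGE_SECONDS = 24 * 60 * 60      # assume at least one backup per day

    def newest_backup_age(directory):
        """Return the age in seconds of the most recent .bak file, or None if there isn't one."""
        newest = None
        for name in os.listdir(directory):
            if name.lower().endswith(".bak"):
                mtime = os.path.getmtime(os.path.join(directory, name))
                if newest is None or mtime > newest:
                    newest = mtime
        if newest is None:
            return None
        return time.time() - newest

    if os.path.isdir(BACKUP_DIR):
        age = newest_backup_age(BACKUP_DIR)
        if age is None or age > MAX_AGE_SECONDS:
            print("ERROR: no fresh backup found")              # wire your alert in here
        else:
            print("OK: latest backup is %.1f hours old" % (age / 3600))
    ```

    The point is that this job runs outside the backup system itself, so it still fires when the backup hangs rather than fails.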

    Summarise ALL your notifications if possible into a single SUCCESS e-mail...at least then you're not inundated.

    And make your ERROR notifications stand out (from success or other) - then use e.g. Outlook rules to move all ERROR e-mails to a particular sub-folder.
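    Both ideas above can be sketched together: collect every job's result, then build one message whose subject leads with ERROR only when something failed, so an Outlook rule matching on the subject prefix can file it. The job names and the omitted send step are assumptions for illustration.

    ```python
    from email.message import EmailMessage

    def build_summary(results):
        """results: dict of job name -> True (success) / False (failure)."""
        failed = [job for job, ok in results.items() if not ok]
        msg = EmailMessage()
        # ERROR prefix stands out and gives Outlook rules something to match on
        msg["Subject"] = ("ERROR: " if failed else "SUCCESS: ") + "nightly jobs summary"
        lines = ["%-20s %s" % (job, "OK" if ok else "FAILED")
                 for job, ok in sorted(results.items())]
        msg.set_content("\n".join(lines))
        return msg

    msg = build_summary({"FullBackup": True, "LogBackup": False})
    print(msg["Subject"])
    # actually sending is left out; smtplib.SMTP(...).send_message(msg) would do it
    ```

    One e-mail a day with everything in it beats a dozen you've trained yourself to delete.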

    But nothing beats running through a test restore END TO END - I mean actually overwriting the live DB and all (and seeing whether your application still works after "oops, DB key changed, data inaccessible - d'oh!"). Sure it's hard - that's why they pay you so much (and if not, well, you get what you pay for, right?).


    Regards
    Andy Davies