• I worked at a large institution a few years back that was very good about testing its DR systems.  Every 6 months all of the application owners would have a disaster drill in which all of their data would be recovered into a backup site.  Unfortunately, those who planned the tests were a little more forward-thinking than those who had originally planned the backup sites.

    After one drill, it was noted in the lessons learned that the backup site was only 1 mile from the primary data center.  A rogue backhoe cutting fiber would take them both out. 

    Enough application owners raised that concern that the backup center was then moved to an already-owned site in a major coastal city in Florida.  Again, everything went well in the drill until it was pointed out that the disaster recovery site was only 3 feet above the water table and the primary site, while not on the coast, was in a hurricane-prone state.

    The response this time was to build an entirely new recovery data center with all of the geographic concerns addressed.  The new data center was built away from coasts, well above water tables and even hardened for tornadoes.  All new equipment was ordered.  State-of-the-art generators were installed.  Satellite facilities were brought online.  A database tape rotation was put into place so that one set went to the DR site and another to Iron Mountain.  We could come up in a day (that was considered acceptable at the time) if needed.

    The 6-month testing window came just as the new facility was brought online.  All of the application owners were lined up for a full week of recovery testing, and everything looked fine until the very first tape was inserted.  With all of the new equipment, no one had noticed that all of the newly installed tape machines were incompatible with the tapes that the primary site was using.  The new facility and all of the shiny new boxes suddenly became useless for DR due to one missed spec.

    In the past, I have told this story as a humorous example of poor DR planning.  But I also see in it some extraordinary planning.  Having owners buy into and dedicate time to test every 6 months is something I have not seen in other places.  Usually the owners have been OK with "If the DBA says we are good then we are good."  During those tests, so many interdependencies were discovered that no single person had known about.  In addition, actually doing it live was the only way to really discover whether the recovery order you THINK you need is the order that actually works.  And finally, I'm not sure the geographic concerns would have been addressed had the application owners not been invested in the DR planning and had the company heads not been serious enough to relocate and build facilities when those concerns were raised.