A Patch Disaster
Posted Thursday, January 10, 2013 5:13 PM


SSC-Dedicated


It wasn't the patch; it was the delivery. However, that's exactly what I'd be concerned about over time: someone makes a mistake in delivery, and it ends up causing issues with systems.






Post #1405695
Posted Friday, January 11, 2013 8:23 AM
SSC Eights!


Perhaps most importantly:

The first, easy part:
If we assume that somehow a server is rendered unstable, excessively slow, incorrect, invalid, unusable or inoperable, what's the plan... and do you ever test it?

The second, hard part:
Same as the above... but on many or all servers.

At many companies, even very large ones, the exchange between a hardware/OS/low-level team and an application/database team goes much like this:
Hardware: "We've got backups."
App: "So it'll be just like it was before X happened?"
Hardware: "Of course not - we only back up the data/your SQL Server .bak files!"
App: "Oh. So what now?"
Hardware: "We've installed the operating system on [the old | some new] hardware."
App: "So now we have to reinstall our application? From scratch? We haven't done that in Y years! And those people aren't in our team anymore!"
Hardware: "If the OS or hardware has issues, call us."
App: "What were all the settings we had?"
...
App: "It's up!"
User: "Z feature is broken!"
App: "Oh... there was an exception we had to do Q for."
GOTO User


And this takes a while, but it's the normal response to one server failing. If a mass update causes multiple servers to fail at once, this becomes a real nightmare... especially if the backup servers are also affected.

Drive images capable of bare-metal restore are a very good solution to this, but almost no one actually takes them of servers, and they do take a lot of space, more if you have encrypted or otherwise incompressible data.

Whatever your "it caught on fire" plan is, if you don't test it, and don't plan for carrying it out on many machines at once, there's a lot of room for a nasty surprise.


Post #1406072