• I know it makes the DBA look bad, but yeah, we have a lot of unplanned downtime on production systems. Bear in mind that we have over 1600 databases on ~140 different servers and 2 DBAs, so statistically we do OK with uptime.

    Number one cause: too many people with admin rights and not enough communication. It is a cultural thing here that I inherited and can't change. As a result, we do a lot of firefighting.

    Number two cause: budget constraints. Customers want high availability for everything but can't pay for it. A good example here is SAN failures. I won't name the specific brand, but hey, they were cheap for a reason.

    Other unplanned downtimes included:

    - Network switch failures

    - Misconfigured antivirus software killed clusters

    - Autopatch turned on accidentally

    - AD management failures (helpdesk had power to reset service account passwords and used said power)

    - Rarely, a CPU, mainboard, memory, or other physical hardware failure