RE: Downtime – SQLServerCentral

February 24, 2009 at 10:24 am

I know it makes the DBA look bad, but yeah, we have a lot of unplanned downtime on production systems. Bear in mind that we have over 1600 databases on ~140 different servers and only 2 DBAs, so statistically we do OK with uptime.

Number one cause: too many people with admin rights and not enough communication. It is a cultural thing here that I inherited and can't change. As a result, we do a lot of firefighting.

Number two cause: budget constraints. Customers want High Availability for everything but can't pay for it. A good example is SAN failures. I won't name the specific brand, but hey, they were cheap for a reason.

Other causes of unplanned downtime included:

- Network switch failures

- Misconfigured antivirus software that killed clusters

- Autopatch turned on accidentally

- AD management failures (the helpdesk had the power to reset service account passwords and used said power)

- Rarely, the occasional CPU, mainboard, memory, or other physical hardware failure