Great topic. It's something I give a lot of thought to, and I'm always looking for new ideas. I think my responsibilities largely come down to the following:
1. Prevent bad stuff from happening (Protect)
2. If it happens, become aware of it quickly (Detect)
3. Resolve the issues accurately and efficiently (React)
My company has a pretty diverse environment (SQL Server versions from 6.5 through 2005, multiple domains, both physical and virtual servers). That, plus a very limited budget for DBA tools, has forced me to use a variety of methods. Three main ones come to mind, in decreasing order of automation:
1. SQL Agent jobs that email me in real time about really bad conditions (e.g., SQL errors with a high severity, a transaction log about to run out of space). I also have a job that runs early each morning and does nothing but send me an email, which verifies that both SQL Agent and our email infrastructure are working properly.
2. Custom SQL scripts, run with Red Gate's SQL Multi-Script tool, to catch conditions that, while important, can usually wait a few hours (e.g., disk space getting low, missing backups, SQL Server error log entries that the SQL Agent jobs don't report on). I run these scripts a few times a day, and it doesn't take much time.
3. Server-side traces to catch things like unauthorized use of the sa account and poorly performing queries.
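To make the "missing backups" check from item 2 concrete, here is a minimal Python sketch of just the decision logic. It assumes you have already collected the finish time of each database's most recent full backup (on versions that have it, msdb's backupset table records backup history). The function name and the 24-hour threshold are hypothetical choices for illustration, not part of any tool mentioned above.

```python
from datetime import datetime, timedelta

# Hypothetical threshold: flag any database with no full backup in the last 24 hours.
MAX_BACKUP_AGE = timedelta(hours=24)


def find_missing_backups(last_full_backup, now, max_age=MAX_BACKUP_AGE):
    """Return the sorted names of databases whose last full backup is missing or too old.

    last_full_backup maps database name -> datetime of the most recent full
    backup, or None if no backup was ever recorded (an assumed input shape).
    """
    stale = []
    for db_name, finished in last_full_backup.items():
        # Never backed up, or the newest backup is older than the allowed age.
        if finished is None or now - finished > max_age:
            stale.append(db_name)
    return sorted(stale)


if __name__ == "__main__":
    now = datetime(2008, 6, 2, 7, 0)
    history = {
        "Sales": datetime(2008, 6, 1, 23, 30),  # 7.5 hours old: OK
        "HR": datetime(2008, 5, 30, 23, 30),    # over two days old: stale
        "Scratch": None,                        # never backed up
    }
    print(find_missing_backups(history, now))  # ['HR', 'Scratch']
```

Keeping the threshold as a parameter makes it easy to use a longer window for databases that are only backed up weekly.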
It would be great to be notified about everything automatically as it happens. That said, I still think there are many cases where at least somewhat manual checks are warranted. The manual checks act as an audit of the automated ones.
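One simple audit of the automation is confirming that the daily heartbeat email from item 1 actually arrived: if it stops, either SQL Agent or mail is broken, and the real-time alerts are silent too. The sketch below is a minimal Python illustration, assuming you record when each heartbeat email arrives; the function name, the one-day interval, and the two-hour grace period are hypothetical.

```python
from datetime import datetime, timedelta


def heartbeat_is_late(last_heartbeat, now,
                      expected_every=timedelta(days=1),
                      grace=timedelta(hours=2)):
    """Return True when the daily heartbeat email is overdue.

    last_heartbeat is when the most recent heartbeat email arrived,
    or None if none has ever been seen (an assumed input shape).
    """
    if last_heartbeat is None:
        return True
    # Late only once the expected interval plus the grace period has passed.
    return now - last_heartbeat > expected_every + grace


if __name__ == "__main__":
    last = datetime(2008, 6, 1, 6, 0)
    print(heartbeat_is_late(last, datetime(2008, 6, 2, 7, 0)))  # False (25 hours)
    print(heartbeat_is_late(last, datetime(2008, 6, 2, 9, 0)))  # True (27 hours)
```

The grace period avoids false alarms when the job or mail delivery runs a little late.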