• If you only check the monitor data when something goes wrong, you're reacting to an issue, not being proactive in managing your servers.

    No, you won't catch every instance, but if you keep an eye on the monitors then you've a chance of catching something that's building towards a serious, if not critical, problem before it gets that far. Alerts are great for exceptions: when something unexpected happens and your server fails or is close to failing.

    As for internal tools vs third-party ones: I'd always go hybrid. Belt + Bracers + Gaffa Tape: you know your systems and what normal behaviour looks like, while third-party tools can't always be configured to account for the peculiarities of your servers or to cover everything you'd like to monitor. So use third-party tools to reduce your workload, but be prepared to add your own scripts, monitors and in-house tools to cover the local peculiarities of your systems.
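
    To make the "add your own scripts" part concrete, here's a minimal sketch of the sort of in-house check a generic tool won't know to do: counting files that have sat too long in an application's spool directory. The spool scenario, function names and thresholds are hypothetical. The exit codes follow the common Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), which many third-party monitoring tools can run as a custom check; confirm your own tool supports it before relying on that.

```python
import os
import time

# Common Nagios plugin exit-code convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def stale_file_count(directory, max_age_seconds, now=None):
    """Count regular files in `directory` older than `max_age_seconds`."""
    now = time.time() if now is None else now
    count = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file() and now - entry.stat().st_mtime > max_age_seconds:
                count += 1
    return count


def check(directory, max_age_seconds=600, warn_at=10, crit_at=50):
    """Return (exit_code, message) for a hypothetical spool-backlog check."""
    try:
        stale = stale_file_count(directory, max_age_seconds)
    except OSError as exc:
        # Missing directory or permission problem: report UNKNOWN, not OK.
        return UNKNOWN, f"UNKNOWN - cannot read {directory}: {exc}"
    if stale >= crit_at:
        return CRITICAL, f"CRITICAL - {stale} stale files in {directory}"
    if stale >= warn_at:
        return WARNING, f"WARNING - {stale} stale files in {directory}"
    return OK, f"OK - {stale} stale files in {directory}"
```

    A thin wrapper script would call `check()` on your real spool path, print the message and pass the code to `sys.exit()` so the monitoring tool can pick it up. The thresholds are the bit only you can set sensibly, because only you know what "normal" looks like for that directory.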

    That's from my experience at least.