The Devil is in the Monitoring Details

Monitoring a database server is something that many of us know is important, but we often take the process for granted. Whether we've purchased a tool, like SQL Monitor, or built our own system, we often set up a watcher for our systems and rarely view the details unless something goes wrong. I'm not sure that's the wrong approach, as part of the reason for setting up monitoring is to capture data in the background and remove one more task from our daily workload.

Monitoring isn't necessarily simple, however, and while I still debate the best way to do this in many organizations, I realize that monitoring isn't necessarily something I want to build in house. There is enough work in just handling the data that other systems output, so I really want software in place that is built to monitor specific technologies. In reading about the Stack Overflow monitoring systems, I realize how complex this can become for something that is supposed to be a "set it up and let it run in the background" configuration.

The team at Stack Overflow built their own system for monitoring various platforms, including SQL Server, but I think part of the mission of Stack Overflow was to build a system from scratch, which isn't the job for most of us. Plenty of us have other tasks to deal with as a part of our job, and software development for monitoring, alerting, or some other administrative function isn't one of those tasks. I know I wouldn't want to stop and treat data gathering and management as a software development project. If I'm a DBA, I want to just get the data and use it to ensure systems are running well.

Monitoring can be a way for us to proactively look for developing issues and mitigate them before clients know there is a problem. It's important that a system is in place and collecting data. It's even more important that there is some alerting in place as well, so that when something does start to go wrong, the DBAs are notified early enough to prevent widespread problems.

If you read about all the thought and detail that went into the Stack Overflow system, you quickly realize that there is a lot to consider when setting up monitoring for your systems. I'd encourage you to think about what is important and ensure that you've got some way to gather and analyze that data. When something goes wrong, and something will go wrong, you'll appreciate the time spent on the details of the monitoring system.
