I saw a post recently asking how to build a daily report for each instance that tracks the metrics deemed important. My response was this:
Seems a little silly, but this was the fourth or fifth evolution of a monitoring system, and at times I even had Patrol and Unicenter running on top of it. However, this worked better.
1. Each server captures its own metrics. This is critical in the case of communication errors. I’ve had servers disconnected (not down, but off the network) for ten minutes while someone replaced a cable. And that ten-minute window is ALWAYS when I’m building a report or looking for metrics.
2. Customizable. I can write separate scripts for anything I need and update them over time. For example, one thing I used to do was capture sp_configure information. I didn’t want it on a report unless something changed, so I stored the data every day, loaded a new copy the next day, compared the two for changes, and then wrote anything that differed out to my report table.
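A minimal sketch of that compare-and-report step might look like the following. The table names (ConfigSnapshot, ConfigToday, ReportItems) are my own invention for illustration, not from any real system, and I’m reading sys.configurations, which exposes the same settings sp_configure shows:

```sql
-- Hypothetical tables:
--   ConfigSnapshot(name, value_in_use) -- yesterday's copy of the settings
--   ConfigToday(name, value_in_use)    -- today's copy
--   ReportItems(report_date, item_text) -- rows that feed the daily report

-- Load today's settings from the catalog view behind sp_configure
INSERT INTO ConfigToday (name, value_in_use)
SELECT name, value_in_use
FROM sys.configurations;

-- Write out only the settings that changed since yesterday
INSERT INTO ReportItems (report_date, item_text)
SELECT CAST(GETDATE() AS date),
       'sp_configure change: ' + t.name
       + ' was ' + CAST(s.value_in_use AS varchar(30))
       + ', now ' + CAST(t.value_in_use AS varchar(30))
FROM ConfigToday t
JOIN ConfigSnapshot s ON s.name = t.name
WHERE s.value_in_use <> t.value_in_use;

-- Keep today's copy as tomorrow's baseline
TRUNCATE TABLE ConfigSnapshot;
INSERT INTO ConfigSnapshot (name, value_in_use)
SELECT name, value_in_use FROM ConfigToday;
```

On a quiet day this inserts nothing, so the report stays clean; only a genuine configuration change produces a line.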
3. Separate schedules for metrics. I might want CPU every 5 minutes, but sp_configure every day. I can easily have separate schedules.
4. I get one report a day, and I stored the list of servers on the central instance. If I didn’t get a report from a server, I’d go looking for it. New instances were reported to me, and people soon realized that if they let me know about a new instance, they never had to worry about it again.
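The “who didn’t report in?” check is just an anti-join against the central server list. A sketch, again with hypothetical table names:

```sql
-- Hypothetical tables on the central instance:
--   Servers(server_name)                   -- master list of known instances
--   DailyReports(server_name, report_date) -- one row per server per day as reports arrive

-- Servers with no report row for today: go looking for these
SELECT s.server_name
FROM Servers s
LEFT JOIN DailyReports r
       ON r.server_name = s.server_name
      AND r.report_date = CAST(GETDATE() AS date)
WHERE r.server_name IS NULL;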
5. I have both the raw data and the reports stored off. This satisfied my ISO 9001 requirements, which was great for me.
I’m not knocking the purchased monitoring solutions; they work great when a central team is monitoring things. But they’ve always required some customization to work for me, and in the same amount of time I could easily build my own reporting system. A few scripts covered all sorts of things that often weren’t built into the large systems, or cost extra.