
Monitoring and Alerting


I responded in an interesting thread this morning where someone was looking for advice on what to monitor on their SQL Servers using some third-party tool. As expected, I can't find the thread right now and I'm rather busy, so that summary will have to do.

This was interesting because I had actually seen a good chunk of Kevin Kline's presentation on benchmarking at the recent PASS 2007 Summit. I like Kevin and he's got great information, but I was surprised at how many counters and areas he was looking at benchmarking. If you have a few critical applications, I guess it makes sense to gather that stuff, and I'm sure it's invaluable when troubleshooting, but I've rarely had a tool that could handle that many counters, especially when I deal with so many servers. If I were a 1-2 server guy, which I guess I am now, I'd look at a lot of counters, and when a user complained, I could check all kinds of stuff.

But in the larger environments where I've worked, with 10-100 servers, it's too much to track. And it's too much to look at. Personally, I like the rough thumbnail approach. I tend to look at the rough performance of a server: CPU, memory reads, disk I/O, users, and transactions. Those items are a good gauge of how your server is doing overall.

However, a real-time number is meaningless on its own. If I tell you your server is running at 80% CPU or at 500 memory reads/sec, is that bad?

You can't tell me, not without knowing what's normal for that server. If you average 85% CPU and 400 reads/sec, then that's pretty good. If you average 30% CPU, that's bad. So benchmarking is really important to understanding where you might be slow when someone reports an issue.
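To make the baseline idea concrete, here is a minimal Python sketch of how the same 80% CPU reading could be judged against two different histories. The function name, the sample numbers, and the "two standard deviations" tolerance are all assumptions for illustration, not anything taken from a particular monitoring tool.

```python
from statistics import mean, stdev


def compare_to_baseline(current, history, tolerance=2.0):
    """Compare one reading against a server's own history.

    `history` is a list of past readings for the same counter (hypothetical
    data here). `tolerance` is an assumed cutoff, in standard deviations,
    beyond which the reading is treated as unusual.
    """
    baseline = mean(history)
    spread = stdev(history)  # needs at least two samples
    if spread == 0:
        # A perfectly flat history: any difference at all is unusual.
        deviation = 0.0 if current == baseline else float("inf")
    else:
        deviation = abs(current - baseline) / spread
    return {
        "baseline": baseline,
        "deviation": deviation,
        "unusual": deviation > tolerance,
    }


# Hypothetical CPU % samples for two servers.
busy_server = [80, 90, 85, 78, 92, 85]   # averages 85%
quiet_server = [28, 32, 30, 27, 33, 30]  # averages 30%

print(compare_to_baseline(80, busy_server)["unusual"])
print(compare_to_baseline(80, quiet_server)["unusual"])
```

On the busy server the reading falls within the normal spread, while on the quiet server the same number stands far outside it, which is the whole point of having a benchmark to compare against.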

But do you need them to report an issue? Can't you just set an alert for 85% CPU and check the server when it comes? NO!

At least I won't. The server responds to so many variables that its performance ends up jumping all over. Ever watched your counters when a log backup or db backup runs? They might spike all over, and you don't want to get alerts every time that happens. You can ignore them, but then you might tend to ignore something that's really wrong as well. So I've avoided setting alerts except when we have real problems and I'm monitoring on a regular basis. They don't last long, usually just days.
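The backup-spike problem is easy to see in a small sketch. This is not a recommendation to set standing alerts, just an illustration, in Python, of why a single-sample threshold fires on noise and how requiring a sustained run of high readings would behave differently. The function name, the 85% threshold, the run length, and the sample data are all made up for the example.

```python
def sustained_breach(samples, threshold, min_consecutive):
    """Return True if at least `min_consecutive` readings in a row exceed `threshold`."""
    run = 0
    for value in samples:
        # Extend the current run on a breach, reset it otherwise.
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False


# Hypothetical CPU % samples taken at a fixed interval.
backup_spike = [40, 45, 95, 97, 42, 38, 41]  # short burst, e.g. a log backup
real_problem = [40, 90, 92, 95, 91, 96, 93]  # stays high

# A naive "any sample over 85%" rule fires on both series.
print(any(v > 85 for v in backup_spike), any(v > 85 for v in real_problem))

# Requiring five high samples in a row only flags the second one.
print(sustained_breach(backup_spike, 85, 5), sustained_breach(real_problem, 85, 5))
```

Even with a rule like this, picking the threshold and run length still depends on knowing the server's baseline, so it doesn't remove the need for benchmarking.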

This is one area that Microsoft would do well to build into the product. Or make it a separate product that's REASONABLY priced. Note: most of the products that do this (Patrol, Unicenter, etc.) cost way too much.

Instead, give me a simple tracing setup that will capture my data and let me graph it. The Health and History tool was a good start, but it was half-baked. When the timer pops out, send it along.
