SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


What Should You Look for in a SQL Monitoring Tool?


What Should You Look for in a SQL Monitoring Tool?

Author
Message
Kathi Kellenberger
Kathi Kellenberger
SSCrazy Eights
SSCrazy Eights (9.6K reputation)SSCrazy Eights (9.6K reputation)SSCrazy Eights (9.6K reputation)SSCrazy Eights (9.6K reputation)SSCrazy Eights (9.6K reputation)SSCrazy Eights (9.6K reputation)SSCrazy Eights (9.6K reputation)SSCrazy Eights (9.6K reputation)

Group: General Forum Members
Points: 9552 Visits: 359
Comments posted to this topic are about the item What Should You Look for in a SQL Monitoring Tool?

Aunt Kathi Data Platform MVP
Author of Expert T-SQL Window Functions
Simple-Talk Editor
Eirikur Eiriksson
Eirikur Eiriksson
SSC Guru
SSC Guru (159K reputation)SSC Guru (159K reputation)SSC Guru (159K reputation)SSC Guru (159K reputation)SSC Guru (159K reputation)SSC Guru (159K reputation)SSC Guru (159K reputation)SSC Guru (159K reputation)

Group: General Forum Members
Points: 159153 Visits: 23309
Even more important than the completeness of the monitoring is the impact the monitoring system has on the instances, seen huge difference where some tools induce several percentages in overhead whilst other don't have a noticeable impact.
Cool

Understanding how the tools work is important, things like how the wait stats are collected, how the actual statements are collected etc. can help choosing the right tool.
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (870K reputation)SSC Guru (870K reputation)SSC Guru (870K reputation)SSC Guru (870K reputation)SSC Guru (870K reputation)SSC Guru (870K reputation)SSC Guru (870K reputation)SSC Guru (870K reputation)

Group: General Forum Members
Points: 870370 Visits: 47424
I've found that a whole lot of people that use monitoring tools use monitoring tools because they don't know how to monitor things themselves. Along with that lack of knowledge, they also have heard that "taking a baseline" is a good thing (and it IS) but the big problem with that is <drum roll please>, they don't actually know what the baseline is telling them and so they also don't know if the baseline is good or not. In other words, their system actually has many performance issues and because they don't know what to look for, they don't know they actually have a baseline that sucks. Wink

One great example of this is what seemed to be a bit of a focus in this article... long running queries. I've found that those aren't usually the real problem. What is a problem is the "death by a thousand cuts" queries that everyone thinks is normal and becomes the baseline and, once that happens (or even before), people just accept the problems as "normal".

Now... don't get me wrong. I think a monitoring tool that's smart enough to auto-magically identify that there's going to be a CPU intensive job (for example) that runs every day from 2 to 2:30PM and what the expected profile of CPU usage for that job is so that you don't receive alerts about "expected" CPU usage would be great but I've found that most don't actually do that without having to use the ol' Mark I Mod I eyeball to verify such a thing. There's also the idea that most of them don't actually lead you to the real performance issues, which is buried in the "Death by a thousand cuts" code that's continuously running all day every day.

As for waits... IMHO, waits is a symptom and not a cause. That makes them pretty useless for finding the cause and they won't actually lead you to the real cause of all the little stuff that's the actual cause. It's like a tolerance build up in physical hardware assemblies... it might be ok for a part to be off by 1/10000th of an inch but, if you have 10,000 such parts all stacked in a series, you suddenly have a inch error. Code is very similar but much worse because instead of just 10,000 parts stacked up, you may have millions stacked up per hour. Because people tend to look at code and say things like, well that runs from the GUI and it only takes 100ms to run, that shouldn't be a problem without realizing that it runs (as in a real case for me)) 10's of millions of times per hour because someone made a serious mistake in the GUI.

If you want to write some good monitoring software, then it needs to find such "Death by a thousand cuts" and report on them and, again, these are NOT the longest running queries. Usually, they're the shortest.

Of course, including that in monitoring software might also be a total waste of time because you actually have to have someone that will do something about such "cuts" and they need to be in a shop where the culture will realize the values (there re many) of fixing such things. It would also be cool if the reporting were a bit analytical for the proverbial biggest bang for the buck. Instead of simply reporting on the top 100 worst queries for each server across hundreds of servers (who in the hell would read such a report every morning?), it would be really cool if it said "here are the top 10 worst queries in the enterprise for each four hours of the day over a week and the first two live on 75% of your servers" (for example).

Shifting gears a bit, I agree that monitoring is a form of measuring something and any scientist will tell you that if you measure something, you've changed it and you must be careful to change it as little as possible. With that, I agree with Eirikur... I've seen monitoring systems interfere not only with performance but interfere with things like backups and replication. Then there are those (sometimes the same "those") that feel it necessary to set trace flags to enable such monitoring at the beginning of a sample and then to turn it off after the sample is taken. All of that junk gets recorded in the default trace, special traces, and in the SQL Server logs and, sometimes, in the Windows logs. I don't need that junk in my logs when I'm trying to troubleshoot and sure as hell don't need it filling up and causing the default trace to cycle so often.

To summarize my take on a lot of monitoring software I've seen, I don't want to install software written by amateurs for amateurs.. Wink Give me something useful, trainable, and maybe even self trained with overrides allowed. I don't need yet another "tool" that was built by Murphy's half brother, Sum Ting Wong. BigGrin

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (870K reputation)SSC Guru (870K reputation)SSC Guru (870K reputation)SSC Guru (870K reputation)SSC Guru (870K reputation)SSC Guru (870K reputation)SSC Guru (870K reputation)SSC Guru (870K reputation)

Group: General Forum Members
Points: 870370 Visits: 47424
p.s. While performance may not be THE most important thing, I agree that you did touch on it's importance, I'd like to stress that it's very important and runs a very, very close second only to accuracy.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum








































































































































































SQLServerCentral


Search