Considering the processor security problems that have surfaced lately, I strongly suggest that, no matter which tool you use, if you haven't baselined your systems, now would be a good time to do so, before you install any patches or other fixes.
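For what it's worth, a baseline doesn't have to come from a fancy tool. Even a timestamped snapshot of a few key metrics, saved somewhere you can diff it later, gives you something to compare against after a patch. Here's a minimal sketch in Python; the metric names, values, and file name are made-up placeholders, and in real life you'd pull the numbers from whatever your monitoring captures:

```python
import datetime
import json

def save_baseline(metrics, path):
    """Write a timestamped snapshot of metrics (a plain dict) as JSON."""
    snapshot = {
        "taken_at": datetime.datetime.now().isoformat(),
        "metrics": metrics,
    }
    with open(path, "w") as f:
        json.dump(snapshot, f, indent=2)
    return snapshot

# Hypothetical numbers -- substitute whatever your monitoring actually captures.
baseline = save_baseline(
    {"avg_cpu_pct": 40, "connections": 100, "screen_paint_sec": 7.5},
    "baseline_before_patch.json",
)
```

The point isn't the code; it's that the snapshot exists *before* the change, so "did the patch make things worse?" becomes a comparison instead of a guess.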
Shifting gears back to the idea of monitoring, I find that a whole lot of people do a whole lot of monitoring, and that's a good thing. It's important to know when something changed, and what, whether for the worse or for the better. But I also find that a whole lot of people aren't getting any of the other benefits from monitoring.
The first problem is the "Cry Wolf" syndrome. Some people tend to set up a bazillion alerts with some pretty tight limits, and they get just that... a bazillion alerts. It ends up desensitizing people to the point where an alert about something really important gets missed, because people assume it's one of those "normal" alerts that everyone has made a rule to auto-delete on arrival.
The second problem is what I call the "Humpty Dumpty Already Fell" syndrome. This is where people finally get it through their heads that they really do need to take a baseline to compare their future monitoring against. The problem is, they never take the time to figure out what the baseline is trying to tell them. It's like when I first reported to my current job. They knew that average CPU (16 physical cores hyper-threaded to 32) was "only" at 40%. They just didn't take the time to figure out that they had a ton of really crappy code running and that they should have been averaging about 3%. They only had about 75-100 connections at a time (and 50 or so were from the system itself), and the screens were "only" taking 5 or 10 seconds to paint. That must be what a good "normal" baseline looks like, right? "Humpty Dumpty Already Fell" and was in desperate need of repair, but no one knew, because they didn't take the time to find out what their original baseline was trying to tell them.
Six years later, the "big" databases have each grown from about 65GB to more than a TB, the screen response on the floor is sub-second, and there are usually about 430 connections doing a fair bit more work because of the low screen times. CPU is averaging between 6 and 8%. We have many new databases and have added a whole lot of modules to the front-end software. We've at least quadrupled the number of clients we serve. Things are so good now that we can execute large batch/ETL runs on the same box without anyone on the floor even noticing (those runs used to cause outages on the floor when I first reported), which also saves a lot of money because we seriously cut down on the number of Windows and SQL Server licenses we needed by consolidating servers. We DID upgrade hardware back in 2016 to include SSDs, but that didn't have much effect on the system, either good or bad, because the performance lives in the code.
The bottom line is, monitor all you want... if you're not going to do anything about what the monitoring is trying to tell you, you might as well turn it off so that it doesn't impinge on your already sick system.
RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code: Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
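To make the row-vs-column idea concrete, here's a minimal sketch using Python's built-in sqlite3 module as a stand-in for a real RDBMS. The table, columns, and the 10% raise are hypothetical examples, not anything from this post; the contrast between the two approaches is the point:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmpID INTEGER PRIMARY KEY, Salary REAL)")
conn.executemany("INSERT INTO Employee VALUES (?, ?)",
                 [(1, 50000.0), (2, 60000.0), (3, 70000.0)])

# RBAR (Row-By-Agonizing-Row): fetch each row, then issue one UPDATE per row.
rows = conn.execute("SELECT EmpID, Salary FROM Employee").fetchall()
for emp_id, salary in rows:
    conn.execute("UPDATE Employee SET Salary = ? WHERE EmpID = ?",
                 (salary * 1.10, emp_id))

# Set-based: one statement describing what happens to the whole column.
# Same raise applied again, but the engine touches the rows, not your loop.
conn.execute("UPDATE Employee SET Salary = Salary * 1.10")
```

On three rows it makes no difference; on three million, the loop pays the round-trip, parse, and per-statement overhead three million times, while the set-based statement pays it once.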
If you think it's expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair
How to post code problems
How to post performance problems
Forum FAQs