Monitoring is Essential

  • Comments posted to this topic are about the item Monitoring is Essential

  • Monitoring exceptions, key to being exceptional. Flawless logic 😉

  • Idera SQL Diagnostic Manager is one of the best SQL Server monitoring tools available IMHO. 😀

    "Technology is a weird thing. It brings you great gifts with one hand, and it stabs you in the back with the other. ...:-D"

  • The hard part is getting there: having the monitoring set up the way you would like, and moving from add-ons to something smoothly integrated. I'm still playing with various nuts and bolts.

  • It is hard. I can't quite decide how much to build and how much to integrate. I do like the "custom metrics" in SQL Monitor, which solve some of the issues I've had in the past.

    I do still wish it were a distributed system.

    Disclosure: I work for Red Gate Software.

  • I tried RedGate's SQL Monitor a couple of times (mainly because of the price) but in the end I had to uninstall it. It was hogging all the memory and constantly hanging. Idera has a much tighter product IMHO, but it does cost more. 😀

  • Considering what has happened with the processor security problems as of late, I strongly suggest that, no matter which tool you use, if you haven't baselined your systems, now would be a good time to do so before you install any patches or other fixes.

    Shifting gears back to the idea of monitoring, I find that a whole lot of people do a whole lot of monitoring and that's a good thing.  It's important to know what changed, and when, whether for the worse or for the better.  But I also find that a whole lot of people aren't getting any of the other benefits from monitoring.

    The first problem is the "Cry Wolf" syndrome.  Some people tend to set up a bazillion alerts with some pretty tight limits and they get just that... a bazillion alerts.  It ends up desensitizing people to the point where, when an alert fires because of something really important, it gets missed because people think it's one of those "normal" alerts that everyone has made a rule to auto-delete on arrival.
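
    As a rough sketch of one way to cut down on "Cry Wolf" alerts (an illustration only, not how any particular monitoring tool works): derive the limit from a recorded baseline, and alert only when the metric stays above it for several samples in a row.  The function name, the 3-sigma limit, and the three-sample rule below are all assumptions made up for this example.

```python
from statistics import mean, stdev


def sustained_breaches(samples, baseline, sigmas=3.0, min_consecutive=3):
    """Return the sample indexes at which an alert would fire.

    The limit is baseline mean + sigmas * baseline standard deviation
    (an assumed rule of thumb). An alert fires once, when the metric has
    exceeded the limit for `min_consecutive` samples in a row, and does
    not fire again until the metric has dropped back under the limit.
    """
    limit = mean(baseline) + sigmas * stdev(baseline)
    alerts = []
    run = 0  # length of the current streak of over-limit samples
    for index, value in enumerate(samples):
        if value > limit:
            run += 1
            if run == min_consecutive:
                alerts.append(index)
        else:
            run = 0
    return alerts


# Hypothetical CPU percentages: a quiet baseline, then new samples with
# one isolated spike followed by a sustained climb.
baseline_cpu = [5, 6, 7, 6, 5, 7, 6, 6]
new_cpu = [6, 40, 7, 6, 35, 38, 41, 39, 6]
alerts = sustained_breaches(new_cpu, baseline_cpu)
```

    With these made-up numbers the isolated spike is ignored and the sustained climb raises a single alert, which is the point: fewer alerts, each one worth reading.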

    The second problem is what I call the "Humpty Dumpty Already Fell" syndrome.  This is where people finally get it through their heads that they really do need to take a baseline to compare their future monitoring to.  The problem is, they never take the time to figure out what the baseline is trying to tell them.  It's like when I first reported to my current job.  They knew that average CPU (16 physical cores hyperthreaded to 32) was "only" at 40%.  They just didn't take the time to figure out that they had a ton of really crappy code running and that they should have been averaging about 3%.  They only had about 75-100 connections at a time (and 50 or so were from the system itself) and the screens were "only" taking 5 or 10 seconds to paint.  Must be what a good "normal" baseline should be, right? 🤢  "Humpty Dumpty Already Fell" and was in desperate need of repair but no one knew because they didn't take the time to find out what their original baseline was trying to tell them.

    Six years later, the "big" databases have each grown from about 65GB to more than a TB, the screen response on the floor is sub-second, and there are usually about 430 connections doing a fair bit more work because of the low screen times.  CPU is averaging between 6 and 8%.  We have many new databases and have added a whole lot of modules to the front end software.  We've at least quadrupled the number of clients we serve.  Things are so good now that we can execute large batch/ETL runs on the same box without anyone on the floor even noticing (they used to cause outages on the floor when I first reported), which also saves a lot of money because we seriously cut down on the number of Windows and SQL Server licenses we needed by consolidating servers.  We DID upgrade hardware back in 2016 to include SSDs but that didn't have much effect on the system, either good or bad, because the performance lives in the code.

    The bottom line is, monitor all you want... if you're not going to do anything about what the monitoring is trying to tell you, you might as well turn it off so that it doesn't impinge on your already sick system. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
