Monitor All the Changes

  • Comments posted to this topic are about the item Monitor All the Changes

  • As an example of monitoring that works but doesn't quite help, I get alerts from the SCCM install when one of my servers loses communication with SCCM. In terms of usefulness, though, these tend to rate low: either they're extremely brief blips (normal network traffic isn't affected, based on the lack of notice from users), or they're for servers which aren't in my purview...

    One thing that seems true of any sort of automated monitoring system is that it's only going to be as "intelligent" as the people who created it and the people who craft the alerts. A system which allows the user(s) to create custom alerts (such as from SQL queries, PowerShell, or the Linux equivalent) will, I would think, tend to have fewer "junk" alerts in the long run.

    But even then it will take time to set up the alerts to work the way the receivers find useful. That could take long enough that they get the alerts working their way, then move on to another position, leaving the next person to figure out how *they* want it...

    People will likely comment about "AI / smart" systems that alert only on truly critical events while relegating "notices" to a lower priority (possibly just logging them), but, again, such a system will only be as good as the people who wrote it.

    Truthfully, I don't think it will ever completely get to an "only give me the really critical stuff" level. There will always be some level of cruft included, if only because people are fallible, which means the software they create for this is fallible.

    But some level of cruft is acceptable; you just need to decide for yourself where that cut-off point is...

  • It reminds me of years ago when we would put warning messages in the code when someone was about to do something critical. "Are you sure?" type stuff. Soon it just became another button to press and no one cared, until they didn't really want to do it. Then it was "Well why didn't you warn me?" -- "But we did. You just ignored the warning."

  • It seems that, generally, we need to decide which occurrences are important enough to warn the user about immediately, which specific performance measurements are important enough to monitor, and which events are important enough to log.

    Sounds familiar.

    It is essential to use the most appropriate information flow mechanism for each type of occurrence.

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!

  • It's a little subjective and can also be dependent upon the individual, e.g. the big boss has to have their updated dashboard available at a specific time, every day.

    Coming at this from an ETL developer's viewpoint:

    It sounds obvious, but I look at all the important stuff (to me), such as maintenance plans, SSIS data builds (critical parts), etc., and ensure they are in place and working.

    Then I assess all the other bits, such as data quality (is accuracy critical, or can you let it through and log?).

    Put a price against each element and let the business (or owner) decide whether it should be done, intervening occasionally when you feel that their priorities are unachievable.

    Then the auditors get involved and ask for ludicrous checks to be put in place ...

    - Damian

  • Iwas Bornready (11/22/2016)


    It reminds me of years ago when we would put warning messages in the code when someone was about to do something critical. "Are you sure?" type stuff. Soon it just became another button to press and no one cared, until they didn't really want to do it. Then it was "Well why didn't you warn me?" -- "But we did. You just ignored the warning."

    Been there...

    Some years ago, I worked with an application that allowed you to set up function keys.

    This effectively reproduced a number of keystrokes at the press of a button.

    Quite often people used this to auto-key through these questions.

    It was audited (then ignored!), but it did highlight that app builders need to understand what is a useful warning and what is just there because someone thought it might be useful.

    - Damian

  • How you handle alerts really needs to reflect the volume of those that are generated.

    For low volumes the success mails have value (or rather the absence of them does). When they are not received, this is indicative of an incident: either the task generating them has failed, the task is overrunning, or the alerting system has failed. The first of those should generally be covered by a failure alert, but if either of the latter two conditions applies you won't get the failure alert either.

    For higher volumes this becomes impractical due to information overload, but you still want to identify overrunning tasks and failures of the alerting system (and any failed tasks hidden as a result). One option is to create a dashboard showing one of three states for every task (succeeded, not succeeded, failed), grouped by priority or any other appropriate criteria, and to limit the actual alerts to the critical tasks only. It's a lot easier to scan down a page for any red (failed) or amber (not succeeded) than to hunt through several dozen or hundred email alerts for the few that need to be addressed immediately.
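
The three-state dashboard described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Task` fields, the task names, and the rule that a missing success inside `max_interval` counts as amber are all hypothetical choices, not taken from any particular monitoring product. Green is assumed to be the third state alongside the red and amber mentioned in the post.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Task:
    # All fields are hypothetical; adapt them to whatever your scheduler records.
    name: str
    critical: bool                  # only critical tasks raise an actual alert
    max_interval: timedelta         # how often a success is expected
    last_success: Optional[datetime] = None
    last_failure: Optional[datetime] = None


def status(task: Task, now: datetime) -> str:
    """Return 'red' (failed), 'amber' (not succeeded in time) or 'green'."""
    # A failure more recent than the last success means the task has failed.
    if task.last_failure is not None and (
        task.last_success is None or task.last_failure > task.last_success
    ):
        return "red"
    # No success inside the expected window: the task is overrunning, or the
    # alerting path itself is broken and nothing arrived at all.
    if task.last_success is None or now - task.last_success > task.max_interval:
        return "amber"
    return "green"


def dashboard(tasks, now):
    """Rows of (name, status, critical), sorted so red and amber come first."""
    order = {"red": 0, "amber": 1, "green": 2}
    rows = [(t.name, status(t, now), t.critical) for t in tasks]
    return sorted(rows, key=lambda r: (order[r[1]], not r[2], r[0]))


def alerts(tasks, now):
    """Names of critical tasks that are not green; the rest stay on the dashboard."""
    return [t.name for t in tasks if t.critical and status(t, now) != "green"]


# Made-up sample data for illustration only.
now = datetime(2016, 11, 22, 9, 0)
day = timedelta(days=1)
tasks = [
    Task("nightly_backup", True, day, last_success=now - timedelta(hours=30)),
    Task("ssis_load", True, day, last_success=now - timedelta(hours=2)),
    Task("index_rebuild", False, day,
         last_success=now - timedelta(days=2),
         last_failure=now - timedelta(hours=1)),
]

for row in dashboard(tasks, now):
    print(row)
print(alerts(tasks, now))  # ['nightly_backup']
```

Here the overdue backup shows as amber and is the only alert sent, while the failed but non-critical index rebuild shows as red on the dashboard without generating mail. Treating "no success in time" as its own state is what covers the cases where a failure alert never arrives.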
