Redundant Redundancy

  • Comments posted to this topic are about the item Redundant Redundancy

  • I have done stuff like this for years.

    I've built up a long set of SQL Agent jobs that monitor backups, disk space, and similar things. The script even sets up the SQL Server mail functions and only has to be modified for things like the email accounts and drive destinations. Otherwise it is fairly generic.

    When I set up a new SQL Server, I just pull out the scripts and run them.

    I can also hand it over to junior techs to run once it is configured for the environment. I consider it part of my toolkit.

    The other part of it is to know when you have too many that it gets ignored. I have it set up so that I get an email every day for full backups, regardless of success or failure. Using Outlook rules, I filter them into a sub-folder, unread. I then just search that folder for "failure" and, if anything comes up, investigate why. That is a daily job.

    But I set up tran log backups to notify only on failure. So if I see a message come into the folder at 2 PM, I know I need to investigate it. If I were getting one for every log backup, I would probably ignore them, because they would just be "noise" and not something to be concerned with.

    You need that balance.
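    The policy above (email every full backup, email log backups only on failure, then search the filtered folder for "failure") can be modeled in a few lines. In SQL Server Agent itself this is the choice between "When the job completes" and "When the job fails" on a job's Notifications page; the Python below is only a hedged sketch of that decision, and the `JobResult` type and its "full"/"log" labels are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class JobResult:
    job_type: str     # hypothetical labels: "full" or "log"
    succeeded: bool
    subject: str      # subject line the alert email would carry


def should_notify(result: JobResult) -> bool:
    """Full backups always send mail; log backups send mail only on failure."""
    if result.job_type == "full":
        return True
    return not result.succeeded


def needs_investigation(subjects: list[str]) -> list[str]:
    """Mimic the daily search of the filtered mail folder for 'failure'."""
    return [s for s in subjects if "failure" in s.lower()]


if __name__ == "__main__":
    results = [
        JobResult("full", True, "Full backup of Sales: success"),
        JobResult("log", True, "Log backup of Sales: success"),
        JobResult("log", False, "Log backup of Sales: failure"),
    ]
    folder = [r.subject for r in results if should_notify(r)]
    print(folder)  # ['Full backup of Sales: success', 'Log backup of Sales: failure']
    print(needs_investigation(folder))  # ['Log backup of Sales: failure']
```

    The asymmetry is the point: the frequent, routine job stays quiet unless something breaks, so any log backup message that lands in the folder means action is needed.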



    ----------------
    Jim P.

    A little bit of this and a little byte of that can cause bloatware.

  • Oooooo... very nice article, Rodney. It really hit a sweet spot with me because I'm taking care of a disk space problem tonight during the "maintenance period" and I'm doing so without it being a panic.

    I wrote a stored proc (that calls PowerShell for its WMI capabilities) that not only alerts me to such problems via email but also captures daily disk usage (size, free space, "IsDirty", etc.) so that I can forecast when we'll need to buy more disk space under normal growth. It even checks CD-ROM drives, because it's a real bugger when one of the folks in NetOps forgets which server they left a CD or DVD in. They can just check the morning report for non-empty volume names (highlighted in yellow in the report) where the drive type is CD/DVD.
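    The CD/DVD check in the morning report amounts to a simple filter over the captured rows. As a hedged sketch, assume the PowerShell/WMI step has already produced one row per volume; the `DiskSnapshot` fields are hypothetical, while the drive-type code 5 is the value WMI's `Win32_LogicalDisk.DriveType` uses for a compact disc drive.

```python
from dataclasses import dataclass
from typing import List, Optional

# Win32_LogicalDisk.DriveType value for "Compact Disc" (CD/DVD drives).
CD_DVD_DRIVE_TYPE = 5


@dataclass
class DiskSnapshot:
    """One row of the daily capture; field names are hypothetical."""
    server: str
    drive: str
    drive_type: int
    volume_name: Optional[str]   # empty or None when no media is present
    size_bytes: Optional[int]
    free_bytes: Optional[int]


def forgotten_media(snapshots: List[DiskSnapshot]) -> List[DiskSnapshot]:
    """Return CD/DVD drives that report a volume name, i.e. a disc was left in."""
    return [
        s for s in snapshots
        if s.drive_type == CD_DVD_DRIVE_TYPE and s.volume_name
    ]


if __name__ == "__main__":
    today = [
        DiskSnapshot("SQL01", "C:", 3, "System", 100_000, 20_000),
        DiskSnapshot("SQL01", "D:", 5, "", None, None),
        DiskSnapshot("SQL02", "E:", 5, "SQL2012_DVD", 4_000, 0),
    ]
    for s in forgotten_media(today):
        print(f"{s.server} drive {s.drive} has media '{s.volume_name}'")
```

    Storing every day's rows, rather than only the rows that trigger an alert, is what makes the growth forecast possible later.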

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • No matter how well you automate things, there comes a time when SQL Agent gets stuck, hangs, and no jobs start. This happened to me fairly recently, and so far I haven't found a really clever solution to it.

  • Hey there, I have developed my own SCOM data warehouse, which I then put into Analysis Services. This has let me extend the wealth of information available from SCOM.

    I have also created a "disks running out of space" report. It uses MDX's linear regression functions to estimate, based on the past 7 days of disk usage, how long it will take before each disk runs out of space.
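    The forecast relies on MDX's linear regression functions (such as `LinRegSlope` and `LinRegIntercept`). The same least-squares arithmetic in plain Python, offered as a hedged sketch rather than the cube's actual calculation, fits used space against day number and extrapolates to capacity; the function name and GB units are assumptions.

```python
from typing import Optional, Sequence


def days_until_full(used_gb: Sequence[float], capacity_gb: float) -> Optional[float]:
    """Fit used = a + b * day by least squares and extrapolate to capacity.

    used_gb: daily used-space samples, oldest first (e.g. the past 7 days).
    Returns days remaining after the latest sample, or None when usage is
    flat or shrinking (no projected exhaustion).
    """
    n = len(used_gb)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = range(n)                      # day numbers 0 .. n-1
    mean_x = (n - 1) / 2
    mean_y = sum(used_gb) / n
    # slope b = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, used_gb))
    slope = sxy / sxx                  # GB per day
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    # Day at which the fitted line reaches capacity, measured from the latest sample.
    day_full = (capacity_gb - intercept) / slope
    return day_full - (n - 1)


if __name__ == "__main__":
    print(days_until_full([100, 102, 104, 106, 108, 110, 112], 200))  # 44.0
```

    With usage growing by exactly 2 GB/day from 100 GB on a 200 GB disk, the fitted line reaches capacity 44 days after the latest sample. A flat or shrinking trend returns `None` instead of a negative or infinite estimate.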

    I then put this into a simple SSRS report with some KPIs to make it easy to read. There is also a drill-down chart showing the past 2 weeks' worth of disk usage as well as the entire history of the disk, which can often show you a pattern or trend.

    It has helped us avoid servers running out of disk space countless times.

    Just another means of ensuring that the disk space doesn't get critically low.

    You can download and try it out for yourself here: http://gqbi.wordpress.com/2013/11/14/scom-systems-center-operations-manager-cube-and-data-warehouse/

    If you have any questions, please let me know.

  • Jim P. (1/18/2014)


    ...The other part of it is to know when you have too many that it gets ignored...

    This is key. The same is true with the communications from my bank. Clearly they have understood what all of us probably know: the response to communication overload is inaction i.e. send too many notifications and none of them will be dealt with.

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!

  • The trick is isolating the critical warnings. Email for everything simply means that critical stuff will get lost. Getting the right balance, getting bothered only when necessary, is the tricky part.

    I had a similar experience with my bank. I was driving to work, got a phone call, put it on speakerphone, and heard a recorded voice talking about account security but with no mention of the bank's name or my account. So, assuming it was spam, I hung up.

    Later that day I found out that the bank had determined my account number had been compromised somewhere and was cancelling it for reissue.

    ...

    -- FORTRAN manual for Xerox Computers --

  • Gary Varga (1/20/2014)


    Jim P. (1/18/2014)


    ...The other part of it is to know when you have too many that it gets ignored...

    This is key. The same is true with the communications from my bank. Clearly they have understood what all of us probably know: the response to communication overload is inaction i.e. send too many notifications and none of them will be dealt with.

    Very true

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events

  • Nice article Rodney.

    I have built redundant alerts in the past but had gotten away from it. It makes a lot of sense, though, especially considering how often I have seen some monitoring tools fail.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events

  • I just got caught by this yesterday. The replication monitor said everything was fine and everything passed a validation test; then a user phoned me about a missing transaction. Sure enough, it was missing, so I've spent my morning so far coding additional alerting. I should know better than to trust replication; I find it very untrustworthy when it comes to reporting errors.

  • I use PowerShell to check space each day and send out a report that shows any drive with less than 10% free space. No forecasting, nothing more complicated, just a quick report, and it has headed off a problem a few times. I read it almost every day. If I miss it, there are still the final alerts that warn of impending doom. It is all too easy, though, to wind up with reports on top of reports for things that may or may not matter as much as they did back then. I've consolidated a lot of small, key checks into one run once a day, so I have a single report where I can add or remove things easily.
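    A daily low-space report like this is a short script in most languages. As a hedged sketch in Python rather than PowerShell, the function below uses the standard library's `shutil.disk_usage` to flag any path with less than 10% free; the list of paths and the report format are assumptions, and the `usage` parameter exists only so the check can be tested without real disks.

```python
import shutil
from typing import Callable, Iterable, List, Tuple

LOW_SPACE_THRESHOLD = 0.10  # flag drives with less than 10% free


def low_space_report(
    paths: Iterable[str],
    threshold: float = LOW_SPACE_THRESHOLD,
    usage: Callable = shutil.disk_usage,
) -> List[Tuple[str, float]]:
    """Return (path, fraction_free) for every path below the threshold."""
    flagged = []
    for path in paths:
        total, _used, free = usage(path)   # shutil.disk_usage returns (total, used, free)
        fraction_free = free / total if total else 0.0
        if fraction_free < threshold:
            flagged.append((path, fraction_free))
    return flagged


if __name__ == "__main__":
    # Hypothetical list of volumes to check; adjust for the server in question.
    for path, fraction in low_space_report(["/"]):
        print(f"{path}: only {fraction:.1%} free")
```

    Keeping each check as a small function like this makes it easy to add or remove checks from one consolidated daily report.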

    Good editorial Rodney!
