How fast can I know that the server is going down or is down?

  • I've been trying to figure out how to get the fastest possible alert when a server is going down, is down, or has been rebooted.

    1. I can tell if a server has been rebooted and can write scripts to check the log file maybe daily.

    2. But can I get the server to send out a message that a reboot has been issued?

    3. I've also seen that I can set up an alert notification when the system has only been up for a short while, possibly indicating that it is just coming back online.

    4. Lastly, what if the server goes down and doesn't come back online? My thought is that only a scheduled external monitor would be able to alert on that condition.

    Does anyone have any best practices? I'm thinking of setting up 1 and 3 for my small environment.

    Any tips/advice would be appreciated.

    thanks

  • Best practices? It depends. Perhaps use third-party software such as Centreon/Nagios, Redgate SQL Monitor, or Idera's free admin tools (any of them would work).

    For options 1 and 3 you could use a startup procedure that sends you an email once the service has been restarted - http://technet.microsoft.com/en-us/library/ms191129(v=sql.100).aspx

    ______________________________________________________________________________
    Never argue with an idiot; they'll drag you down to their level and beat you with experience.

  • thanks. I just needed some good ideas to get my thoughts flowing.

  • You could put in a startup procedure that sends you an email each time the server comes online. That wouldn't tell you it was going down, but it would be self-contained. The real issue is that you can't rely on the server to tell you it's going offline, since a crash or power loss gives it no chance to send anything. Instead, you need another server to watch the first one (a "who watches the watchmen" type of deal). You can build this out yourself any number of ways. I liked using SQL Agent on the watching server to set up a regular job that queries the first one. If the query fails, you know the server is offline. That's one way to do it.
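
    As a rough illustration of the "second machine watches the first" idea, here is a minimal Python sketch meant to run from a scheduler on a separate box. It only checks whether the SQL Server TCP port accepts a connection, so it shows that the listener is up, not that queries succeed. The port 1433 default, the function names, and the host name in the comment are assumptions for illustration, not anything taken from a real environment.

```python
import socket


def is_port_open(host: str, port: int = 1433, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def find_unreachable(servers, timeout: float = 3.0):
    """Return the (host, port) pairs that did not accept a connection."""
    return [(h, p) for h, p in servers if not is_port_open(h, p, timeout)]


# Example usage (hypothetical host name; replace with your own servers):
#   for host, port in find_unreachable([("sqlprod01.example.local", 1433)]):
#       print(f"ALERT: {host}:{port} is not accepting connections")
```

    How quickly you hear about an outage is then bounded by how often the scheduler runs the check, plus the timeout.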

    But, as was already said, building your own monitoring suite is a lot of work. Better to buy one. There are so many products out there that are built out far better than what you'll be able to put together that it just makes sense.

    Fair warning, I work for a vendor, but I won't bring up the product.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • @Grant, yeah I had already thrown the bone in that direction 🙂


  • Thanks. We actually use Redgate Backup Pro and love it. The only thing about vendor software is that I need to figure out what I can do natively, so that I truly understand how it works and what my limitations are, before I can sell it to my bosses.

  • @mydog... yeah, thanks. I hit up the MSDN link and was able to put that into place for our servers as a temporary start. I'm going to research some monitoring apps, and maybe scripts, this week to try to get our environment more tied down. Today I'm going to put in some critical alerts, so any tips on those would be nice.

  • To start, I'd recommend setting up default alerts for:

    - Error Number 823 - IO/Hardware/System Issue detected

    - Error Number 824 - IO/Logical Consistency Check Failed

    - Error Number 825 - Read/Retry Warning

    - Severity 016 - Miscellaneous User Error

    - Severity 017 - Insufficient Resources

    - Severity 018 - Nonfatal Internal Error

    - Severity 019 - Fatal Error in Resource

    - Severity 020 - Fatal Error in Current Process

    - Severity 021 - Fatal Error in Database Process

    - Severity 022 - Fatal Error: Table Integrity Suspected

    - Severity 023 - Fatal Error: Database Integrity Suspected

    - Severity 024 - Fatal Hardware Error Raised

    - Severity 025 - Fatal Error
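
    Typing thirteen alerts in by hand is tedious, so here is a small Python sketch that generates the T-SQL for the list above. It relies on the msdb procedures `sp_add_alert` and `sp_add_notification`; the helper names and the operator name are made up for illustration, and it assumes the operator already exists and Database Mail is configured for SQL Agent. Review the generated script before running it.

```python
# Error-number and severity alerts, as listed in the post above.
ERROR_ALERTS = {
    823: "IO/Hardware/System Issue detected",
    824: "IO/Logical Consistency Check Failed",
    825: "Read/Retry Warning",
}
SEVERITY_ALERTS = {
    16: "Miscellaneous User Error",
    17: "Insufficient Resources",
    18: "Nonfatal Internal Error",
    19: "Fatal Error in Resource",
    20: "Fatal Error in Current Process",
    21: "Fatal Error in Database Process",
    22: "Fatal Error: Table Integrity Suspected",
    23: "Fatal Error: Database Integrity Suspected",
    24: "Fatal Hardware Error Raised",
    25: "Fatal Error",
}


def quote(text: str) -> str:
    """Return text as a T-SQL Unicode string literal."""
    return "N'" + text.replace("'", "''") + "'"


def alert_sql(name, operator, message_id=0, severity=0, delay_seconds=60):
    """Build T-SQL that creates one alert and emails the given operator.

    An alert is keyed on either a message id or a severity, so the
    other value is left at 0.
    """
    return (
        f"EXEC msdb.dbo.sp_add_alert @name = {quote(name)}, "
        f"@message_id = {message_id}, @severity = {severity}, "
        f"@delay_between_responses = {delay_seconds}, "
        f"@include_event_description_in = 1;\n"
        f"EXEC msdb.dbo.sp_add_notification @alert_name = {quote(name)}, "
        f"@operator_name = {quote(operator)}, @notification_method = 1;"
    )


def build_script(operator: str) -> str:
    """Return one T-SQL script covering every alert in the two tables."""
    statements = [
        alert_sql(f"Error Number {num} - {text}", operator, message_id=num)
        for num, text in ERROR_ALERTS.items()
    ]
    statements += [
        alert_sql(f"Severity {sev:03d} - {text}", operator, severity=sev)
        for sev, text in SEVERITY_ALERTS.items()
    ]
    return "\n".join(statements)


# Example usage ("DBA Team" is a hypothetical operator name):
#   print(build_script("DBA Team"))
```

    The 60-second delay between responses is an arbitrary starting point to keep a repeating error from flooding your inbox; tune it to taste.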


  • The SQL Server service writes to the Windows event log when it is being shut down. Short of an outright crash, you can use WMI to have Windows notify you when those entries appear in the log.

    System Center and third-party tools like BlueStripe (I believe that's the name) can also poll the SQL Server Browser services on your network for a list of "available" servers and compare it against the expected list. When a server is shut off, it disappears from the list and the tool can notify you.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Thanks again for the helpful information. I was wondering if there is anything in the SQL error log that I should be monitoring with sp_readerrorlog that falls outside of those alerts. I looked through our logs and didn't see anything outside of "our" norm: lots of I/O taking a long time to complete, but no one who can fix that is fixing it (I've sent tons of I/O metrics over the last year, and an alert on that would just become daily/hourly noise).

    I'm wondering whether polling sp_readerrorlog would be an archaic way to do this, and whether I should just find a third-party tool that encapsulates all of it. BTW, this is on a DW, so most issues are low priority, since unlike OLTP we can recreate our DW as of the previous day.
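
    For what it's worth, one way to keep the known slow-I/O messages from drowning out everything else would be to filter them out after reading the log. A minimal Python sketch, assuming the log lines have already been pulled into a list of strings by whatever client you use; the function name and the exact wording of the ignore pattern are my assumptions and should be adjusted to what your log actually contains.

```python
import re

# Patterns for messages already known to be routine in this environment.
# The slow-I/O wording is an assumption; match it to your own log text.
DEFAULT_IGNORE = (r"I/O requests taking longer than \d+ seconds",)


def unusual_lines(lines, ignore_patterns=DEFAULT_IGNORE):
    """Return the non-blank log lines that match none of the ignore patterns."""
    compiled = [re.compile(p, re.IGNORECASE) for p in ignore_patterns]
    return [
        line
        for line in lines
        if line.strip() and not any(rx.search(line) for rx in compiled)
    ]
```

    Anything that survives the filter is, by definition, outside the norm and worth a look.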

  • If you really want to do it all manually, I'd look up one of the blocking scripts that are posted online here. Get a newer version written for 2008, not one of the old 2000 ones that are still hanging around.
