SQL Server Instance Failure - Monitoring

  • Hi

    We've had an issue this morning (still ongoing) where our main SQL Server has failed.

    However, at the beginning there was no indication that the failure was in SQL.

    Eventually, when digging deeper, we noticed that both SQL instances on our main SQL VM were in a stopped state. We restarted them and they appeared to be okay, until we refreshed Configuration Manager and saw that they had in fact stopped again.

    Is anybody aware of a method for getting some sort of notification when a SQL Server instance fails to start?

    We're still investigating why it has failed (in short, 5 VMs that migrated after a host failure went into an OS recovery state) but just posting a 'thought for the future'....

    Thanks all!

    Den

  • A few thoughts... You'd want to use some monitoring outside of SQL Server to see if it's running. What do you use to monitor the VM host, since it appears that is where the issue started? You can use a scheduled task and a batch file to check SQL Server, or any service, with something like: sc query <YourServiceName>

    If a service fails to start, services have recovery options in their properties. In the Services applet, open the service and click on the Recovery tab. You can run a program in response to a failure to start.

    When the issue is with the VM host, you would probably want to worry more about that VM, given that it hosts at least a couple of SQL Server instances. Restarting SQL Server in that scenario may not work if the issue is the host. Notifying a DBA makes sense if they are also responsible for the VM admin side of things.

     

    Sue
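
The `sc query` check suggested above could be wrapped in a small script run by a scheduled task. What follows is only a minimal sketch in Python, not a tested monitoring tool: the function names are made up for illustration, and it assumes English-language `sc` output, where the state line looks like `STATE : 4  RUNNING`.

```python
import re
import subprocess
from typing import Optional


def parse_sc_state(sc_output: str) -> Optional[str]:
    """Return the state word (e.g. RUNNING, STOPPED) from `sc query` output.

    Returns None when no STATE line is found, for example when the
    service name does not exist.
    """
    match = re.search(r"^\s*STATE\s*:\s*\d+\s+(\w+)", sc_output, re.MULTILINE)
    return match.group(1) if match else None


def service_is_running(service_name: str) -> bool:
    """Run `sc query <service_name>` (Windows only) and check for RUNNING."""
    result = subprocess.run(
        ["sc", "query", service_name],
        capture_output=True,
        text=True,
    )
    return parse_sc_state(result.stdout) == "RUNNING"
```

A scheduled task could call `service_is_running("MSSQLSERVER")` for a default instance (named instances use a service name of the form `MSSQL$InstanceName`) and send an alert when it returns `False`. How the alert is sent is left open here.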

  • Thanks Sue

    Yeah, it's really just a heads-up we needed, I'll try the sc query one.

    Many thanks!

  • PowerShell scripting could be the answer here.

    You can use: $svc = Get-Service -Name <service name>

    Then: $svc.Status

    This will tell you if the service is running or not. You can also script starting the service and include a delay within a loop to check it's still running. Just google 'powershell check if service is running on remote computer' or 'powershell check if service is running' for examples.

    The obvious caveat is to check the permissions needed for remote script execution and grant them appropriately. Teaching you to suck eggs, I know, but sometimes people forget the simple things o:)
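
The "delay within a loop" idea matters for the original problem, where the instances started and then stopped again shortly afterwards. A rough sketch of such a loop is below, written in Python rather than PowerShell. The helper names are hypothetical, and `get_service_status` assumes `powershell` is on the PATH of a Windows machine.

```python
import subprocess
import time
from typing import Callable


def get_service_status(service_name: str) -> str:
    """Ask PowerShell for the service status, e.g. 'Running' or 'Stopped'."""
    result = subprocess.run(
        ["powershell", "-NoProfile", "-Command",
         f"(Get-Service -Name '{service_name}').Status"],
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def stays_running(
    is_running: Callable[[], bool],
    checks: int = 5,
    delay_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Check the service `checks` times, pausing between checks.

    Returns False as soon as one check fails, True if every check passes.
    """
    for attempt in range(checks):
        if not is_running():
            return False
        if attempt < checks - 1:
            sleep(delay_seconds)  # wait before the next check
    return True
```

For example, `stays_running(lambda: get_service_status("MSSQLSERVER") == "Running")` would watch a default instance for about two minutes after a restart. The check counts and delays are arbitrary defaults, not recommendations.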

  • Agree with Steve, PS can help with this sort of thing, but then you have to schedule and run it, and what happens if that fails?

    You could also look in scheduled tasks and see if there are any handy templates.

    If there is budget you should look at monitoring tools. I know Red Gate does one, but we use Idera Diagnostic Manager. It's great at warning you when things are starting to look iffy and, obviously, when they go wrong. Takes some time to tune, but doesn't everything?

    Some ideas on the server issue itself:

    Check that the system databases are ok. We are in Azure and found that we had temporary (destructive) drives where the data is wiped on restart. It was an issue because we put our tempdb there. Not an issue for tempdb, right? Wrong. Although the tempdb data and log files can be wiped each time the system restarts, the directories containing them can't, because when SQL Server restarts the location for tempdb doesn't exist anymore.

    Check it, easily overlooked.

    Adam Zacks

    Be Nice, Or Leave
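
One way to guard against the tempdb problem described above is a startup task that recreates the missing folders before the SQL Server service starts. This is a minimal sketch under that assumption. The function name and any path passed to it are hypothetical, and it does not deal with folder permissions for the SQL Server service account or with service start order.

```python
from pathlib import Path
from typing import Iterable, List


def ensure_directories(paths: Iterable[str]) -> List[str]:
    """Create any directories that are missing and return the ones created."""
    created = []
    for raw in paths:
        directory = Path(raw)
        if not directory.is_dir():
            # parents=True also recreates intermediate folders wiped with the drive
            directory.mkdir(parents=True, exist_ok=True)
            created.append(str(directory))
    return created
```

Called with the folders that hold the tempdb data and log files, it returns an empty list when nothing needed recreating, which makes it easy to log the cases where the temporary drive really was wiped.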
