I look after several clustered instances of SQL SERVER. One of the tasks that I do every morning is to check if any of those instances have failed over to a new node. I would prefer to have this monitored automatically, and that’s what this post is about – getting feedback if possible.
My manual method of checking for failover simply involves running a query to see which node the cluster currently resides on, like so,
As well as pulling the uptime of the SQL SERVER from a DMV.
The query will basically let you know what the machine name is of the node that is currently hosting the clustered instance. It you run it on a just a single server instance then NULL is returned.
I use a multi-server query, and basically run this query across all of my clustered instances.
Now, because I only have a small number of clusters to monitor – six – I can immediately see if any of them are on a different node than they should be. The solution is OK, but there’s a number of problems with it
- It’s manual and rely’s on somebody (me) running the script.
- The person running the script has to have knowledge of the clusters – i.e which node are they expecting to see them on (or at least some sort of reference document handy)
Now, a cluster failover is rare for me – and if there is one then it’s usually because of a non SQL SERVER related problem – such as a disk issue.
I am currently investigating a few options for getting notified of a cluster failover, these include,
- Periodically polling all the cluster instances and comparing the current node with the preferred node. This can be done via linked servers or with a custom .NET app.
- Using another monitoring tool such as SCOM.
- Implementing an SQL startup procedure to notify me when the instance restarts.
However, recently, I have been thinking about a solution involving extended events.
I am wondering if it is possible to catch a “shut down” event and send a notification. Assuming that this was possible then if the SQL SERVER was shut down cleanly then I guess the notification would be sent.
However, I guess that a sudden shut down of the machine would cause the notification not to be sent.
On the reverse side of this, I could possibly catch a “startup” event. However, this would be of no use if the SQL SERVER – on the other node – did not re-start at all.
So, I’d be interested in any comments / advice on how other people handle the cluster failover notification issue please.
You can leave a comment here if you like – then everybody will get to see it – or if you like drop me an email through martin at martinz.co.nz.