SQL Server Restart or Cluster Failover Notification

,

Since I manage a lot of SQL instances (both stand-alone and clusters) I came up with a way to get a notification that a restart or failover may have occurred.  This notification was created in self-defense because often times no one from the system administrator’s team will tell the DBA when work is being done or if there are issues that may have resulted in a restart.  Many times I have logged onto a SQL cluster only to find that it failed over and have no explanation as to why.  It is okay for clusters to failover, after all that is what we build them for, but it would be good to know why in case there is some problem that should be fixed.  The same has happened for the stand-alone instances that I manage…I want to know when unexpected restarts happen so that I can prevent them from happening if possible.

Stand-alone instance restart notification

For a stand-alone instance we can use the sys.dm_server_services dynamic management view (available in SQL 2008 R2 and above) to get some information on the last time that a service was started up.

SELECT servicename, startup_type_desc, status_desc, last_startup_time
 FROM sys.dm_server_services

See https://msdn.microsoft.com/en-us/library/hh204542.aspx for more info on sys.dm_server_services

The restart or failover notification works by scheduling a job to run only when the SQL Agent starts that uses the code below.  I happen to like using HTML formatted e-mails to present results from queries.  I never know when I might have to forward that e-mail to someone at one of the companies that I work for so I have learned to make sure that results are easy to read.

-- create job that only runs when the SQL Agent starts that uses the code below.
-- replace all <<instancename>> & <<profile>> placeholders with correct values
SET NOCOUNT ON
DECLARE @body1 NVARCHAR(MAX)
-- instructions
SET @body1='<p> The <<instancename>> Instance has been restarted.</p>'
-- table attributes
SET @body1=@body1+'<table border="2" cellspacing="2" cellpadding="2">'
-- column headers
SET @body1=@body1+ '<tbody align="left" style="font-family:Arial; font-size: 11;" <TR><TH>Service Name</TH><TH>Startup Type</TH><TH>Status</TH>
                    <TH>Startup Time</TH></TR></tbody>'
-- data
SELECT @body1=@body1 +'<tbody align="left" style="font-family:Arial; font-size: 11;" <TR><TD>'+servicename+'</TD><TD>'+startup_type_desc+'</TD><TD>'+
                    status_desc+'</TD><TD>'+CAST(last_startup_time AS VARCHAR(30))+'</TD></TR></tbody>'
FROM sys.dm_server_services
-- Send an html formatted e-mail to notify of restart 
EXEC msdb.dbo.sp_send_dbmail
       @profile_name = '<<profile>>',  -- Use valid database mail profile here
       @recipients = 'DBA@company.com',  -- Use valid email address here
       @subject = 'SQL restart on <<instancename>>',
       @body = @body1,
       @body_format = 'HTML'
SET NOCOUNT OFF

Here is how to set up the job….

Add a job step with the code modified to have the correct instance name and a valid e-mail profile where needed.

Here is where the “magic” happens in that you can set the schedule to only run when the Agent starts.  With this schedule there is no need to run a job that polls data many times a day unnecessarily.

Here is an example of the stand-alone restart alert:

 

 A lot of people might worry that if the SQL Server Agent service only is restarted that you will also get the restart alert and that is a true statement.  However, you should investigate all unexpected service restarts.  You can see from the results above that both services restarted on the same day at virtually the same time which would indicate that there was a restart.  If only the Agent had been restarted, you would see that the startup time for the Agent would be very different than the SQL Server service startup time.  Either way, you would receive an alert of a service restart which is a good thing.

Cluster Failover Restart Notification

A SQL cluster failover is a bit different in that I can query the actual node name that SQL is active on by looking at the SERVERPROPERTY function.  (More on the SERVERPROPERTY function can be found here: https://msdn.microsoft.com/en-us/library/ms174396.aspx)  There is some prep that had to be done to place the current and previous node names into permanent tables that can be queried and compared when the Agent service is started.  This will allow us to detect if a failover or a restart has occurred.  You will need an administrative database available where the tables can be created.

-- Create tables
USE [ADMIN]
GO
CREATE TABLE [dbo].[CurrentNode](
       [cnode] [varchar](100) NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[PreviousNode](
       [pnode] [varchar](100) NULL
) ON [PRIMARY]
GO

Once the tables are created they need to be populated with the current and previous node table values.  These will be the same at first so that after a failover then the comparison query will find that the previous node is not the same as the current node and the job will alert for a failover.  If the current node is the same as the previous node then a restart has occurred.

-- Prep - Put the same value into both tables so that if the current changes on failover a notice is sent
INSERT INTO CurrentNode (cnode)
 VALUES (CONVERT(VARCHAR(100),SERVERPROPERTY('ComputerNamePhysicalNetBIOS')))
INSERT INTO PreviousNode (pnode)
 VALUES (CONVERT(VARCHAR(100),SERVERPROPERTY('ComputerNamePhysicalNetBIOS')))

Here is the code that is used in the job that is set up just like the stand-alone restart notification job was done earlier.  Again, the logic below compares the values in the current node query to the pervious node table and will notify of either a failover or restart of the instance.  An e-mail can be sent to notify anyone who needs it of the failover or restart event. 

-- create job that only runs when the SQL Agent starts that uses the code below.
-- replace all <<instancename>> & <<profile>> placeholders with correct values
SET NOCOUNT ON
DECLARE @curnodename VARCHAR(100)
DECLARE @prevnodename VARCHAR(100)
DECLARE @body1 NVARCHAR(MAX)
SET @curnodename = CONVERT(VARCHAR(100),SERVERPROPERTY('ComputerNamePhysicalNetBIOS'))
SET @prevnodename = (SELECT pnode FROM PreviousNode)
IF @curnodename <> @prevnodename -- Failover
BEGIN
       UPDATE CurrentNode SET cnode = @curnodename
       -- instructions
       SET @body1='<p> A failover has occurred for the <<InstanceName>> cluster.</p>'
       -- table attributes
       SET @body1=@body1+'<table border="2" cellspacing="2" cellpadding="2">'
       -- column headers
       SET @body1=@body1+ '<tbody align="left" style="font-family:Arial; font-size: 11;" <TR><TH>Current Node</TH><TH>Previous Node</TH></TR></tbody>'
       -- data
       SET @body1=@body1 +'<tbody align="left" style="font-family:Arial; font-size: 11;" <TR><TD>'+@curnodename+'</TD><TD>'+@prevnodename+'</TD></TR></tbody>'
       -- Send an html formatted e-mail to notify of failover 
       EXEC msdb.dbo.sp_send_dbmail
             @profile_name = '<<profile>>',
             @recipients = 'DBA@company.com',
             @subject = 'Failover on <<instancename>> cluster',
             @body = @body1,
             @body_format = 'HTML';
       UPDATE PreviousNode SET pnode = @curnodename
END
IF @curnodename = @prevnodename  -- Restart
BEGIN
       -- Send an html formatted e-mail to notify of restart 
       EXEC msdb.dbo.sp_send_dbmail
             @profile_name = '<<profile>>',
             @recipients = 'DBA@company.com',
             @subject = 'Restart of <<InstanceName>> SQL Services',
             @body = 'The <<InstanceName>> cluster services may have been restarted but not failed over.';
END
SET NOCOUNT OFF

And finally here is the T-SQL of the actual job that I created for the cluster failover alert.  It will only run if the SQLAgent is started since we have set the job schedule to only run at that time.

-- Create job
USE [msdb]
GO
/****** Object:  Job [Failover Notification]    ******/
BEGIN TRANSACTION
DECLARE @ReturnCode INT
SELECT @ReturnCode = 0
/****** Object:  JobCategory [[Uncategorized (Local)]]]    ******/
IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'[Uncategorized (Local)]' AND category_class=1)
BEGIN
  EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]'
  IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
END
DECLARE @jobId BINARY(16)
EXEC @ReturnCode =  msdb.dbo.sp_add_job @job_name=N'Failover Notification',
             @enabled=1,
             @notify_level_eventlog=0,
             @notify_level_email=2,
             @notify_level_netsend=0,
             @notify_level_page=0,
             @delete_level=0,
             @description=N'On restart of SQL Agent this job will run and notify if the instance has failed over.',
             @category_name=N'[Uncategorized (Local)]',
             @owner_login_name=N'sa',
             @notify_email_operator_name=N'DBA', @job_id = @jobId OUTPUT
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
/****** Object:  Step [FailoverCheck]    ******/
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'FailoverCheck',
             @step_id=1,
             @cmdexec_success_code=0,
             @on_success_action=1,
             @on_success_step_id=0,
             @on_fail_action=2,
             @on_fail_step_id=0,
             @retry_attempts=0,
             @retry_interval=0,
             @os_run_priority=0, @subsystem=N'TSQL',
             @command=N'SET NOCOUNT ON
DECLARE @curnodename VARCHAR(100)
DECLARE @prevnodename VARCHAR(100)
DECLARE @body1 NVARCHAR(MAX)
SET @curnodename = CONVERT(VARCHAR(100),SERVERPROPERTY(''ComputerNamePhysicalNetBIOS''))
SET @prevnodename = (SELECT pnode FROM PreviousNode)
IF @curnodename <> @prevnodename -- Failover
BEGIN
       UPDATE CurrentNode SET cnode = @curnodename
       -- instructions
       SET @body1=''<p> A failover has occurred for the <<InstanceName>> cluster.</p>''
       -- table attributes
       SET @body1=@body1+''<table border="2" cellspacing="2" cellpadding="2">''
       -- column headers
       SET @body1=@body1+ ''<tbody align="left" style="font-family:Arial; font-size: 11;" <TR><TH>Current Node</TH><TH>Previous Node</TH></TR></tbody>''
       -- data
       SET @body1=@body1 +''<tbody align="left" style="font-family:Arial; font-size: 11;" <TR><TD>''+@curnodename+''</TD><TD>''+@prevnodename+''</TD></TR></tbody>''
       -- Send an html formatted e-mail to notify of failover 
       EXEC msdb.dbo.sp_send_dbmail
             @profile_name = ''<<profile>>'',
             @recipients = ''DBA@company.com'',
             @subject = ''Failover on <<InstanceName>> cluster'',
             @body = @body1,
             @body_format = ''HTML'';
       UPDATE PreviousNode SET pnode = @curnodename
END
IF @curnodename = @prevnodename  -- Restart
BEGIN
       -- Send an html formatted e-mail to notify of restart 
       EXEC msdb.dbo.sp_send_dbmail
             @profile_name = ''<<profile>>'',
             @recipients = ''DBA@company.com'',
             @subject = ''Restart of <<InstanceName>> SQL Services'',
             @body = ''The <<InstanceName>> cluster services may have been restarted but not failed over'';
END
SET NOCOUNT OFF',
             @database_name=N'ADMIN',
             @flags=0
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'failover check sched',
             @enabled=1,
             @freq_type=64, -- this sets the job schedule to only run when the Agent starts
             @freq_interval=0,
             @freq_subday_type=0,
             @freq_subday_interval=0,
             @freq_relative_interval=0,
             @freq_recurrence_factor=0,
             @active_start_date=20150101,
             @active_end_date=99991231,
             @active_start_time=0,
             @active_end_time=235959
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
COMMIT TRANSACTION
GOTO EndSave
QuitWithRollback:
    IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
EndSave:
GO

The actual cluster failover notification looks like this:

 

Now when SQL restarts or a cluster failover happens, I receive a restart notification and can then research why it happened and troubleshoot if necessary.  I hope this is helpful to you.  Enjoy!

Rate

4.79 (14)

Share

Share

Rate

4.79 (14)