I’ve been involved in building a solution on top of SQL Server peer to peer replication (SQL 2014 SP1) and slowly building up a script library for general DBA tasks within that solution. One common task is to add a new table/article to the publication.
The basic steps for adding an article are outlined here, and the end of the process requires the distribution agents to be restarted.
I set about building a template script that I could drop my table creates into and run, and the script creates the table, adds the articles and then restarts the distribution jobs.
My first attempt to restart the jobs was:
exec msdb..sp_stop_job @job_name = 'my distribution agent job'; exec msdb..sp_start_job @job_name = 'my distribution agent job';
With this result:
Job 'my distribution agent job' stopped successfully. Msg 22022, Level 16, State 1, Line 124 SQLServerAgent Error: Request to run job my distribution agent job (from User *****) refused because the job is already running from a request by User *****.
If you dig into the sp_stop_job procedure you find a call to sp_sqlagent_notify which in turn makes a call to master.dbo.xp_sqlagent_notify. This call goes out from the SQL Server service to the SQL Agent service and would appear to be asynchronous. You can see that by running something like the following which stops the job and then queries it’s state immediately afterwards.
exec msdb..sp_stop_job @job_name = 'my distribution agent job'; exec msdb..sp_help_job @job_name = 'my distribution agent job', @job_aspect = 'JOB';
The @job_aspect parameter means the procedure only returns a single row which includes a column showing the running status of the job – which in this case was still running.
The next thought is to send the stop job request and then loop until this request has completed and then run the start. A similar issue arises when querying the status of the jobs. My first preference is to be able to query the result from system views/tables; something like:
set nocount on; declare @i int; exec msdb..sp_stop_job @job_name = 'My distribution job'; while exists( SELECT 1 FROM msdb.dbo.sysjobs_view job JOIN msdb.dbo.sysjobactivity activity ON job.job_id = activity.job_id JOIN msdb.dbo.syssessions sess ON sess.session_id = activity.session_id JOIN (SELECT MAX( agent_start_date ) AS max_agent_start_date FROM msdb.dbo.syssessions ) sess_max ON sess.agent_start_date = sess_max.max_agent_start_date WHERE run_requested_date IS NOT NULL AND stop_execution_date IS NULL AND job.name = 'My distribution job' ) begin select @i =1 end; exec msdb..sp_start_job @job_name = 'My distribution job';
But this suffers the same problem with the asynchronous nature of xp_sqlagent_notify. It looks like there are still some completion actions that take place after the session has ended.
After much experimentation the most reliable way I could find of accurately determining the status (in close to real time) was by using another call out to the agent service: xp_sqlagent_enum_jobs. Loading a temp table with the results of a proc call and then querying those results was not as elegant a solution as I was looking for – but here it is:
declare @running bit = 1; declare @jobs table( job_iduniqueidentifier, [Last Run Date]int, [Last Run Time]int, [Next Run Date]int, [Next Run Time]int, [Next Run Schedule ID]int, [Requested To Run]int, [Request Source]int, [Request Source ID]varchar(100), [Running]int, [Current Step]int, [Current Retry Attempt]int, [State]int ); declare @job_id uniqueidentifier = (select job_id from msdb..sysjobs where name = 'My distribution job'); exec msdb..sp_stop_job @job_name = 'My distribution job'; while (@running = 1) begin insert @jobs EXECUTE master.dbo.xp_sqlagent_enum_jobs 1, dbo, @job_id select @running = running from @jobs delete @jobs end; exec msdb..sp_start_job @job_name = 'My distribution job';