We have a job that frequently hangs. It is trying to import data from Oracle through a linked server and for some reason tends to just wait forever instead of timing out or failing. We know how long the job should take, so we decided to stop the job if it took twice as long as it should and restart it. Since the job truncates all the tables it is importing into as it starts, there's no reason to avoid simply restart the job from the beginning.
After doing this manually for awhile, we decided to automate the process. The system stored procedure msdb.dbo.sp_stop_job works just fine if you know the name of the job you're trying to stop and the same goes for sp_start_job. Automating those two aspects is no problem. The tricky part is programmatically determining when the job is still running. You can use sp_help_job, but since it returns a table, you need to be able to insert that data into a table to find the data you want. sp_help_job is mainly for visual and manual use. I found details on sp_help_job here: Detecting The State of a SQL Server Agent Job.
Since sp_help_job does what I need, clearly the next step is to look inside sp_help_job and see what it does to see if I can automate the process. The stored procedure (sp) sp_help_job is located in the msdb database which can be found under SystemDatabases. Expand StoredProcedures and then SystemStoredProcedures and marvel at the number of stored procedures located there. This is a great place to explore when you have time, there are tons of helpful stored procedures here.
I opened sp_help_jobs and checked for what would tell me if the job was currently running. (To open it, right click on the sp name and click Modify. Be careful you don't change anything or hit execute while the sp is open. If you're worried, comment out the 'ALTER PROCEDURE' line when you first open it.) There's a variable called @execution_status that looks like a good candidate, let's see where that's used. Scroll down through the sp and you'll see @execution_status is fed to sp_get_composite_job_info.
If we open sp_get_composite_job_info, we'll see the variable 'running' in the last declared table, that's what we're looking for. A little further down is how this table is populated. The first step in populating the table is a statement to determine if you're running SQL Server 8 or newer. I am, so the step I want is
master.dbo.xp_sqlagent_enum_jobs @can_see_all_running_jobs, @job_owner, @job_id
The only difference is that in and after SQL Server 8, @job_id is a parameter. Before SQL Server 8, there is no @job_id parameter.
Since we're going as deep as we can to get just the parts we need, let's look at xp_sqlagent_enum_jobs. Note that this starts with xp_, not sp_ and it is in a different database. Move over to Master. Expand Extended Stored Procedures and then System Extended Stored Procedures. Here you'll find sys.xp_sqlagent_enum_jobs. Why this is in the schema 'sys' instead of the schema 'dbo' as listed in the Execute, I don't know. I can't find it in any folders with the dbo schema. There's also no way to look into this code. At least no way I know. Right clicking on it just brings a prompt to Start PowerShell. Checking Properties shows it is a DLL, I'd likely need a decompiler to take a look.
This is low enough to be a good stopping point though. It gets me the Running result in a format where I can use the value immediately and doesn't get me a ton of things I don't need at the moment. So how do we use this?
There's no need to reinvent the wheel, so just copy the parts we'll need from sp_get_composite_job_info. That's the declarations
DECLARE @can_see_all_running_jobs INT
DECLARE @job_owner sysname
DECLARE @job_id uniqueidentifier
And the table declaration
DECLARE @xp_results TABLE
job_id UNIQUEIDENTIFIER NOT NULL,
last_run_date INT NOT NULL,
last_run_time INT NOT NULL,
next_run_date INT NOT NULL,
next_run_time INT NOT NULL,
next_run_schedule_id INT NOT NULL,
requested_to_run INT NOT NULL, -- BOOL
request_source INT NOT NULL,
request_source_id sysname COLLATE database_default NULL,
running INT NOT NULL, -- BOOL
current_step INT NOT NULL,
current_retry_attempt INT NOT NULL,
job_state INT NOT NULL
We'll also copy the parts that populate the variables
SELECT @can_see_all_running_jobs = ISNULL(IS_SRVROLEMEMBER(N'sysadmin'), 0)
SELECT @job_owner = SUSER_SNAME()
SELECT @JOB_ID = job_id FROM msdb.dbo.sysjobs_view
WHERE name = 'RunOften'
And the part that populates the table we need
INSERT INTO @xp_results
EXECUTE master.dbo.xp_sqlagent_enum_jobs @can_see_all_running_jobs, @job_owner, @job_id
Normally I don't like to populate a table without specifying the columns, but as I can't declare the columns in a following SELECT statement and this code is directly from a system stored procedure, I'm letting it go.
Now that we have all the data, we have to do something with it. We'll declare a variable to store our result
DECLARE @IsRunning bit = 0
SELECT @IsRunning = xpr.running
FROM @xp_results xpr
Pass in the name of the job you want to check, then just check the Running stat. You don't need to join it to any other tables, just see if @IsRunning = 1 and if it does, stop the job with sp_stop_job
IF @IsRunning = 1
exec msdb.dbo.sp_stop_job @JobName
WAITFOR DELAY '00:01:00'
exec msdb.dbo.sp_start_job @JobName
If this is a job you want to start again automatically, you can start it again with sp_start_job, but I recommend including a 1 minute wait before you do to make sure the job has stopped successfully before trying to start it again.
If you have multiple schedules and want to find out which one is running, the process is slightly different. First we declare a variable of our own
DECLARE @Phase tinyint = 0
And we'll assign a value to it based on the information we've retrieved from the xp_ procedure
SELECT @Phase = CASE S.name WHEN 'First Run' THEN 1
WHEN 'Second Run' THEN 2
WHEN 'Third Run' THEN 3
WHEN 'Fourth Run' THEN 4
FROM @xp_results xpr
LEFT JOIN msdb.dbo.sysjobschedules JS
ON xpr.job_id = JS.job_id
AND xpr.next_run_schedule_id = JS.schedule_id
LEFT JOIN msdb.dbo.sysschedules S
ON JS.schedule_id = S.schedule_id
WHERE xpr.running = 1
In my job I have four distinct scheduled run times listed as 'First Run', 'Second Run', 'Third Run' and 'Fourth Run'. We can relate these named schedules back to the data we've retrieved through the system tables msdb.dbo.sysjobschedules and msdb.dbo.sysschedules. In sysschedules, the schedule's names are listed, we'll use that in the CASE statement to assign a number from 1 through 4. We just have to check the table we've brought back to see which of the schedules is currently running. The 'running' value for that job will be 1. (When you have a chance, check out sysjobschedules and sysschedules, you may find other useful information in there and it is good to understand the tables you're referencing.)
Finally, we return that @Phase value to the process calling this stored procedure
We can perform whatever actions we need based on that. For example, I truncate the day's temporary tables at the beginning of the job, but only if the scheduled job = 1.
One other handy piece of information. When you're looking at the Job Activity Monitor, you'll notice there are two columns: Last Run and Next Run. Last Run refers to the last completed run of the job. Next Run refers to the next time the job runs, but doesn't change until that run has completed. So if you have a job that runs at 5, 6, 7 and 8 and you look at the Job Activity Monitor at 7:15 and Next Run says 7, that just means that the 7:00 job is still running. If it says 6:00, that means your 6:00 job is running over time, not that your 5:00 job is running long. I find this information useful in determining if a job has gone badly enough to stop it and investigate or if it is just a little slow. The terms can be confusing though, so remember, Last and Next refer to job completion, not starts.