Runaway Job Management

,

I support quite a few third-party applications, many of which install Database Maintenance jobs of their own (e.g. purging, populating, etc.) Once in awhile these jobs interfere with day-to-day activities or my normal DBA maintenance jobs.

The repetitive task of finding the answer to "Why is XXX so slow today?", or "Why is the reindex taking so long this weekend?" usually involves checking to see if these third-party jobs are running/runaway/stuck. Normally, the job is killed by an administrator and life goes on.

Having laid that out, I wanted a way to be alerted or automatically take an action on my behalf so that I could retain my precious Corona time.

The code I provided will either send an Alert to the specified SQL Operator ('Alert'), or stop the job ('Kill') depending on what argument is set. This code can be altered to be placed directly within a "Governer" job, or you can make it into a Stored Procedure on a database of your choosing within the instance and call it for multiple jobs.

Logic can also be added to take other actions of your choosing. Hope you enjoy this as it's saved me quite a bit of time in the past year!

Mark

SET NOCOUNT ON

DECLARE
    @JobName SYSNAME,
    @RuntimeThresholdMins INTEGER,
    @Action VARCHAR(25),
    @AlertOperatorName NVARCHAR(100)

--User-defined parameters
SET @JobName = 'Job Name'
SET @RuntimeThresholdMins = 240
SET @Action = 'Alert'--'Kill
SET @AlertOperatorName = 'OperatorName'

DECLARE
    @JobID UNIQUEIDENTIFIER,
    @IsSysAdmin INTEGER,
    @JobOwner SYSNAME

--Determine Job ID from name
SELECT @JobID = SJ.job_id FROM msdb..sysjobs SJ WHERE SJ.[name] = @JobName
SET @IsSysAdmin = ISNULL(IS_SRVROLEMEMBER(N'sysadmin'), 0)
SET @JobOwner = SUSER_SNAME()

IF @JobID IS NULL
 BEGIN
    RAISERROR('Job "%s" does not exist on %s.', 16, 1, @JobName, @@SERVERNAME)
    RETURN
 END

DECLARE @JobState TABLE
    (
    JobID BINARY(1024),
    LastRunDate VARCHAR(8),
    LastRunTime VARCHAR(6),
    NextRunDate VARCHAR(8),
    NextRunTime VARCHAR(6),
    NextRunScheduleID INTEGER,
    RequestedToRun INTEGER,
    RequestSource INTEGER,
    RequestSourceID VARCHAR(255),
    Running INTEGER,
    CurrentStep INTEGER,
    CurrentRetryAttempt INTEGER,
    [State] INTEGER
    )

INSERT INTO @JobState
EXEC [master]..xp_sqlagent_enum_jobs @IsSysAdmin, @JobOwner, @JobID

--Normalize VARCHAR string for time formatting
UPDATE JS SET
 JS.LastRunTime = REPLICATE('0',6-LEN(JS.LastRunTime)) + JS.LastRunTime,
 JS.NextRunTime = REPLICATE('0',6-LEN(JS.NextRunTime)) + JS.NextRunTime
FROM @JobState JS

IF EXISTS(
    SELECT 'X'
    FROM @JobState JS
    WHERE
     --Job is currently executing
     JS.[State] = 1
     AND
        DATEDIFF(mi,
         CAST(SUBSTRING(JS.NextRunDate, 5, 2) + '/' +
            SUBSTRING(JS.NextRunDate, 7, 2) + '/' +
            SUBSTRING(JS.NextRunDate, 1, 4) + ' ' +
            --Time
            SUBSTRING(JS.NextRunTime, 1, 2) + ':' +
            SUBSTRING(JS.NextRunTime, 3, 2) + ':' +
            SUBSTRING(JS.NextRunTime, 5, 2) AS DATETIME), GETDATE()) >= @RuntimeThresholdMins)
 BEGIN    
    IF @Action = 'Alert'
     BEGIN
        DECLARE
            @Subject VARCHAR(100),
            @Body VARCHAR(8000)
        
        SET @Subject = '***Possible Runaway Job: ' + @JobName + '***'
        SET @Body = 'The job "' + @JobName + '" has been running for longer than ' + CAST(@RuntimeThresholdMins AS VARCHAR(38)) + ' minutes.'
        
        --Send alert
        EXECUTE msdb..sp_notify_operator
            @name = @AlertOperatorName,
            @subject = @Subject,
            @Body = @Body
     END
    ELSE IF @Action = 'Kill'
     BEGIN    
        --Cancel job
        EXEC msdb..sp_stop_job @job_id = @JobID
     END
    ELSE
     BEGIN
        RAISERROR('"%s" is an invalid action. Use "Alert" or "Kill".', 16, 1, @Action)
     END
 END
GO

Rate

Share

Share

Rate