Monitoring Failed Job Steps

,

One day I became tired of doing a manual procedure to check every morning to

see if nightly backup, data transfer, and other jobs have failed. I decided to

develop code that will automatically verify if jobs have failed and email the

results to me.

One way is to accomplish this is goal is to make sure the last step of any job

is 'Check Failed Steps'. This works as long as the last step of the job is the

only one to have an on failure action of 'Quit the job reporting failure'. For

example:

 Job Step Name On Success On Failure
Check all databases Go to next Step Go to next Step
Cycle Error Log Go to next Step Go to next Step
Report Folder Sizes Go to next Step Go to next Step
Database Size Report Go to next Step Go to next Step
Check failed steps Quit the job reporting success Quit the job reporting failure

In some cases if a step fails you'll want to branch directly to the failed

step:

 Job Step Name On Success On Failure
Extract Data Go to next Step Go to step : 5
Prepare File Data Go to next Step Go to step : 5
Create File Go to next Step Go to step : 5
Send File Go to next Step Go to step : 5
Check failed steps Quit the job reporting success Quit the job reporting failure

P_JOB_STEP_FAIL_1.prc contains the

stored procedure code to check failed steps for a job. This only checks the last

(current) instance of the job to run. To run it, the syntax is:

exec dbo.P_JOB_STEP_FAIL @job_name VARCHAR(50), @recipients varchar(100)

P_ANY_JOB_STEP_FAIL_1.rpc

notifies you of any failed job step within the last x hours. I typically a

separate job for this procedure and schedule it to run every 3-4 hours checking

8 hours back. Synax is as follows:

Exec dbo.P_ANY_JOB_STEP_FAIL @recipients varchar(100), @hours_back int = -12

Of course, I am using msdb system tables sysjobhistory and sysjobs. No

special permissions are required because they run as a part of a job.

Conclusion

It is may be not a big deal to check jobs daily and control them manually.

But this simple 2 way automation helps me, especially when I maintain multiple

servers and many jobs. And it gives me the pleasure of additional control. I

have been using these procedures for almost 2 years and didn’t have issues or

problems so far with proper error notification.

Rate

Share

Share

Rate