Returning full error details from SQL Server Agent jobs

Recently as part of a complex BI project, I have had to ensure that a multitude of SSIS packages have run when called from SQL Server Agent. This is not particularly difficult, and poses no real problem. However the project did let me get to understand in greater depth some of the ways that SQL Server Agent works - particularly as far as the tables that it uses to track jobs and job progress in the msdb database are concerned.

So I wanted to use the information that these tables can provide to solve two simple, but necessary problems required by my project:

Return full error details for a failed step when a SQL Server Agent job runs
Display in a web or winforms application the current status of a job that is running

Although the solutions to these two problems are based on a common thread, I consider that it is probably easier to treat them as separate items. I realise, of course, that plenty has been written about this before, but as I could not find exactly the solutions that I was looking for when I set off on my searches across the web, I thought that another take on the question could be useful. So in this first article, I will look at returning full error details from a failed step when a SQL Server Agent job runs.

Firstly, let's consider the typical problem:

You have set up an Agent job to run at a scheduled time - let's say overnight. Of course, everything runs fine, until the day that it fails, inexplicably. At this point, like me, you probably right-click the job in question, and select "View History", to open the Log File viewer. Then you expand the details of the failed step (at the top of the list), and read the details in the lower part of the dialog.

Then the trouble starts because the details are massively truncated, to say the least. In fact they only show the first 1024 characters, and given the extreme verbosity of SSIS (as this is what I was using), this meant that effectively all I got was a preamble to the problem, but nothing of much use.

Yes, I know that efficient logging in SSIS can give me lots of debugging information, and of course, I have added extensive logging to all my SSIS packages. Yet the consequent time and effort spent digging for the error messages given the inevitable urgency of a data load failure situation prompted me to wish for something simpler, faster and more centralised. After all, when debugging a process invoked by SQL Server Agent, I don't want to have to dig deep into logs specific to the processes that have been invoked as the first port of call.

So where can a solution lie?

Fairly simply, in looking more closely at the five core tables used by SQL Server Agent when running jobs. These tables provide a wealth of information that I have glossed over for years, and now realise is extremely useful.

Before suggesting a simple stored procedure to display all the full logging information for any failed step in an Agent Job, here is a very brief reminder of what these tables are, and why they are useful:

Sysjobs: All the jobs in the Server instance
Sysjobsteps: All the steps for the jobs.
Sysjobactivity: The current status of each job. This is particularly useful for seeing if a job is running, or when it last ran.
Sysjobhistory: The history of each job when it ran, including all job steps, until truncated (this is what you see in the Log File viewer)
sysjobstepslogs: The job step log for certain jobs.

Specifically, the "sysjobstepslogs" table, which, if, will store all the logging data for selected steps of a SQL Server Agent job. Here I need to stress that it is the "properly configured " part that changes everything, for this table will be empty by default.

So what I want to do is to set certain (or al of the) steps in a job to write full logging data to , which I will then read out to get the error details

Firstly, configure logging for the steps you wish to keep log data for:

Right-click the job, and select "Properties".
Click "Steps" in the left-hand pane.
Click on the step to edit in the right-hand pane.
Click "Edit" (or simply double-click the step to combine 2 and 3)
Click "Advanced" in the left-hand pane.
Click "Log to table".
Confirm the modifications to the step, and to the Job.

Note that if you select "Append Output to Existing Entry in Table" you will add log data from multiple job runs to the same record in sysjobstepslogs. Personally, I find this more confusing rather than useful.

CREATE PROCEDURE pr_GetStepFailureData
(
@JobName VARCHAR(250)
)
AS
/*
This procedure gets failure log data for the failed step of a SQL Server Agent job
*/DECLARE @job_id UNIQUEIDENTIFIER
SELECT @job_id = job_id FROM dbo.sysjobs WHERE [name] = @JobName
SELECT 'Step ' + CAST(JH.step_id AS VARCHAR(3)) + ' of ' + (SELECT CAST(COUNT(*) AS VARCHAR(5)) FROM dbo.sysjobsteps WHERE job_id = @job_id) AS StepFailed,
 CAST(RIGHT(JH.run_date,2) AS CHAR(2)) + '/' + CAST(SUBSTRING(CAST(JH.run_date AS CHAR(8)),5,2) AS CHAR(2)) + '/' + CAST(LEFT(JH.run_date,4) AS CHAR(4)) AS DateRun,
 LEFT(RIGHT('0' + CAST(JH.run_time AS VARCHAR(6)),6),2) + ':' + SUBSTRING(RIGHT('0' + CAST(JH.run_time AS VARCHAR(6)),6),3,2) + ':' + LEFT(RIGHT('0' + CAST(JH.run_time AS VARCHAR(6)),6),2) AS TimeRun,
 JS.step_name, 
 JH.run_duration, 
 CASE
 WHEN JSL.[log] IS NULL THEN JH.[Message]
 ELSE JSL.[log]
 END AS LogOutput
FROM dbo.sysjobsteps JS INNER JOIN dbo.sysjobhistory JH 
 ON JS.job_id = JH.job_id AND JS.step_id = JH.step_id 
 LEFT OUTER JOIN dbo.sysjobstepslogs JSL
 ON JS.step_uid = JSL.step_uid
WHERE INSTANCE_ID >
 (SELECT MIN(INSTANCE_ID)
 FROM (
 SELECT top (2) INSTANCE_ID, job_id
 FROM dbo.sysjobhistory
 WHERE job_id = @job_id
 AND STEP_ID = 0
 ORDER BY INSTANCE_ID desc
 ) A
 )
 AND JS.step_id <> 0 
 AND JH.job_id = @job_id
 AND JH.run_status = 0
ORDER BY JS.step_id ASC

Running it is simple:

EXEC pr_GetStepFailureData 'JobName'

This will return details of the step at which a job failed, with up to 2Bn characters of logging information (the underlying column, "Log" in sysjobstepslogs is a VARCHAR(Max) column). If the job did not fail, then it returns nothing.

Code Explanation

What matters here is the use of two columns:

Log in sysjobstepslogs
Message in sysjobhistory

The former is the dreaded 1024 character truncated data that you can get from the Log File viewer, the latter is the full logging information that will be kept if you have configured a step to "Log to table".

So all this stored procedure does is to join sysjobhistory and sysjobstepslogs, using a left outer join to ensure that it is not only the steps that have Log to Table enabled which are output (though you can be more restrictive if you wish). Then the Log data (or Message data from sysjobhistory if table logging is not enabled) is output.

The stored procedure only outputs data if a step has failed - this is detected by filtering on run_status = 0 in the sysjobhistory table. This too can be altered if you wish.

It is worth noting that the "job" tables use a GUID - job_id- to identify each job. So joining tables on the job_id is what you must do. However, as GUIDs are immensely unmemorable, I have chosen to pass in the job name and look up the GUID in the stored procedure. Steps also have a GUID, this is the step_uid that is in the Sysjobsteps and sysjobstepslogs tables.

The only part of this stored procedure that may seem complex is the subquery which finds the latest data for the Agent job. I am assuming here that the Sysjobhistory table may contain many records detailing all the steps for a job that has run many times. So the stored procedure simply finds the last but one instance_id (a sequential number generated in the table) corresponding to the end of a job (STEP_ID = 0). Then the stored procedure filters on all the instance_ids greater than this.

As a good European, I have formatted the date in a format that I am happier with. You can alter this as you wish.

Apart from the fields that I have mentioned above, some of the other ones that you may find useful are:

Sysjobstepslogs.log_size - This can tell you if there is an inordinate amount of logging data for a step.
Sysjobhistory.sql_message_id - useful for decrypting SQL error messages
sysjobactivity.run_requested_source - useful to see how the SQL Server Agent job was run. The codes are (courtesy of BOL):
1 = SOURCE_SCHEDULER
2 = SOURCE_ALERTER
3 = SOURCE_BOOT
4 = SOURCE_USER
6 = SOURCE_ON_IDLE_SCHEDULE
You may find outputting these codes - or testing on them useful in your application.

Conclusion

Of course, this is just one way to get at the valuable data contained in the MSDB tables which I described above. There are many other ways of returning the information that they hold, including:

Display full logging information for all successful steps once a SQL Server Agent Job has run.
Display full logging information for all steps (successful or unsuccessful) once a SQL Server Agent Job has run.

Obviously, this approach is only one take on the problem, and there are many other ways of achieving similar results. Consequently I hope that the code given in this article, as well as the explanation of the underlying data structures will fire your imagination!

So it can be as simple as this to display full logging of a SQL Server Agent job. In the next article I will show how to display in a web or winforms application the current status of a job that is running, as well as the job result.

Returning full error details from SQL Server Agent jobs

Firstly, let's consider the typical problem:

So where can a solution lie?

Code Explanation

Conclusion

Rate

Share

Categories

Share

Rate

Returning full error details from SQL Server Agent jobs

Firstly, let's consider the typical problem:

So where can a solution lie?

Code Explanation

Conclusion

Rate

Share

Categories

Share

Rate

Related content

Installing SQL Server Agent on SUSE Linux

How to Decipher sysschedules

Installing SQL Server Agent on Ubuntu Server

Automating SQL Server Agent Notification

Never update systems tables directly - a study in Agent job scheduling

Cookies on SQLServerCentral