Blog Post

Job History – Back to Basics

,

How necessary is that SQL Server Agent job that you have running on the server? I ask that question of my clients on a routine basis.

Sometimes, I will ask that question as a routine part of a health check for the server. Others, it might be due to a migration or upgrade that is required for the server. Generally, the answer to the question will be one of two things: a) “Yes it is necessary.” or b) “What is that job doing?”.

Believe it or not, both answers will typically spawn more questions. You see, I don’t usually ask that question unless the job is failing on a regular basis. You wouldn’t believe how many jobs exist out there that are scheduled and just fail every time as well.

When I encounter a situation like this, it means it is time to have a discussion. In order to have that discussion, one needs to have done a bit of homework in order to better understand the situation. For me, part of this homework involves running the numbers and figuring out the frequency of the job’s failure or success.

Data Gathering

For me, I like to understand how often a job has executed and what is the frequency of failure for that quantity of executions. If I see a job that has not succeeded successfully in 60 consecutive executions, it is probably a safe bet that the job is not needed. Why? Well, if nobody has noticed the job hasn’t been working for that long, the likelihood of the job providing any use to the business is slim to none. In this case, I would present a case to the business as to why it should be removed.

But, how do I get to that point? Well, you could go through the job history for each job one by one and run some manual analytics. Or, you could take advantage of a script. I prefer the script route because it is faster, more reliable and a lot less mundane.

WITH jobhist AS
(SELECT
   jh.job_id
 , jh.run_date
 , jh.run_status
 , jh.step_id
 , ROW_NUMBER() OVER (PARTITION BY jh.job_id, jh.run_status ORDER BY jh.run_date DESC) AS rownum
 FROM  dbo.sysjobhistory jh
 WHERE jh.step_id = 0)
   , agglastsixty AS
(SELECT
sjh.job_id
  , sjh.run_status
  , COUNT(*) AS RunCount
 FROMdbo.sysjobhistory sjh
 INNER JOIN jobhist  jh
 ON jh.job_id = sjh.job_id
AND jh.run_status = sjh.run_status
AND jh.step_id = sjh.step_id
 WHERECONVERT(DATE, CONVERT(VARCHAR, sjh.run_date)) > DATEADD(
   d
 , -60
 , CONVERT(DATE, CONVERT(VARCHAR, jh.run_date))
   )
AND jh.rownum = 1
 GROUP BYsjh.job_id
  , sjh.run_status)
   , aggtotal AS
(SELECT
sjh.job_id
  , sjh.run_status
  , COUNT(*) AS RunCount
 FROMdbo.sysjobhistory sjh
 INNER JOIN jobhist  jh
 ON jh.job_id = sjh.job_id
AND jh.run_status = sjh.run_status
AND jh.step_id = sjh.step_id
 WHEREjh.rownum = 1
 GROUP BYsjh.job_id
  , sjh.run_status)
SELECT
j.name AS JobName
  , sc.name AS CategoryName
  , sp.name AS OwnerName
  , j.owner_sid
  , j.date_created
  , j.enabled
  , CONVERT(DATE, CONVERT(VARCHAR, oa.run_date)) AS RunDate
  , CASE oa.run_status
WHEN 0
 THEN 'Failed'
WHEN 1
 THEN 'Succeeded'
WHEN 2
 THEN 'Retry'
WHEN 3
 THEN 'Canceled'
WHEN 4
 THEN 'In Progress'
END AS run_status
  , als.RunCount AS Last60StatusCount
  , agt.RunCount AS TotalStatusCount
FROMdbo.sysjobs  j
LEFT OUTER JOINjobhist  oa
ON oa.job_id = j.job_id
LEFT OUTER JOIN agglastsixty  als
ON als.job_id = oa.job_id
   AND als.run_status = oa.run_status
LEFT OUTER JOIN aggtotal  agt
ON agt.job_id = oa.job_id
   AND agt.run_status = oa.run_status
INNER JOINsys.server_principals sp
ON j.owner_sid = sp.sid
INNER JOINdbo.syscategories  sc
ON j.category_id = sc.category_id
WHEREoa.rownum = 1
ORDER BYRunDate DESC;

Running that script against my sandbox, I may see something such as the following.

Here you will note that the “wtf” job has two entries. One entry for “Succeeded” (in green) and one entry for “Failed” (in red). Each row receiving counts for number of executions.

This is the type of information I can use in a meeting to discuss with the business. This is no longer a discussion of opinion, but rather one that is now based on facts and data. It now becomes very easy to demonstrate to the business that a job has failed 60/60 times and nobody noticed it or cared enough about the consistent failures to do anything about it. Imagine if the failing job happens to be the database backups. I wonder what the action items for that job failure might include.

The Wrap

An important part of any DBAs job is to ensure database related jobs are running prim and proper. Sometimes that just doesn’t happen. When jobs are being overlooked, it is useful to be able to gather data related to consistency of job success or failure. This script will help you in your investigation efforts. In addition, I also recommend this article in your agent job audit efforts.

This has been another post in the back to basics series. Other topics in the series include (but are not limited to): Backups, backup history and user logins.

Original post (opens in new tab)
View comments in original post (opens in new tab)

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating