Blog Post

Failing SQL Agent Jobs – Part 1

,

SQL Server Agent Jobs are one of the core features of SQL Server. Agent Jobs perform many actions, from maintenance tasks (server and database), data movement flows, replication, log shipping, data cleanup, health checks and many more. Since Agent Jobs are a critical component in a data organization, it is important to know when Agent Jobs do not succeed. There are several ways to accomplish the monitoring of failed Agent Jobs, from easy to more complex. This four part series is going to cover how you, the DBA, can be notified of failing Agent Jobs in your SQL Server environment.

  • Part 1: Checking remote servers for failed jobs from a Central Console
  • Part 2: Building an e-mail report that notifies you of failed jobs
  • Part 3: Setting up SQL Server Agent Alerts/Operators to notify you when a job fails
  • Part 4: Collecting Agent Job information centrally and reporting off that data

Let’s get started with a basic example of checking our Agent Jobs for failure using our trusty friend dbatools.

First step:

Load the

$servers variable with all the SQL Servers we want to monitor for failing SQL Server Agent Jobs.

Second step: 
# Get Agent Job information for all jobs on all servers in the $servers variable
$failedJobs = Get-DbaAgentJob -SqlInstance $servers

Now run the

Get-DbaAgentJob command for all the servers and store the results in

$failedJobs .

Third Step: 
$failedJobs | Where-Object LastRunOutcome -ne "Succeeded" | Format-Table SqlInstance, Name, LastRunDate, LastRunOutcome, Category, IsEnabled, HasSchedule -AutoSize

As you can see from the results we get a lot of extra output that does not relate to what we are trying to accomplish.  We get Agent Jobs that have never been executed (Unknown) and Replication jobs that have a Cancelled status.

Next, let’s adjust our filters a bit to remove the Replication jobs from the mix.

$failedJobs | Where-Object { ( $_.LastRunOutcome -ne "Succeeded") -AND ($_.Category -notlike "REPL*") } | Format-Table SqlInstance, Name, LastRunDate, LastRunOutcome, Category, IsEnabled, HasSchedule -AutoSize

This makes the list more manageable.  We now have Agent Job failures and Agent Jobs never executed.  Now we have the ability to track down the Job Owner to see if jobs that have been deployed and never executed are still needed.  You should always keep an eye on things that can be cleaned up.

You can also determine from the list if a Job has a schedule or is enabled.  This will help you determine technical debt as well.  A Job with a last run outcome in 2018 that is disabled is probably not something that is expected to run and might be a candidate for archive and deletion.

$failedJobs | Where-Object { ( $_.LastRunOutcome -eq "Failed") -AND ($_.Category -notlike "REPL*") } | Format-Table SqlInstance, Name, LastRunDate, LastRunOutcome, Category, IsEnabled, HasSchedule -AutoSize

We are now to the meat of why we are here.  A list of SQL Agent Jobs that have Failed on their last execution. We can work to identify the failures and fix them or notify the appropriate teams. Again keeping in our mind possible cleanup candidates.  A couple of these failures have no schedule or are disabled.

Fourth Step:

Here are a couple extra examples of filters that might help identify jobs in your echo system that might need attention.

# How about any outcome that is not Failed or Succeeded
$failedJobs | Where-Object { ( $_.LastRunOutcome -ne "Succeeded" -AND $_.LastRunOutcome -ne "Failed") -AND ($_.Category -notlike "REPL*") } | Format-Table SqlInstance, Name, LastRunDate, LastRunOutcome, Category, IsEnabled, HasSchedule -AutoSize
# What if we want to see jobs that have never been executed
$failedJobs | Where-Object { ( $_.LastRunDate -eq "01/01/0001") -AND ($_.Category -notlike "REPL*") } | Format-Table SqlInstance, Name, LastRunDate, LastRunOutcome, Category, IsEnabled, HasSchedule -AutoSize
# Know we can look for never executed and not enabled
$failedJobs | Where-Object { ( $_.LastRunDate -eq "01/01/0001" -AND $_.IsEnabled -eq $false) -AND ($_.Category -notlike "REPL*") } | Format-Table SqlInstance, Name, LastRunDate, LastRunOutcome, Category, IsEnabled, HasSchedule -AutoSize
# Do failed jobs have a schedule
$failedJobs | Where-Object { ( $_.LastRunOutcome -eq "Failed" -AND $_.HasSchedule -eq $false) -AND ($_.Category -notlike "REPL*") } | Format-Table SqlInstance, Name, LastRunDate, LastRunOutcome, Category, IsEnabled, HasSchedule -AutoSize
Conclusion:

You learned how to get a list of all SQL Agent Jobs and populate a variable.  Then you learned how to filter that output to help you identify Failed Agent Jobs.  Also, you have the tools to filter the information further to provide Agent Jobs that are technical debt and no longer needed and can be candidates for removal.

Stay tuned for Part 2 of our series where you will learn how to take the output from the command and send an HTML e-mail for review. The easiest way to do that is by subscribing to this blog!

If you found this blog handy, please use the Subscribe section at the top of the screen to join my newly created mailing list.  I hope to start delivering content via the mailing list soon. ?? 

The post Failing SQL Agent Jobs – Part 1 appeared first on .

Original post (opens in new tab)
View comments in original post (opens in new tab)

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating