• We had a job that failed for a long time without anyone noticing, because it had been set up to report success even when individual steps failed. Frustrated by the limitations of the built-in failure notification mechanism in this regard, I put an update trigger on sysjobsteps to catch every job step that completes abnormally. The trigger executes xp_smtp_sendmail, a well-known third-party SMTP extended stored procedure. Because xp_smtp_sendmail has good support for attachments, the trigger also attaches the job log file to the notification message it sends.

    Putting triggers on system tables is almost always a VERY BAD IDEA, but it seems to work OK in this particular case. A single column records the outcome of the step's last run (failed, succeeded, cancelled, and so on), so the trigger only needs to check what is being written to that column.
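
    A minimal sketch of what such a trigger might look like, not the exact one I run. It assumes the status column is last_run_outcome (0 = failed, 3 = cancelled), that the log file path comes from the step's output_file_name, and that xp_smtp_sendmail is installed in master and accepts the parameters shown. The trigger name, mail addresses, and server name are placeholders.

```sql
-- Sketch only: trigger name, addresses, and mail server are placeholders.
USE msdb;
GO

CREATE TRIGGER dbo.trg_sysjobsteps_step_failed   -- hypothetical name
ON dbo.sysjobsteps
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Skip updates that do not write the outcome column (e.g. editing a step).
    IF NOT UPDATE(last_run_outcome)
        RETURN;

    DECLARE @subject nvarchar(400),
            @message nvarchar(400),
            @logfile nvarchar(200);

    -- One row per step whose new outcome is assumed abnormal:
    -- 0 = failed, 3 = cancelled.
    DECLARE failed_steps CURSOR LOCAL FAST_FORWARD FOR
        SELECT N'Job step failed: ' + j.name + N', step '
                   + CAST(i.step_id AS nvarchar(10)) + N' (' + i.step_name + N')',
               N'last_run_outcome = ' + CAST(i.last_run_outcome AS nvarchar(10)),
               i.output_file_name
        FROM inserted AS i
        JOIN dbo.sysjobs AS j ON j.job_id = i.job_id
        WHERE i.last_run_outcome IN (0, 3);

    OPEN failed_steps;
    FETCH NEXT FROM failed_steps INTO @subject, @message, @logfile;

    WHILE @@FETCH_STATUS = 0
    BEGIN
        IF @logfile IS NULL
            -- No output file configured for this step: send without attachment.
            EXEC master.dbo.xp_smtp_sendmail
                 @FROM    = N'sqlagent@example.com',
                 @TO      = N'dba@example.com',
                 @server  = N'smtp.example.com',
                 @subject = @subject,
                 @message = @message;
        ELSE
            -- Attach the step's output file as the job log.
            EXEC master.dbo.xp_smtp_sendmail
                 @FROM        = N'sqlagent@example.com',
                 @TO          = N'dba@example.com',
                 @server      = N'smtp.example.com',
                 @subject     = @subject,
                 @message     = @message,
                 @attachments = @logfile;

        FETCH NEXT FROM failed_steps INTO @subject, @message, @logfile;
    END

    CLOSE failed_steps;
    DEALLOCATE failed_steps;
END
GO
```

    Bear in mind that the mail call runs inside the same transaction as the Agent's own update, so a slow or unreachable mail server can hold up that update. That is one more reason to treat this as a workaround rather than a recommended practice.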

    Making sure the SQL Agent is always running is a big concern, as is detecting when a job hangs or takes much longer than usual. I'd be interested to read your thoughts on these issues.

    E. Titus