I was catching up on work recently, reading the third installment of The 5 Worst Days in a DBA's life, starring The DBA Team. Someone had asked me if I enjoyed having Paul Randal (b | t) of SQLskills join the team. The piece had been edited and published while I was gone, and I hadn't had a chance to immerse myself in the adventure. I was anxious to read how Paul helped save the day.
It was a fun read, but one quote in the piece struck me. "A job that runs long or doesn't run at all can sting just as bad as one that fails." That's a quote from my character, showcasing a situation that few people actually think about. However, jobs that don't run or don't finish are exactly the situations that DBAs should be monitoring for.
So many of us adopt a set-it-and-forget-it mentality with our jobs. We assume that things will work, or fail, as we set them up. However, it's easy to forget that there are other states we, or our systems, might find ourselves in that can cause issues.
Monitoring is critical to any well-run system, but monitoring needs to be set up well. If we require that certain jobs run, we need to check not only for success or failure, but also whether the job has actually run and completed. It's easy to accidentally disable the wrong job and not notice. It's also entirely possible that a job gets stuck and doesn't complete.
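To make those extra states concrete, here is a minimal sketch of what such a check might look like. Everything in it is hypothetical: the `JobStatus` record, its field names, and the thresholds are assumptions made for illustration, and how you populate them is up to you (in SQL Server, the raw data would presumably come from the Agent job metadata in msdb).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class JobStatus:
    """Hypothetical snapshot of one scheduled job (all names are illustrative)."""
    name: str
    enabled: bool
    last_success: Optional[datetime]   # end time of the last successful run, if any
    running_since: Optional[datetime]  # start time if the job is executing right now
    max_age: timedelta                 # how old the last success is allowed to be
    max_runtime: timedelta             # how long a single run is allowed to take


def check_job(job: JobStatus, now: datetime) -> List[str]:
    """Return the problems a plain success/failure alert would miss."""
    problems = []
    # Someone disabled the job, perhaps by accident.
    if not job.enabled:
        problems.append("disabled")
    # The job started but has been running longer than we allow.
    if job.running_since is not None and now - job.running_since > job.max_runtime:
        problems.append("running too long")
    # The job has never succeeded, or not recently enough.
    if job.last_success is None or now - job.last_success > job.max_age:
        problems.append("no recent successful run")
    return problems
```

An empty list means the job looks healthy. Note that a disabled job will usually also show up as having no recent successful run once enough time passes, so the last check acts as a safety net for anything that silently stopped running.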
If you're not watching for those other states, you might find yourself in a situation where you don't have backups and your job is on the line. And you probably won't have The DBA Team to call on.