One of the most useful parts of SQL Server is the SQL Agent scheduler. Over the years I've used this subsystem to automate work and ease the administrative burden of running a SQL Server instance. At times I've even used my SQL Agent to fire off business reports and alerts to nontechnical people that might need to take some action based on the data in our system.
One common task that I see SQL Agent used for is to backup a database using some method (Ola's scripts, SQL Backup, etc.) and then delete the oldest backup. This is a common way of ensuring that you keep xx amount of backups around for your business. Often if the first step (the backup) fails, then the second step (delete backup files) doesn't run. There are times where the execution choices between the steps aren't configured or get changed, and you run into the situation that Paul Randall describes in issue #170 of the SQLskills newsletter. The backup step starts to fail, but the deletes keep running until you don't have any more backups stored.
How does this happen? It's surprisingly easy because most people don't really treat their SQL Agent scripts and code like other code. This often isn't tested well, and rarely tested across time and with unusual conditions. After all, who wants to muck with the time on a production server to be sure that your Agent job works across days? Who wants to force a backup to fail to see how the job handles an issue? Who wants to double check their code when BACKUP is fairly simple syntax and a few quick tests of the delete code works with text files renamed with .bak extensions?
Do you treat your SQL Agent jobs like the production code that they contain? You should. In fact, moving to a more reliable, repeatable, DevOps style environment means that any code in an Agent job needs to be version controlled, it needs to be tested, and it should be a part of some (hopefully, automated) deployment process that ensures that changes to the code are recorded and you are confident of which version of code is on your system.
SQL Agent is a powerful tool, but it's also one that should be treated like a production system. Downtime and simple errors from careless scripting shouldn't be tolerated. We should, and can, do better.