SQL Clone
SQLServerCentral is supported by Redgate
Log in  ::  Register  ::  Not logged in

Managing Jobs - Part 2

By Andy Warren,

A couple weeks back I wrote Managing Jobs - Part 1 as a both a brief introduction to some of the features of jobs plus a reminder that as your business grows and changes, so must the way you manage your jobs. This week I'd like to continue that by talking about some other ideas that you might consider to help you manage them more effectively.

Let's start by revisiting a subject I covered in Worst Practice - Not Having An Archive Plan For Each Table. I've found that unless I design a plan to purge data at some point, it will grow unchecked forever, which is usually about 3 years too long. Trying to solve that problem for myself, I started by creating a category called "Cleanup", then creating one job per database labeled 'DBName - Daily Cleanup'. Then I create one or more steps that usually just have simple things like "delete from orderdetails where orderdate < getdate() -365". If they are all plain deletes, I just put them in one step. If I've got portions that have dependencies (like orders vs orderdetails) I might put that in a separate step to make it easier to see. I like to run these jobs daily so that the amount of data being moved or removed is manageable - why delete 300,000 rows at the end of the month when you can delete 10,000 per day? One job isn't always going to be enough, you may well need a monthly or end of year type job. In any case, using 'Cleanup' they're easy to spot when I need to see what is being deleted when or need to add a new step to the job.

A big big item you need to think about is who gets notified when. Take a look at your options:

I've gone back and forth on email notification, but the plan I'm using now is to send the email to the person who is responsible for fixing the problem. Sometimes it may be that they'll just ask me to run the job again, other times the data is bad and must be fixed first. The alternative is you could have all notifications come to you. I've tried this and it's not bad, but sometimes I'm busy and no one can start working until I tell them it failed. Don't be afraid to add operators and send them the notifications, it's easy enough to change if they move on.

Paging can be useful, but should be used lightly. On my Nextel phone there doesn't seem to be a way to say 'delete all queued messages', I have to read every one! Reserve pages for really urgent problems. Same goes for net send - I typically only enable this when I'm troubleshooting a problem and might not realize I've gotten email if I'm working on something else. Net send is a lot faster than a page usually.

Writing to the event log? I run 400 or more backups a day, when you look in the log you scroll...and scroll. Leave the event log for real problems, rely on the agent and the job history for managing your jobs. It's easier, you always have access to it, your admin will thank you.

The automatic delete option is not used often, but it's a great tool when you need it. I use maintenance plans for index rebuilds, sometimes I'll want to do a quick run on a single db over lunch if it seems unusually slow. I set up a new plan rather than altering my existing one, then modify the job to delete itself when done - so I don't forget and let it run at lunch for a week or two.

Managing jobs is tough. If you get email every time a job runs, you'll soon get used to them and stop paying attention - until they stop showing up...maybe. Failure notification is better to me, but you are at the mercy of SQL Agent. Not uncommon for a job to hang, so it hasn't run, but hasn't failed either! For time sensitive jobs I typically run a second job that checks the history of the first job looking to see if it completed in time. This works fine until the agent itself fails. Then what?

Let me digress here a moment to say that not all jobs run in SQL. You may have jobs that run the tape backup via the NT scheduler or some other scheduler. Ideally you want to monitor ALL those jobs. Something to think about.

I like Greg Larsen's idea of a failed job report. That works well with the idea of sending the failure notice to the person responsible for fixing it, yet you can still monitor to make sure problems are resolved. Beyond that, I've been thinking that what I really want (and have not yet built) is a process separate from SQL that checks on all my jobs across servers. In particular I'd like to be notified if a job varies in run time from the last x times it ran. If a job usually takes 5 minutes and all of a sudden takes 20 minutes, I need to know and figure out why. I think I could also use this to catch jobs that hang up and never finish.

In a week or two I'll wrap this up with a discussion about how to code complicated jobs (exe, dll, vs., or TSQL) and where to run them. Until then, I hope you've enjoyed reading this and will take a minute to comment on the article - positive or negative!

Total article views: 8349 | Views in the last 30 days: 5
Related Articles

Deleting conenction manager problem

Deleted connection manager ID stays in the .dtsx file ... !



Delete Query


What's a Good Manager

What makes a good manager? In response to a recent thread, Janet Wong takes us through some of the g...


Managing Jobs Using TSQL

We've had a good run of jobs about job management lately, Randy sent us another one that looks at th...



How to store or email the deleted data for monitoring