Managing Jobs - Part 2

,

A couple weeks back I wrote Managing

Jobs - Part 1 as a both a brief introduction to some of the features of jobs

plus a reminder that as your business grows and changes, so must the way you

manage your jobs. This week I'd like to continue that by talking about some

other ideas that you might consider to help you manage them more effectively.

Let's start by revisiting a subject I covered in Worst

Practice - Not Having An Archive Plan For Each Table. I've found that unless

I design a plan to purge data at some point, it will grow unchecked forever,

which is usually about 3 years too long. Trying to solve that problem for

myself, I started by creating a category called "Cleanup", then

creating one job per database labeled 'DBName - Daily Cleanup'. Then I create

one or more steps that usually just have simple things like "delete from

orderdetails where orderdate < getdate() -365". If they are all plain

deletes, I just put them in one step. If I've got portions that have

dependencies (like orders vs orderdetails) I might put that in a separate step

to make it easier to see. I like to run these jobs daily so that the amount of

data being moved or removed is manageable - why delete 300,000 rows at the end

of the month when you can delete 10,000 per day? One job isn't always going to

be enough, you may well need a monthly or end of year type job. In any case,

using 'Cleanup' they're easy to spot when I need to see what is being deleted

when or need to add a new step to the job.

A big big item you need to think about is who gets notified when. Take a look

at your options:

I've gone back and forth on email notification, but the plan I'm using now is

to send the email to the person who is responsible for fixing the problem.

Sometimes it may be that they'll just ask me to run the job again, other times

the data is bad and must be fixed first. The alternative is you could have all

notifications come to you. I've tried this and it's not bad, but sometimes I'm

busy and no one can start working until I tell them it failed. Don't be afraid

to add operators and send them the notifications, it's easy enough to change if

they move on.

Paging can be useful, but should be used lightly. On my Nextel phone there

doesn't seem to be a way to say 'delete all queued messages', I have to read

every one! Reserve pages for really urgent problems. Same goes for net send - I

typically only enable this when I'm troubleshooting a problem and might not

realize I've gotten email if I'm working on something else. Net send is a lot

faster than a page usually.

Writing to the event log? I run 400 or more backups a day, when you look in

the log you scroll...and scroll. Leave the event log for real problems, rely on

the agent and the job history for managing your jobs. It's easier, you always

have access to it, your admin will thank you.

The automatic delete option is not used often, but it's a great tool when you

need it. I use maintenance plans for index rebuilds, sometimes I'll want to do a

quick run on a single db over lunch if it seems unusually slow. I set up a new

plan rather than altering my existing one, then modify the job to delete itself

when done - so I don't forget and let it run at lunch for a week or two.

Managing jobs is tough. If you get email every time a job runs, you'll soon

get used to them and stop paying attention - until they stop showing up...maybe.

Failure notification is better to me, but you are at the mercy of SQL Agent. Not

uncommon for a job to hang, so it hasn't run, but hasn't failed either! For time

sensitive jobs I typically run a second job that checks the history of the first

job looking to see if it completed in time. This works fine until the agent

itself fails. Then what?

Let me digress here a moment to say that not all jobs run in SQL. You may

have jobs that run the tape backup via the NT scheduler or some other scheduler.

Ideally you want to monitor ALL those jobs. Something to think about.

I like Greg Larsen's idea of a failed job report. That works well with the

idea of sending the failure notice to the person responsible for fixing it, yet

you can still monitor to make sure problems are resolved. Beyond that, I've been

thinking that what I really want (and have not yet built) is a process separate

from SQL that checks on all my jobs across servers. In particular I'd like to be

notified if a job varies in run time from the last x times it ran. If a job

usually takes 5 minutes and all of a sudden takes 20 minutes, I need to know and

figure out why. I think I could also use this to catch jobs that hang up and

never finish.

In a week or two I'll wrap this up with a discussion about how to code

complicated jobs (exe, dll, vs., or TSQL) and where to run them. Until then, I

hope you've enjoyed reading this and will take a minute to comment on the

article - positive or negative!

Rate

Share

Share

Rate