A Failed Jobs Monitoring System

Question

A Failed Jobs Monitoring System

Viewing 15 posts - 16 through 30 (of 38 total)

You must be logged in to reply to this topic. Login to reply

sqlagentman SSCarpal Tunnel Points: 4484 More actions · Answer 1

Scott, do you really want an email every time a job fails on every SQL instance? Must have a lot of free time on your hands and space in your in-box. Avg. 20 jobs per instance * 80 instances * running N times per day = WHOA!

- Tim Ford, SQL Server MVPhttp://www.sqlcruise.comhttp://www.thesqlagentman.com http://www.linkedin.com/in/timothyford

tjaybelt SSCrazy Points: 2604 More actions · Answer 2

I agree that writing it yourself vs buying software is a compelling argument, and people are on both sides of the camp. In my case, the buy it option was not an option. ever. Unfortunately.

So, instead of doing it in a lot more manual process, this is the solution that allows me to sit back and do other things, while i feel safe that the jobs are being monitored. Even if it cost a lot more for me to develop. And thats part of the reason i am sharing it. Why should you reinvent the wheel i just reinvented. Unless its for the self education on the process. Which was another selfish reason for writing it. Loads of baggage went into the system, and i just wanna share it.

I love that there have been so many different ideas shared here on this topic. its one close to my heart, and i love that so many options are out there. hopefully people will be albe to pick the best one for their shop.

thecosmictrickster@gmail.com SSChampion Points: 10386 More actions · Answer 3

Timothy Ford (2/6/2008)
Scott, do you really want an email every time a job fails on every SQL instance? Must have a lot of free time on your hands and space in your in-box. Avg. 20 jobs per instance * 80 instances * running N times per day = WHOA!

That's why I monitor by exception. 😀

I have very few job failures. What generally causes something to fail? Something changing. The systems are locked down and we have a rigorous (and improving) change process. For things like disk space, that's being monitored (DB growth rates as well) and flagged before backups & the like fail.

Granted, our systems are relatively straightforward as well, no complex replication scenarios (we do have replication), no flaky network links.

This may also change once we get a better centralised monitoring tool in place, where alerts can be sent to a console. Our current monitoring software doesn't handle that so well.

Scott Duncan

MARCUS. Why dost thou laugh? It fits not with this hour.
TITUS. Why, I have not another tear to shed;
--Titus Andronicus, William Shakespeare

Rudy Panigas SSChampion Points: 10702 More actions · Answer 4

I agree. I only monitor of the exceptions/failures. Don't need to know that a job ran successfully. But I have other reports that should all jobs successful or not. In the morning I just review the failed ones. Have an email alert don't really help me as we are not a 24x7 shop so the web page report (with SSRS) works great.

Rudy

Peter Waldweben SSC Enthusiast Points: 154 More actions · Answer 5

Nice solution! However I got a message that sp_SQLSMTPmail does not exist. Downloaded it somewhere. Next error about '@vc parameter that does not match with this procedure'. What am I doing wrong? Thanks in advance. Peter

stew_b SSC-Addicted Points: 490 More actions · Answer 6

This is a great article! How can I take it to the next level - I have a multistep job that is executing some but not all of the steps. The SQL logs/tables report job success, but I suspect now it is become at least SOME part of the job succeeded. Without parsing each step out into seperate jobs, is there way to get detailed information about the success or failure of each step within the job?

Strommy SSCrazy Points: 2386 More actions · Answer 7

Interesting solutions. For large environments, enterprise monitoring software is very useful. Our environment has 200+ SQL instances and we use NetIQ. We finished up a project that monitors all types of problems and will create a service ticket based on the monitoring events. The best part is the set-up for new servers. Install the client, add the server to the proper monitoring group in NetIQ, and we can start getting monitoring events from SQL (including failed and long-running jobs).

Thanks,

Eric

tjaybelt SSCrazy Points: 2604 More actions · Answer 8

that is the next step. its difficult to gather specific info fom the job steps, cause they dont key into the job, just seems they are ordered by time, and linked to a job... so the organization is a bit lacking. step 0 is the job report. other steps could fail. But finding the specific message that failed, and specific step, may be difficult, if the job failes a lot of times in a row...

Im struggling with piecing this together now.

thecosmictrickster@gmail.com SSChampion Points: 10386 More actions · Answer 9

@Strommy: In my last job we used NetIQ. A lot easier than Tivoli. You just have to watch out for adjusting child jobs - any changes to the parent job will overwrite it. A gotcha for people who don't watch what view they are in on the console.

@stew_b: A possible solution is to have a step between each to handle failures. Like so:

Step 1: Step1Tasks onsuccess goto step 3, onfail goto next

Step 2: Report failure of step 1 onsuccess goto next, onfail goto next

Step 3: Step3Tasks onsuccess quit with success, onfail goto next

Step 4: Report failure of step 3 onsuccess quit with success, onfail quit with fail

It depends what your steps are doing and whether you can incorporate error-reporting within them (e.g. RAISERROR). Failing that, use the above structure with e.g. RAISERROR('Step x failed', 16, 1) WITH LOG and check the ERRORLOG (manually or automate it) or get it to send an email (you may get spammed, watch out for that!).

Scott Duncan

MARCUS. Why dost thou laugh? It fits not with this hour.
TITUS. Why, I have not another tear to shed;
--Titus Andronicus, William Shakespeare

Strommy SSCrazy Points: 2386 More actions · Answer 10

Good points about the job failure logic. In general, if we ever have a job that fails and then sends a failure notification, we have that job step exit failed, regardless of success or failure of that step. As an aside, if we have a ticketing system monitoring job failures, we don't need the notification step. That was part of our goals for integrating with NetIQ.

Thanks,

Eric

stew_b SSC-Addicted Points: 490 More actions · Answer 11

stew_b

SSC-Addicted

Points: 490

February 12, 2008 at 8:48 am

#777381

Great idea Scott - Thank you!

sqldba-294117 Hall of Fame Points: 3486 More actions · Answer 12

I came across this site, so i thought let me share with forum.

SQL job manager

View and manage SQL Server Jobs

Free for a limited time

http://www.idera.com/Products/SQLjobmanager/

FreeHansje SSChampion Points: 11751 More actions · Answer 13

If you're already setting up a separate server, why not use MOM?

I've looked into that and cannot find a rule to monitor failed jobs. If I have to build this rule myself, then why not go for a tailormade solution anyway?

Personally, this seems like a lot of heavy lifting and manual labor, and the effort involved seems to outweigh the costs of commercial tools already available.

This is correct, however, have you even looked into the extra load some of these tools add to a server? Dunno about Sentry's Eventmanager, but others where a firm no-no after a day or so of testing. Idera JM was 1 of those.

For job monitoring, I just get each server to send an email (per job) when a job fails.

Yes, nice 1, we have more then a 1000 jobs running all over the place, and alas, not only the DBA can go there and add/delete jobs. I prefer a central monitoring system.

However, I miss the job to run this setup. I can figure it out, but do I miss something here? It's not part of the download?

Greetz,
Hans Brouwer

Rudy Panigas SSChampion Points: 10702 More actions · Answer 14

Using MOM. Yes this is a good tool but not everyone has the budgets to get MOM. This is the situation at my location, although there would be money available now, but I think this is working much better.

We use this tool/setup to also collect additional information from not just SQL servers but also from Oracle servers and MySQL servers as well. Not certain if MOM does Oracle/MySQL but so far this setup is working well.

Just go to show that there are many ways to get things done and you can bet a DBA/Develop will figure something out.

Thanks for the comments,

Rudy

darryl_marshall SSC Eights! Points: 805 More actions · Answer 15

We use a similar system to this to monitor all our servers. Had a look at Idera SQL Job Manager but it ran very slowly on my machine and we needed more information about other aspects of the servers than just the jobs. At the moment lots of details on disk space, db size, job outcomes and a number of logging tables are pulled back to a central server for reporting against. SQL Reporting service is then used to report against the data and send out email reports as needed.