A Failed Jobs Monitoring System


Rudy Panigas
I agree that there are tools out there that can do most of the work. I say most because there is always something that the tool does not cover. For example, my reporting system also analyzes the SQL error logs (my report shows the errors that I need to look into or can use to troubleshoot), but the apps I've seen don't. If all the apps did everything we need, then most of this site would not be needed. Just look at all the great scripts, process improvements and articles!

The other aspect is dollars. Not every DBA can spend $2,000 per instance or per server to buy ABC program to help out. Generally they have a very limited budget and have to spend the money wisely.

I use several purchased tools to help my work (not going to mention them as I'm not in sales), but there are always things that are custom to your company, and you have to figure out a way to automate them or at least change the process to make it easier on the DBA.

Rudy



aaron.bertrand
Yes, it's true, not every tool will do everything you want. That is true of pretty much all software out there, whether you bought it, wrote it, bought the source code and adapted it, etc.

However I use Event Manager on all of my instances and there is nothing that I have needed to date that it doesn't do.

Also, analyzing SQL Server Error Logs seems to be a bit of a disparate process from being told that a job failed, and I don't know that it is something you should expect from a tool dedicated to monitoring SQL Server Agent. It sounds like something more along the lines of a SQL Server error monitor tool.

As for the $2,000 figure, well, you can certainly get tools like Event Manager for less than $2,000 per instance. And what I hate about these debates is that managers and even some IT folks and DBAs think that their time spent developing software is free. Surely the amount of time you spent writing your code was worth well more than $2,000 in opportunity cost that you could have devoted to other tasks, and so you have not saved any money really, just shifted the line item on the income statement from "3rd party software" to "employee salary." Especially if you end up charging overtime or off-hours compensation because you couldn't get your normal work done while you were re-inventing the wheel. :-)
DrewTheEngineer
I feel your pain ... we have over 200 SQL Server 2000 instances that need to be monitored, and not just for SQL job failures.

The solution I went with works somewhat in the opposite direction from yours. Instead of setting up a repository server with linked servers to all the instances, I set up all the instances with a linked server to the repository server. I then configured all the instances as target servers to a master MSX server. The MSX server pushes out a SQL job to all the target servers, and this SQL job collects data (including job history) and feeds it into the repository server. The repository server then processes the collected data and sends notifications as necessary. To make sure that all the target servers are actually sending their data to the repository server, they all update a time stamp as part of their collected data. I can then query the time stamps to make sure all the target servers are sending over their data.


Kindest Regards,

DrewTheEngineer
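
The time-stamp check described in this post can be sketched roughly as follows. This is only an illustration in Python, not the poster's actual implementation (which presumably lives in T-SQL on the repository server). The server names, the `find_stale_targets` helper, and the 30-minute threshold are all assumptions made up for the example.

```python
from datetime import datetime, timedelta

def find_stale_targets(last_seen, now, max_age):
    """Return the names of target servers whose latest time stamp is older than max_age."""
    return sorted(name for name, seen in last_seen.items() if now - seen > max_age)

# Hypothetical time stamps, as the repository server might hold them.
now = datetime(2008, 2, 6, 9, 0)
last_seen = {
    "SQL001": datetime(2008, 2, 6, 8, 55),
    "SQL002": datetime(2008, 2, 6, 6, 10),  # has not reported for hours
    "SQL003": datetime(2008, 2, 6, 8, 40),
}

# Any server listed here has stopped sending data and needs a look.
stale = find_stale_targets(last_seen, now, max_age=timedelta(minutes=30))
print(stale)
```

The point of the design is that a silent target is treated as a failure in its own right, so a broken collection job cannot hide behind an empty failure list.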

Rudy Panigas
The reason I pull the information instead of pushing it is basically that new servers can be added to the network without my having to install or set up anything on them. Once I add the new server to the table on my repository server, the repository connects to it automatically.

Nice to see all the different ideas. Different is good as long as it makes your job easier and better.

Rudy
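
A rough sketch of the pull approach described in this post, assuming the repository keeps a simple list of server names. It is written in Python for illustration only; `poll_servers`, `fake_fetch`, and the server names are hypothetical, and the real remote query (for example over a linked server) is replaced by a stand-in function so the sketch runs on its own.

```python
def poll_servers(registry, fetch_failed_jobs):
    """Pull failed-job names from every registered server.

    registry          -- iterable of server names (stands in for the repository table)
    fetch_failed_jobs -- callable taking a server name and returning a list of
                         failed job names; a real setup would query the instance
    """
    results = {}
    unreachable = []
    for server in registry:
        try:
            results[server] = fetch_failed_jobs(server)
        except ConnectionError:
            # A server that cannot be reached is itself worth reporting.
            unreachable.append(server)
    return results, unreachable

# Hypothetical stand-in for the remote query.
def fake_fetch(server):
    if server == "SQL003":
        raise ConnectionError(server)
    return ["Nightly backup"] if server == "SQL002" else []

registry = ["SQL001", "SQL002"]
registry.append("SQL003")  # onboarding a new server is just one more row
results, unreachable = poll_servers(registry, fake_fetch)
print(results, unreachable)
```

Nothing is installed on the monitored servers here; the only change needed to cover a new server is the extra registry entry.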



thecosmictrickster@gmail.com
For job monitoring, I just get each server to send an email (per job) when a job fails. DBMail on SQL 2005, SMTP stored proc for 2000.

As far as auditing goes, if you have a Service Desk, log an Incident for each job failure (or add an entry to an existing incident if it is a work in progress sort of thing) and put the resolution in there. You should be able to search for any historical incidents relating to a particular server easily enough. Having some sort of known problem repository helps as well (especially for those failures that only happen every 6 months or so).



Scott Duncan

MARCUS. Why dost thou laugh? It fits not with this hour.
TITUS. Why, I have not another tear to shed;
--Titus Andronicus, William Shakespeare

Timothy Ford-473880
Scott, do you really want an email every time a job fails on every SQL instance? Must have a lot of free time on your hands and space in your in-box. Avg. 20 jobs per instance * 80 instances * running N times per day = WHOA!

- Tim Ford, SQL Server MVP
http://www.sqlcruise.com
http://www.thesqlagentman.com
http://www.linkedin.com/in/timothyford
tjaybelt
I agree that writing it yourself vs. buying software is a compelling argument, and there are people in both camps. In my case, buying was not an option. Ever. Unfortunately.
So, instead of a far more manual process, this is the solution that allows me to sit back and do other things while I feel safe that the jobs are being monitored, even if it cost a lot more for me to develop. And that's part of the reason I am sharing it. Why should you reinvent the wheel I just reinvented? Unless it's for the self-education on the process, which was another selfish reason for writing it. Loads of baggage went into the system, and I just wanna share it.

I love that there have been so many different ideas shared here on this topic. It's one close to my heart, and I love that so many options are out there. Hopefully people will be able to pick the best one for their shop.



thecosmictrickster@gmail.com
Timothy Ford (2/6/2008)
Scott, do you really want an email every time a job fails on every SQL instance? Must have a lot of free time on your hands and space in your in-box. Avg. 20 jobs per instance * 80 instances * running N times per day = WHOA!



That's why I monitor by exception. :-D

I have very few job failures. What generally causes something to fail? Something changing. The systems are locked down and we have a rigorous (and improving) change process. For things like disk space, that's being monitored (DB growth rates as well) and flagged before backups & the like fail.

Granted, our systems are relatively straightforward as well, no complex replication scenarios (we do have replication), no flaky network links.

This may also change once we get a better centralised monitoring tool in place, where alerts can be sent to a console. Our current monitoring software doesn't handle that so well.



Scott Duncan


Rudy Panigas
I agree. I only monitor the exceptions/failures. I don't need to know that a job ran successfully, but I have other reports that show all jobs, successful or not. In the morning I just review the failed ones. An email alert doesn't really help me, as we are not a 24x7 shop, so the web page report (with SSRS) works great.

Rudy
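
As a small illustration of reporting only the exceptions, the sketch below filters collected job history down to the failed runs. It is written in Python and the row layout (`server`, `job`, `run_status` keys) and sample data are assumptions for the example; the one detail taken from SQL Server itself is that `run_status = 0` means "Failed" in `msdb.dbo.sysjobhistory`.

```python
FAILED = 0  # run_status value for "Failed" in msdb.dbo.sysjobhistory

def failed_runs(history):
    """Keep only the rows whose run_status marks a failure."""
    return [row for row in history if row["run_status"] == FAILED]

# Hypothetical rows, as they might look after collection into a repository.
history = [
    {"server": "SQL001", "job": "Nightly backup", "run_status": 1},
    {"server": "SQL002", "job": "Index rebuild", "run_status": 0},
    {"server": "SQL002", "job": "Nightly backup", "run_status": 1},
    {"server": "SQL003", "job": "Log shipping copy", "run_status": 0},
]

# The morning report lists failures only; successes stay in the full history.
report = failed_runs(history)
for row in report:
    print(f'{row["server"]}: {row["job"]} failed')
```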



Peter Waldweben
Nice solution! However, I got a message that sp_SQLSMTPmail does not exist, so I downloaded it from somewhere. The next error was about '@vc parameter that does not match with this procedure'. What am I doing wrong? Thanks in advance. Peter