SQLServerCentral is supported by Red Gate Software Ltd.
 
A Failed Jobs Monitoring System
Posted Tuesday, February 05, 2008 2:14 PM
I agree that there are tools out there that can do most of the work. I say most because there is always something that the tool does not cover. For example, my reporting system also analyzes the SQL error logs (my report shows the errors that I need to look into or can use to troubleshoot), but the apps I've seen don't. If all the apps did everything we need, then most of this site would not be needed. Just look at all the great scripts, process improvements and articles!

The other aspect is dollars. Not every DBA can spend $2,000.00 per instance or per server to buy some ABC program to help out. Generally they have a very limited budget and have to spend the money wisely.

I use several purchased tools to help my work (not going to mention them as I'm not in sales), but there are always things that are custom to your company, and you have to figure out a way to automate them or at least change the process to make it easier on the DBA.

Rudy



Post #451892
Posted Tuesday, February 05, 2008 4:07 PM
Yes, it's true, not every tool will do everything you want. That is true of pretty much all software out there, whether you bought it, wrote it, bought the source code and adapted it, etc.

However, I use Event Manager on all of my instances, and to date there is nothing I have needed that it doesn't do.

Also, analyzing SQL Server Error Logs seems to be a bit of a disparate process from being told that a job failed, and I don't know that it is something you should expect from a tool dedicated to monitoring SQL Server Agent. It sounds like something more along the lines of a SQL Server error monitor tool.

As for the $2,000 figure, well, you can certainly get tools like Event Manager for less than $2,000 per instance. And what I hate about these debates is that managers and even some IT folks and DBAs think that their time spent developing software is free. Surely the amount of time you spent writing your code was worth well more than $2,000 in opportunity cost that you could have devoted to other tasks, and so you have not saved any money really, just shifted the line item on the income statement from "3rd party software" to "employee salary." Especially if you end up charging overtime or off-hours compensation because you couldn't get your normal work done while you were re-inventing the wheel.
Post #451939
Posted Tuesday, February 05, 2008 4:30 PM
I feel your pain ... we have over 200 SQL Server 2000 instances that need to be monitored, and not just for SQL job failures.

The solution I went with works in somewhat the opposite direction from yours. Instead of setting up a repository server with linked servers to all the instances, I set up all the instances with a linked server to the repository server. I then configured all the instances as target servers to a master MSX server. The MSX server pushes out a SQL job to all the target servers, and this SQL job collects data (including job history) and feeds it into the repository server. The repository server then processes the collected data and sends notifications as necessary. To make sure that all the target servers are actually sending their data to the repository server, they all update a time stamp as part of their collected data. I can then query the time stamps to make sure all the target servers are sending over their data.
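
For anyone who wants to see the time stamp check in concrete terms, here is a minimal Python sketch. It is not the actual implementation: the server names, the 30-minute threshold and the `find_stale_targets` helper are all made up for illustration, and in the setup described the time stamps would sit in a table on the repository server rather than in a dictionary.

```python
from datetime import datetime, timedelta


def find_stale_targets(last_seen, now, max_age):
    """Return the names of targets whose last time stamp is older than max_age.

    last_seen maps a target server name to the datetime of its last check-in.
    """
    cutoff = now - max_age
    return sorted(name for name, stamp in last_seen.items() if stamp < cutoff)


# Hypothetical check-in data standing in for the repository table.
now = datetime(2008, 2, 5, 16, 30)
last_seen = {
    "SQLPROD01": datetime(2008, 2, 5, 16, 25),
    "SQLPROD02": datetime(2008, 2, 5, 12, 0),
    "SQLDEV01": datetime(2008, 2, 4, 23, 0),
}

# Any target that has not checked in within the last 30 minutes is flagged.
stale = find_stale_targets(last_seen, now, timedelta(minutes=30))
print(stale)
```

A target that stops sending data shows up in the stale list, which covers the case where the collection job itself has died and can no longer report its own failure.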



Kindest Regards,

DrewTheEngineer

Post #451945
Posted Wednesday, February 06, 2008 8:02 AM
The reason I pull the information instead of pushing it is basically that new servers can be added to the network and I don't have to install or set up anything on them. Once I add the new server to the table on my repository server, the repository automatically connects to it.
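
A rough Python sketch of the pull idea, under stated assumptions: the server list lives on the repository side, and `collect_failed_jobs`, `fake_fetch` and the server and job names are hypothetical stand-ins for whatever query actually runs against each linked server. The only point is that adding a server means adding one entry to the list.

```python
def collect_failed_jobs(servers, fetch):
    """Pull failed-job names from every server in the list.

    servers is the list kept on the repository server; fetch is a callable
    that takes a server name and returns that server's failed jobs.
    Servers that cannot be reached are reported instead of stopping the run.
    """
    failures = {}
    unreachable = []
    for server in servers:
        try:
            failures[server] = fetch(server)
        except ConnectionError:
            unreachable.append(server)
    return failures, unreachable


# Hypothetical stand-in for a query over a linked server.
def fake_fetch(server):
    data = {
        "SQLPROD01": ["Nightly backup"],
        "SQLPROD02": [],
        "SQLNEW01": ["Index rebuild"],
    }
    if server not in data:
        raise ConnectionError(server)
    return data[server]


servers = ["SQLPROD01", "SQLPROD02"]
servers.append("SQLNEW01")  # a new server only needs a new entry in the list
failures, unreachable = collect_failed_jobs(servers, fake_fetch)
print(failures)
print(unreachable)
```

Nothing is installed on the new server in this sketch; the repository simply includes it on the next pass.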

Nice to see all the different ideas. Different is good as long as it makes your job easier and better.

Rudy



Post #452232
Posted Wednesday, February 06, 2008 1:37 PM


For job monitoring, I just get each server to send an email (per job) when a job fails: DBMail on SQL 2005, an SMTP stored proc on 2000.

As far as auditing goes, if you have a Service Desk, log an Incident for each job failure (or add an entry to an existing incident if it is a work in progress sort of thing) and put the resolution in there. You should be able to search for any historical incidents relating to a particular server easily enough. Having some sort of known problem repository helps as well (especially for those failures that only happen every 6 months or so).





Scott Duncan

MARCUS. Why dost thou laugh? It fits not with this hour.
TITUS. Why, I have not another tear to shed;
--Titus Andronicus, William Shakespeare
Post #452430
Posted Wednesday, February 06, 2008 1:43 PM


Scott, do you really want an email every time a job fails on every SQL instance? Must have a lot of free time on your hands and space in your in-box. Avg. 20 jobs per instance * 80 instances * running N times per day = WHOA!
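
To put rough numbers on that estimate, here is a small Python sketch. The 20 jobs and 80 instances come from the post; N is left open there, so the values tried below are only illustrative. Note that this counts job runs, which is the ceiling on failure emails if every single run failed.

```python
JOBS_PER_INSTANCE = 20  # average from the post
INSTANCES = 80          # from the post


def job_runs_per_day(runs_per_job):
    """Total job runs per day across all instances."""
    return JOBS_PER_INSTANCE * INSTANCES * runs_per_job


# Illustrative values for N (runs per job per day); the post does not give one.
for n in (1, 4, 24):
    print(n, job_runs_per_day(n))
```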



- Tim Ford, SQL Server MVP
http://www.sqlcruise.com
http://www.thesqlagentman.com
http://www.linkedin.com/in/timothyford
Post #452431
Posted Wednesday, February 06, 2008 1:54 PM


I agree that writing it yourself vs. buying software is a compelling debate, and there are people in both camps. In my case, buying it was not an option. Ever. Unfortunately.
So, instead of a much more manual process, this is the solution that lets me sit back and do other things while I feel safe that the jobs are being monitored, even if it cost a lot more for me to develop. And that's part of the reason I am sharing it. Why should you reinvent the wheel I just reinvented? Unless it's for the self-education on the process, which was another selfish reason for writing it. Loads of baggage went into the system, and I just wanna share it.

I love that there have been so many different ideas shared here on this topic. It's one close to my heart, and I love that so many options are out there. Hopefully people will be able to pick the best one for their shop.



Post #452435
Posted Wednesday, February 06, 2008 3:25 PM


Timothy Ford (2/6/2008) wrote:
"Scott, do you really want an email every time a job fails on every SQL instance? Must have a lot of free time on your hands and space in your in-box. Avg. 20 jobs per instance * 80 instances * running N times per day = WHOA!"



That's why I monitor by exception. :D

I have very few job failures. What generally causes something to fail? Something changing. The systems are locked down and we have a rigorous (and improving) change process. For things like disk space, that's being monitored (DB growth rates as well) and flagged before backups & the like fail.

Granted, our systems are relatively straightforward as well, no complex replication scenarios (we do have replication), no flaky network links.

This may also change once we get a better centralised monitoring tool in place, where alerts can be sent to a console. Our current monitoring software doesn't handle that so well.




Scott Duncan

Post #452469
Posted Thursday, February 07, 2008 7:02 AM
I agree. I only monitor the exceptions/failures. I don't need to know that a job ran successfully. But I have other reports that show all jobs, successful or not. In the morning I just review the failed ones. Having an email alert doesn't really help me, as we are not a 24x7 shop, so the web page report (with SSRS) works great.

Rudy



Post #452716
Posted Thursday, February 07, 2008 9:22 AM
Nice solution! However, I got a message that sp_SQLSMTPmail does not exist, so I downloaded it from somewhere. The next error was about '@vc parameter that does not match with this procedure'. What am I doing wrong? Thanks in advance. Peter
Post #452808