Monitoring for Non Existent Events
Posted Thursday, July 31, 2014 10:53 AM


dwilliscp (7/31/2014)
Thanks to SQL Sentry we have built an alert onto our job steps and can now check for jobs that run too long. But I do see one more hole: jobs calling code that is not well built, which completes without error but does not do what you expected. I have had this come up from time to time. The latest issue was bad data flowing into the database that was not picked up until the weekly maintenance schedule found the data did not match the data type. Another time we had a trigger that got turned off and not turned back on.

David


That is such a pain. When that happens it usually involves implementing an additional process to verify success and alert if there is a smell of failure.

I'd rather add the extra checks and code to ensure less headache down the road.
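On the disabled-trigger case specifically, one cheap safety net is a scheduled check against the catalog views that fails loudly whenever a trigger has been left disabled. A minimal sketch, assuming you want it as an Agent job step; the severity and message text are illustrative, not from this thread:

```sql
-- Sketch: flag any object-level trigger that has been left disabled.
-- Intended to run as a scheduled Agent job step; message/severity are assumptions.
IF EXISTS (SELECT 1
           FROM sys.triggers
           WHERE parent_class = 1        -- table/view triggers only
             AND is_disabled = 1)
BEGIN
    DECLARE @msg nvarchar(2000);

    -- Build a comma-separated list of the offending triggers.
    SELECT @msg = STUFF((
        SELECT N', ' + OBJECT_NAME(parent_id) + N'.' + name
        FROM sys.triggers
        WHERE parent_class = 1 AND is_disabled = 1
        FOR XML PATH(N'')), 1, 2, N'');

    -- Severity 16 fails the job step, so existing job-failure alerting fires.
    RAISERROR (N'Disabled triggers found: %s', 16, 1, @msg);
END;
```

Run on a schedule, the failed step surfaces through whatever job-failure alerting is already in place, which plugs the "trigger turned off and never turned back on" hole without touching the application code.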




Jason AKA CirqueDeSQLeil
I have given a name to my pain...
MCM SQL Server, MVP


SQL RNNR

Posting Performance Based Questions - Gail Shaw
Post #1598387
Posted Thursday, July 31, 2014 8:35 PM


GoofyGuy (7/31/2014)
Steve Jones - SSC Editor (7/31/2014)
GoofyGuy (7/31/2014)
"A job that runs long or doesn't run at all can sting just as bad as one that fails."

What's the difference?

There is no difference.


Sure there is. A long-running job might be stuck, but it has done some work. If you clear the issue, it may run quicker. Depending on the job, ETL or some check, it might not affect your day-to-day operations.

One that doesn't run is bad because you might not realize the event hasn't occurred. If there is no issue, like a corruption check, then it might not affect you, but it certainly could in the future. A failure of the same job, by contrast, would likely be indicative of a problem.

These all can cause problems, but there certainly is a difference in many cases. Not all, but many.


All three cases represent failure to design and test properly. There is no difference, in my mind, from that perspective.


Perhaps - but the fallout of a partially completed job can be substantially harder to recover from than, say, a job that didn't run because someone disabled the scheduler.

Also - depending on the type of process you're dealing with, it may not be physically possible to test every single permutation, so yes, in some cases you might not be able to completely dummy-proof or fail-proof some jobs.
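Both of Steve's cases, a job stuck running long and a job that silently never ran, can at least be detected from msdb, independently of the jobs themselves. A hedged sketch of the two checks; the 60-minute and 24-hour thresholds are arbitrary assumptions, and the second check assumes every enabled job should run at least daily (in practice you would filter by schedule):

```sql
-- Check 1: jobs still executing past a threshold (current Agent session only,
-- so stale activity rows from earlier Agent restarts are ignored).
DECLARE @latest_session int = (SELECT MAX(session_id) FROM msdb.dbo.syssessions);

SELECT j.name,
       ja.start_execution_date,
       DATEDIFF(MINUTE, ja.start_execution_date, GETDATE()) AS minutes_running
FROM msdb.dbo.sysjobactivity AS ja
JOIN msdb.dbo.sysjobs AS j ON j.job_id = ja.job_id
WHERE ja.session_id = @latest_session
  AND ja.start_execution_date IS NOT NULL
  AND ja.stop_execution_date IS NULL
  AND DATEDIFF(MINUTE, ja.start_execution_date, GETDATE()) > 60;  -- threshold is an assumption

-- Check 2: enabled jobs with no successful outcome row in the last 24 hours,
-- i.e. the event you expected simply never happened.
SELECT j.name
FROM msdb.dbo.sysjobs AS j
WHERE j.enabled = 1
  AND NOT EXISTS (SELECT 1
                  FROM msdb.dbo.sysjobhistory AS h
                  WHERE h.job_id = j.job_id
                    AND h.step_id = 0          -- job-outcome rows, not step rows
                    AND h.run_status = 1       -- succeeded
                    AND h.run_date >= CONVERT(int,
                          CONVERT(char(8), DATEADD(DAY, -1, GETDATE()), 112)));
```

Either result set coming back non-empty is the "smell of failure" worth alerting on, even though no job has technically failed.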


----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
Post #1598553
Posted Friday, August 1, 2014 8:44 AM
The difference between a proactive and a reactive state. Or perhaps: "Isn't it the user's job to tell the DBA when a job did not finish?"


The more you are prepared, the less you need it.
Post #1598719
Posted Friday, August 1, 2014 9:53 AM
Andrew..Peterson (8/1/2014)
The more you are prepared, the less you need it.


Andrew, the last line in your post reminded me of something from years ago. For jobs that failed, we used to use checkpoint/restart features: we kept a record of state, and when the job was restarted it resumed execution from that state/checkpoint. Using this technique, we were able to save a tremendous amount of time in those old Big Iron days.

Also, in jobs that failed to complete and just ran in loops, the last checkpoint may have been the one that caused the loop, wait, or other anomaly. If I remember rightly, and it has been a number of years, we could determine the state of the last correctly completed function or record, and then, once the data error or logic fault that caused the problem was fixed, fool the checkpoint into restarting just after the last successful process.

I wonder, after reading this series of posts, if the old technique of checkpoint restart has been lost or forgotten.
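The technique translates directly to T-SQL batch jobs: keep a small state table, skip work the checkpoint says is done, and advance the checkpoint as each unit completes. A minimal sketch; all object, job, and procedure names here are hypothetical, not from the thread:

```sql
-- Hypothetical checkpoint/restart scaffolding for a multi-step batch job.
IF OBJECT_ID(N'dbo.JobCheckpoint') IS NULL
    CREATE TABLE dbo.JobCheckpoint (
        job_name       sysname   NOT NULL PRIMARY KEY,
        last_good_step int       NOT NULL,
        updated_at     datetime2 NOT NULL DEFAULT SYSDATETIME()
    );

DECLARE @resume_after int =
    COALESCE((SELECT last_good_step
              FROM dbo.JobCheckpoint
              WHERE job_name = N'NightlyLoad'), 0);

-- Each unit of work runs only if the checkpoint says it hasn't completed,
-- and records its completion before moving on.
IF @resume_after < 1
BEGIN
    EXEC dbo.LoadStagingTables;   -- hypothetical step 1
    MERGE dbo.JobCheckpoint AS t
    USING (SELECT N'NightlyLoad' AS job_name) AS s ON t.job_name = s.job_name
    WHEN MATCHED THEN UPDATE SET last_good_step = 1, updated_at = SYSDATETIME()
    WHEN NOT MATCHED THEN INSERT (job_name, last_good_step)
                          VALUES (s.job_name, 1);
END;

IF @resume_after < 2
BEGIN
    EXEC dbo.ApplyBusinessRules;  -- hypothetical step 2
    UPDATE dbo.JobCheckpoint
    SET last_good_step = 2, updated_at = SYSDATETIME()
    WHERE job_name = N'NightlyLoad';
END;

-- On full success, clear the checkpoint so the next run starts from the top.
DELETE dbo.JobCheckpoint WHERE job_name = N'NightlyLoad';
```

The "fool the checkpoint" trick from the Big Iron days is just an UPDATE to last_good_step here, which is exactly why each step must be safe to re-enter at its recorded boundary.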


Not all gray hairs are Dinosaurs!
Post #1598749
Posted Monday, August 4, 2014 9:26 AM


Matt Miller wrote:

... depending on the type of process you're dealing with, it may not be physically possible to test every single permutation, so - yes in some cases you might not be able to completely dummy-proof or fail-proof some jobs.

Maybe not, but it's no excuse to bypass developing the appropriate test cases, either.

The time spent actually writing software should be almost vanishingly small, compared to the time expended on design up front and testing in back.
Post #1599306
Posted Monday, August 4, 2014 12:11 PM


GoofyGuy (8/4/2014)
Matt Miller wrote:

... depending on the type of process you're dealing with, it may not be physically possible to test every single permutation, so - yes in some cases you might not be able to completely dummy-proof or fail-proof some jobs.

Maybe not, but it's no excuse to bypass developing the appropriate test cases, either.

The time spent actually writing software should be almost vanishingly small, compared to the time expended on design up front and testing in back.


I never said that it was, but there's a difference between having appropriate test case coverage and accounting for every single possible failure. This is where the old "Perfect is the enemy of Good" adage comes into play. You account for enough failure scenarios to meet your performance and functional specifications (usually exceeding the expected spec by some acceptable amount), and perhaps you document the other failure modes, or at least detect that your success condition has not been met. Still, some failures may occur, even in well-thought-out systems.


----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
Post #1599360
Posted Monday, August 4, 2014 12:22 PM


... there's a difference between having appropriate test case coverage and accounting for every single possible failure. This is where the old "Perfect is the enemy of Good" adage comes into play.

Absolutely, and I'm not disputing your opinion or the old adage.

I'm simply advocating taking software testing as far as it can reasonably be taken; all too frequently, testing isn't taken very far. Ditto for the thought put into software design.

It sounds as if we're saying the same thing and just talking past one another. In any event, I believe too much time is wasted in writing software, and in picking up the pieces when it fails - and not enough in up-front design and back-end testing.

Since we're bandying about old adages, here's a paraphrased one for you: if you love software and sausages, you shouldn't watch either being made.
Post #1599363
Posted Monday, August 4, 2014 2:16 PM


GoofyGuy (8/4/2014)
Since we're bandying about old adages, here's a paraphrased one for you: if you love software and sausages, you shouldn't watch either being made.


Agreed - and both go well with beer


----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
Post #1599392
Posted Monday, August 4, 2014 2:30 PM


Matt - true, beer and software go together surprisingly well, but can have some pretty bad after-effects! It's amazing how much 'brilliant' software I've put together with the help of a few brews (or several!), and how poorly the software and I performed the next morning.
Post #1599396