SQL Agebt Job running during unexpected server reboot

  • We had a power failure and our server rebooted during a sql job that runs the Maintenance Backup package. (this is a classic case of disaster recovery that's not tested - we are clustered running mirroring, but the ups failed so the whole process went down ugly).

    Well my issue as the DBA is that the job showed that it was running when you viewed the job history, but in Job Activity Monitor, all the jobs were idle. I right clicked the job and said 'stop' for good measure, but i got the error saying it could not stop because it was not started. So i thought it was ok. Well the job is called by some pearl script to get started, and when that ran the next time, it did not start the job, thinking it was still started. So i manually started the job and it ran. Now when i look at job history, in the details for the day of the last run 2/8, i see step 1 from 2/7 and then step 1 from 2/8 listed. there is no job history from 2/7. So my questions that i could not find on the internet or BOL is, what happens to a job if it is running and the server crashes, and why would both job starts be rolled into one history log record?

    our production is now back on schedule, but this is driving me crazy

    thanks in advance

  • The job never completed, so there will be no history. The table that stores the info about the job is still indicating that the job is running. You can update that table to indicate the job is not running, and if you want, you can add some entries to the history.

    But currently, SQL is telling you, and indicating correctly, exactly what happened.

    Andrew SQLDBA

  • Well, the job has 2 steps. Step 1 completed before the reboot. So when you look at the sql job history, you always only see the steps that are completed, but the job has the green arrow saying that it's running.

    the job history said it was still running when it had died with the server reboot.

    To restate, the job history had the green arrow, but the job activity monitored said idle.

    i'm looking for documentation or something because i would expect a job failure or warning, not a job executing icon in the job history.

    another step that i did not put in the first message, is that we have a job to see if the backup job is running (why because the original guys who dev the system were pearl/unix guys and they put the start of the dbbackups in one of their jobs, but if it never started, we never knew). so that check job ran and said that hey the job isn't running.. but the pearl script tried to start the job and it said nope - already started.

    so obviously some sys table controls the green arrow and another controls the actual status cause i couldn't 'stop the job'.

    any documentation on this?

  • Only advise that I can give, is get rid of all the Pearl and Unix stuff, and use SQL scheduler. No need to monitor to see if a job is running, that is a waste. you can create a SSIS pacakge to do all of this for you. Perform the backup, monitor the running tasks, update a log table with the progress, and when completed, then fire off an email. If anything happen in the middle, fire off an email stating a problem.

    Never a good idea to mix platforms. you are only asking for problems. You will be much better off to keep all SQL work, within SQL

    Andrew SQLDBA

  • look at the books online for

    sysjobactivity

    http://msdn.microsoft.com/en-us/library/ms190484.aspx

    and sysjobhistory

    http://msdn.microsoft.com/en-us/library/ms174997.aspx

    They're updated by [msdb].[dbo].[sp_sqlagent_log_jobhistory]

    That gets run when SQL Agent finishes a step. When the step = 0 (the whole job)

    the sysjobactivity table is updated.

    I guess just read the T-SQL code from [msdb].[dbo].[sp_sqlagent_log_jobhistory]

    Thanks

Viewing 5 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply