SQL Agent Job Shows Running for Days, but Not Really Running

  • I have two SQL Agent jobs that show idle in the Job Monitor but Job History shows them as still running.

    The history shows the single step in each as complete which tells me the query is finished. I am confident that the query is finished because the job is a nightly maintenance job that runs in a transaction and the table it locked would have caused many calls to the help desk if the table was still locked.

    Two different DBAs looked at it and have no idea what caused it or exactly how to remedy it. We already tried right-clicking on the job in Job monitor to stop the job and that option is greyed out.

    I checked msdb.dbo.sysjobhistory for both and the run_status is 1 so not positive what the Job History is looking at.

    Attached is a screenshot of the job history and job activity monitor.

  • Run sp_whoisactive and get some of the additional stuff it can show you and see if anything makes sense then.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Usually when this happens in my environment, some sort of network blip has killed a connection to another server (the jobs that tend to do this are jobs pointing to other servers than the server the job is actually on). So the connection on the remote server is dead, but the job server thinks the connection is still active because it never received an "end process" command or a success or failure indication.

    The only remedy is to A) verify that the spid is actually dead on the target server, and B) kill the job. In that order. If you try to kill a job that has an open / suspended SPID that isn't actually processing, the SPID will never close until the services are rebooted and killing the job first may actually lock the SPID for reasons we have yet to discover.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • I do get what you are saying and seeing - have had the same experience before. It changed the icon in the view after running the job again. I think what was missing when I went through the earlier runs of the job was that it had no row for Job outcome (in the step name) but I can't remember. For that job, you can try looking at the same thing the Job History displays. It will execute the following for each job:

    exec msdb.dbo.sp_help_jobhistory

    @job_id = '<JobUniqueidentifier>',

    @mode='FULL'

    Pretty sure I just ran that to find what was missing. sp_help_jobhistory calls sp_help_jobhistory_full which is mostly looking at msdb.dbo.sysjobhistory

    Sue

  • Great info! I ran that on both jobs and found that they only have one entry from when it started. There is no (Job outcome) entry.

    I will check on the SPID and see if I can get it cleared up.

    Brandie, these two jobs didn't cross over to a linked server. If I find anything common as I continue to research this I will let you know.

    Thanks for the help!

  • jerry-621596 (9/15/2016)


    Great info! I ran that on both jobs and found that they only have one entry from when it started. There is no (Job outcome) entry.

    I will check on the SPID and see if I can get it cleared up.

    Brandie, these two jobs didn't cross over to a linked server. If I find anything common as I continue to research this I will let you know.

    Thanks for the help!

    Yup, that's what we saw. Glad that helped. We didn't have any orphaned spids or spids involved with the job. When no other jobs were scheduled to be running, there were no Agent spids running any jobs or any idle Agent spids other than generic refresher and alert engine. It just appeared to be a case where all of the job history wasn't written to the history table. In theory, this could happen on a shutdown with nowait (or pulling the power from the server) but the instance hadn't been shut down at all. I did check msdb.dbo.syssessions in case it was an issue where Agent somehow restarted but that wasn't the case either. You could check syssessions with the times of the job not having an outcome and to see if Agent was restarted around that time.

    Sue

  • jerry-621596 (9/15/2016)


    Brandie, these two jobs didn't cross over to a linked server. If I find anything common as I continue to research this I will let you know.

    Please do. And please know that you are not the only one bothered by phantom job activity.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • This happens for me occasionally, and I use the following script to update sysjobactivity to reflect that the last run ended, and then it calls procedure to stop the job.

    declare @job_name varchar(2000) = '<job name>';

    declare @job_id uniqueidentifier =

    (select job_id

    from msdb.dbo.sysjobs

    where name like @job_name);

    update msdb.dbo.sysjobactivity

    set stop_execution_date = getdate()

    where job_id = @job_id

    and start_execution_date =

    (select max(start_execution_date)

    from msdb.dbo.sysjobactivity

    where job_id = @job_id);

    exec sp_stop_job @job_id = @job_id;

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • I tried your script and get and error saying the request to stop the job was refused because the job is not currently running.

    Seems like the mystery runs deeper than I thought.

  • It may not be running...that was the case we saw before. It just didn't log everything to the history.

    Sue

  • jerry-621596 (9/15/2016)


    I tried your script and get and error saying the request to stop the job was refused because the job is not currently running.

    Seems like the mystery runs deeper than I thought.

    That's odd, at leat in my situation setting stop_execution_date = getdate() is what I have to do to prevent that warning when attempting to stop a zombie job run. There are a lot of moving pieces to the sql agent job scheduler, so I guess it depends.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • jerry-621596 (9/15/2016)


    I tried your script and get and error saying the request to stop the job was refused because the job is not currently running.

    Seems like the mystery runs deeper than I thought.

    I hate making this suggestion. I really do. But sometimes it's the only fix to phantom jobs.

    You may have to restart SQL Server Agent service.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply