SQL Agent Jobs running 2 instances at once

  • I'm having an issue which I'm struggling to find an answer to because the only search terms I can use are too generic.  Hopeful this forum will be able to help.

    I have a server instance running in the simplest of setups (ie no replication)  but I'm finding that between 2am and 3am some SQL Agent Jobs, but not all and not consistently the same ones, are running more than once.  I haven't set these jobs up to re run on failure (which is the no 1 solution I'm finding when trying to search this up, number 2 is a daylight savings issue but that doesn't make sense as it doesn't effect everything and doesn't happen everyday, it occurs approx. 4 times a week).  The following SQL is returning rows...

    select job_id,run_date,run_time,max(run_duration),min(run_duration) from sysjobhistory where step_id=0

    group by job_id,run_date,run_time having count(1)>1

    This is showing that the same job is starting more than once at the same time, the max and min run duration is interesting as the min duration is always around -95444####, ie it thinks it's finished before it started.  I can also see from the processes that are running that the jobs are definitely running twice, so it's not just an odd reporting issue.

    Has anyone suffered this same issue, my understanding is that the same job shouldn't be able to be run concurrently anyway?

    I'm running SQL Server 2016 (SP2) which was recently upgraded from SQL 2008 R2 (SP3).

     

  • Perhaps try removing/deleting all the schedules and re-adding them back to the jobs to see if this clears up.

  • Already tried that, I deleted and readded the schedules, which didn't work, and also deleted and readded the jobs which didn't work either(completely from scratch rather than using generate scripts just in case).

  • My understanding is similar to yours - SQL shouldn't let you run the same job twice at the exact same time.  I have tried doing this and it gives me an error.

    As a thought, do any  of the jobs start each other?  Or do you have a different server that is starting jobs on this server?

    I am just wondering if maybe something else is starting the jobs, even windows task scheduler?

    Not sure if you can do this,  but is it possible to disable one of the schedules on a job that runs twice and see if it still starts without a schedule?  Even just running it manually for a few days to see if it does start automatically on you.

    When the jobs are running duplicate, it is 2 different SPIDs for them but the same login, correct?  I am thinking here that it might be something weird with threading.

    Also, do you notice a pattern by looking at a larger time frame for the duplicates?  Like that job A runs twice on Monday or on the 13th or on the 2nd Friday of the month or anything like that?  I am wondering if maybe it isn't random, just difficult to see a pattern.

    Another thought - can you reproduce this on a test server?  That is, can you build up a test server and reproduce the problem?  If you can, that is a good starting point as you can schedule things and do a bunch of testing.

    Last thought - are you patched to the latest CU?  I am not aware of that being fixed in a CU, but it might be a bug that was fixed.  Likely not or I'd expect more people to be responding with "I had this happen too", but could be something specific about your schedule setup.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Thanks for responding, no there are no other severs involved, nothing is running it from elsewhere, this is a new server set up, so we are thankfully able to eliminate any old code floating around, but it's also possible something else in the setup of the new server might be causing it (as you mention threading) but that said it seems unlikely, or at least it seems like a bug somewhere as it shouldn't be possible for the same job to run concurrently.

    On disabling a job, yep, tried that, with either the schedule or the job disabled nothing gets run, it seems as if the agent is just accidentally firing off the job twice when it's time to run it the once.

    It is 2 different SPIDs, 2 seperate transactions and they both usually run to completion, we only even picked up on it because one of the randomly effected jobs happened to be locked out by the other version of itself!!

    There doesn't appear to be any pattern other than it only effects jobs that are scheduled to run between 2am and 3am but as I mentioned before, it doesn't effect them on every night or effect the same ones each time.

    We also don't appear to have the same issue on our test server, with the same setup.

  • Found this post:

    https://www.sqlservercentral.com/forums/topic/sql-server-schedule-jobs-running-twice-at-the-same-time

    where they found that there was a conflict in 2 tables.  The recommendation was to install the latest patches, but the user fixed it by deleting the schedule and recreating it which you already did.

    If you script out the job, do you notice anything odd like that there are any duplicate subscriptions or anything?

    If this isn't a production ready machine, are there any patches it is missing?  Both on the SQL side and on the Windows side?

    Also, thinking about the daylight savings thing, do you have a lot of warnings or errors in the event log about the clock time being adjusted?  I am wondering if the clock on the machine is not keeping time well.  So lets say the job starts at 2:02:00 AM, system clock says it is 2:02:00 AM, job starts.  Windows makes a request out to the NTP server (network time protocol) and sees it is actually 2:01:58 AM, so it adjusts the system clock and 2 seconds later, it tries to start the job again and (strangely) it is successful.  This could explain why it is intermittent (sometimes the time is wrong, sometimes it isn't, or windows isn't always re-syncing at the same time).  This should be in your system log in the windows event log.  Offhand, I am not certain which service would need to be stopped to prevent  the auto-update, or if it is baked into Windows and you'd just need to block it in the firewall to have  short term fix.  Permanent fix, if this is the case, would be to figure out why the clock is drifting.  If it is a VM, it might be something simple like installing a patch or adjusting resources.  If it is a physical box, you may need to replace parts.  Although, I could be way out to lunch on this one.

    Speaking of the event log, any interesting events in the log around the time the duplicate jobs start that don't happen on other days?

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply