Snapshot Running Automatically Without a Schedule?

  • Hi Everyone

    I'm hoping someone will be able to point me in the right direction with this. We've recently started having an issue with some snapshots running on one of our SQL servers (2014), whereby they appear to be running twice within seconds of each other, and subsequently one shows as a successful completion, but the other fails (due to the same agent already running) which then causes the repldata folder contents to either be removed or not store at all, meaning the subscription subsequently fails due to there not being any data. Unfortunately, there doesn't seem to be a pattern in the sense that this is every day. One day all three of our snapshots will suffer, and the next day just one will have the issue.

    What puzzled me whilst investigating, is that when this 'double-run' occurs, there only ever appears to be one entry in the job history, both via the GUI view and in the sysjobhistory table. In the replication monitor's snapshot agent history though, it shows two entries; one that is successful and one that has failed.

    To try and counter the issue, thinking / assuming that there may just be a ghost processes running, was to re-start SQL server, and then eventually restart the physical server. This wasn't successful, so I deleted the publications completely and re-created them, thinking that if there was an issue with the existing agents/jobs/schedules this would solve itself upon removal and the newly created publications would then run as intended - once.

    Unfortunately this failed to resolve the issue either. What I did notice throughout my investigations, was that the 'double-run' would not occur if I were to run the snapshot manually (either by starting the SQL job, or by starting the snapshot agent), even if I started it within seconds of the automatic 'double-run' finishing. I was also able to get a successful single run of the snapshots via a schedule if I changed the schedule to an alternative time. Which leads me to my next part.

    Even though I have changed the schedule to a different time for one of the snapshots (previously 0600, now 0030), the snapshot still runs twice, however not as a 'double-run' but once at 0030 (the new time) and once at 0600 (the old time). Again, only one entry in the SQL job history view (for the 0030 run), but the snapshot agent history shows both 0030 and 0600. I have checked the MSsnapshot_agents, and sysschedules tables and there is only one entry for the snapshot. So I'm incredibly puzzled.

    It would appear that my issue is the same / similar to the one described at this link, which I came upon whilst trying to find a resolution to my own issue but as you'll see there isn't a confirmed fix on this link, so I am posting more for reference.

    I guess my overarching question, is if it is possible for another schedule to exist somewhere else, or another area that has errantly contained two identical records for a schedule and only one is being updated / deleted when I do so to the actual schedule? Or if not, where can I next look to diagnose / resolve the issue of what appears to be a ghost schedule for my snapshots?

    Below are screen snips of the snapshot agent history, and the sql job history. The 0030 run as mentioned was the new scheduled time. The job history does not list the 0600 run at all.

    The 15/07 and 16/07 runs at 0600 were based on the previous schedule.

    Thanks for any comments / assistance in advance.

    Shane

  • When you checked sysschedules, how do you know that only one schedule was for the snapshot agent job? That table doesn't reference any job as schedules can be shared and jobs can have multiple schedules. Typically you would want to query sysjobschedules to find schedules for a job. You can also just check the job itself by going to properties and checking the schedules.
    You would probably want to check MSsnapshot_history also.

    You did not mention what type of replication this is - is it snapshot replication? Were any databases involved in replication (including msdb) restored to other environments?

    Sue

  • Sue_H - Monday, July 17, 2017 8:37 AM

    When you checked sysschedules, how do you know that only one schedule was for the snapshot agent job? That table doesn't reference any job as schedules can be shared and jobs can have multiple schedules. Typically you would want to query sysjobschedules to find schedules for a job. You can also just check the job itself by going to properties and checking the schedules.
    You would probably want to check MSsnapshot_history also.

    You did not mention what type of replication this is - is it snapshot replication? Were any databases involved in replication (including msdb) restored to other environments?

    Sue

    Hi Sue, thanks for the reply. It was indeed snapshot replication.

    I probably got ahead of myself when I mentioned sysschedules; as you said I did mean sysjobschedules after ensuring the job_id's matched between distribution..MSsnapshot_agents and msdb..sysjobs.
    Interestingly, the MSsnapshot_history table lists the so called "ghost" snapshots which I guess is to be expected as that matches to the Snapshot Agent History window.

    As it happens, you're response is quite timely as I think that I may have solved it (I'll know for sure tomorrow morning).

    Effectively, we had another machine that still had old snapshot jobs that referenced this server. I had stopped the agent service on this server anyway but it seems that someone had restarted at some point in the last couple of weeks (when the issue started). Whilst the publications were gone from this server, it appears I'd not spotted the orphaned jobs still present. It certainly explains why the repldata folder wasn't being populated despite a successful snapshot being reported (the files were on this rogue server's local drive), and also makes sense why the Snapshot Agent History window was listing the snapshots, but not the SQL Agent Job History window, and the fact that the snapshot had been running at the same time / on its' previously defined schedule time.

    I'll confirm in response to this tomorrow or over the next few days once I've monitored this following the removal of the other server and mark as answer should what I've done indeed be the fix.

    Thanks
    Shane

  • SQLsloth - Monday, July 17, 2017 8:55 AM

    Sue_H - Monday, July 17, 2017 8:37 AM

    When you checked sysschedules, how do you know that only one schedule was for the snapshot agent job? That table doesn't reference any job as schedules can be shared and jobs can have multiple schedules. Typically you would want to query sysjobschedules to find schedules for a job. You can also just check the job itself by going to properties and checking the schedules.
    You would probably want to check MSsnapshot_history also.

    You did not mention what type of replication this is - is it snapshot replication? Were any databases involved in replication (including msdb) restored to other environments?

    Sue

    Hi Sue, thanks for the reply. It was indeed snapshot replication.

    I probably got ahead of myself when I mentioned sysschedules; as you said I did mean sysjobschedules after ensuring the job_id's matched between distribution..MSsnapshot_agents and msdb..sysjobs.
    Interestingly, the MSsnapshot_history table lists the so called "ghost" snapshots which I guess is to be expected as that matches to the Snapshot Agent History window.

    As it happens, you're response is quite timely as I think that I may have solved it (I'll know for sure tomorrow morning).

    Effectively, we had another machine that still had old snapshot jobs that referenced this server. I had stopped the agent service on this server anyway but it seems that someone had restarted at some point in the last couple of weeks (when the issue started). Whilst the publications were gone from this server, it appears I'd not spotted the orphaned jobs still present. It certainly explains why the repldata folder wasn't being populated despite a successful snapshot being reported (the files were on this rogue server's local drive), and also makes sense why the Snapshot Agent History window was listing the snapshots, but not the SQL Agent Job History window, and the fact that the snapshot had been running at the same time / on its' previously defined schedule time.

    I'll confirm in response to this tomorrow or over the next few days once I've monitored this following the removal of the other server and mark as answer should what I've done indeed be the fix.

    Thanks
    Shane

    Thanks for the info - please keep us posted.
    If the other server have the databases still accessible, you could check the job histories on that other server - you can check the sysjobhistory table without starting agent. But it certainly seems like the other server/jobs is the cause.

    Sue

  • Sue_H - Monday, July 17, 2017 10:13 AM

    SQLsloth - Monday, July 17, 2017 8:55 AM

    Sue_H - Monday, July 17, 2017 8:37 AM

    When you checked sysschedules, how do you know that only one schedule was for the snapshot agent job? That table doesn't reference any job as schedules can be shared and jobs can have multiple schedules. Typically you would want to query sysjobschedules to find schedules for a job. You can also just check the job itself by going to properties and checking the schedules.
    You would probably want to check MSsnapshot_history also.

    You did not mention what type of replication this is - is it snapshot replication? Were any databases involved in replication (including msdb) restored to other environments?

    Sue

    Hi Sue, thanks for the reply. It was indeed snapshot replication.

    I probably got ahead of myself when I mentioned sysschedules; as you said I did mean sysjobschedules after ensuring the job_id's matched between distribution..MSsnapshot_agents and msdb..sysjobs.
    Interestingly, the MSsnapshot_history table lists the so called "ghost" snapshots which I guess is to be expected as that matches to the Snapshot Agent History window.

    As it happens, you're response is quite timely as I think that I may have solved it (I'll know for sure tomorrow morning).

    Effectively, we had another machine that still had old snapshot jobs that referenced this server. I had stopped the agent service on this server anyway but it seems that someone had restarted at some point in the last couple of weeks (when the issue started). Whilst the publications were gone from this server, it appears I'd not spotted the orphaned jobs still present. It certainly explains why the repldata folder wasn't being populated despite a successful snapshot being reported (the files were on this rogue server's local drive), and also makes sense why the Snapshot Agent History window was listing the snapshots, but not the SQL Agent Job History window, and the fact that the snapshot had been running at the same time / on its' previously defined schedule time.

    I'll confirm in response to this tomorrow or over the next few days once I've monitored this following the removal of the other server and mark as answer should what I've done indeed be the fix.

    Thanks
    Shane

    Thanks for the info - please keep us posted.
    If the other server have the databases still accessible, you could check the job histories on that other server - you can check the sysjobhistory table without starting agent. But it certainly seems like the other server/jobs is the cause.

    Sue

    Sue, apologies for the delay. All working fine now.

    Thanks
    Shane

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply