Live DB Backup Maintenance Plan Hangs

Question

Live DB Backup Maintenance Plan Hangs

bvanhorn

SSC Veteran

Points: 202
More actions
November 16, 2017 at 10:02 am

#341382

We have been having problems with our live DB Backup maintenance plan on SQL Server 2008 R2. Normally this process takes 2 minutes, but recently it seems to be getting hung up. There were 3 recent instances were it took over 3 hours, 12 hours, and 12 hours to complete, but it did actually complete. After a server restart, it was fine for about a week, but then the problem re-occurred. The live DB backup plan kicks off at 7 PM, and I found it still running at 9 AM the following morning, at which time I terminated the process as it was causing other problems. As soon as I terminated the process, the other problems were alleviated. I tried manually starting the live DB backup maint plan later that day, but then terminated the process when it had not finished in 10 minutes. After a restart of the server, it finished in 2 minutes. The database is for a small company ERP system and is only about 3GB. The log files are backed up every 2 hours, and no slowness or problems have occurred with the log file backups. The live DB maintenance plan includes several sub steps, check database, reorganize index, shrink database, backup database, and maint cleanup task (delete old backups). I was able to drill down, and it appears that the reorganize index task is the part that has been hanging, at least during the 3 instances that completed extremely slow. This SQL Server database has been running for about 5 years, and no recent configuration changes have occurred.
My position with the company is Controller, and we have no internal MIS staff, so please keep my limited knowledge level in mind for any suggestiions offered.

Viewing 6 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

Sue_H SSC Guru Points: 90861 More actions · Answer 1

I'm impressed with the fact that you got that much figured out - kudos to you!
Maintenance plans can really be a pain sometimes and they can be hard to debug and figure out the issues. And they can just get flaky like this for unknown reasons. It just happens if that's any consolation.
How did you drill down and figure out it was the index task?
If you open up the maintenance plan to modify and look at the tasks and right click on the index task select edit which of those options are selected?
And one other thing I can think of that might help in verifying if this is the index rebuild screwing things up (and you may already have it in place) but is there any logging for the maintenance plan? I don't use Maintenance plans but I thought the logging included times of what tasks were being executed. That would show which task or tasks or taking so long. To get to that logging piece to see if it exists or to add logging, when you have the maintenance plan open to modify, on the top toolbar for the maintenance plan, just to the right of the Manage Connections, there is some weird looking icon that looks like two pieces of paper. If you hover your mouse over it the tooltip says Reporting and Logging. If it is set to generate reports, it show the directory where the reports would be. The default is something like:
Program Files\Microsoft SQL Server\MSSQL<version number><instance\MSSQL\Log
You would create the logging using that same thing, just select generate reports and towards the bottom of that screen, select Log Extended Information.

It's looks like it's blocking - that's why the other problems were alleviated when you killed the process. But it could also be blocked. Being that it's not all the time, are there any other jobs that might be running on SQL Server during the same time? Job Activity Monitor and the Next run date time might help to see those.
And are there any other processes running on that server during those times? So if you know when the last date and time what, you could look in the event logs on the server where SQL is and check the event logs during that time. You may find some issues in there. Check the Application and System event logs.

And then in the meantime, I'm still trying to figure out what scripts might help you track down the issues without having to go through too many hoops.
And ask any questions about whatever I just wrote. You are definitely in a tough spot so ask whatever you want.

Sue

Sue_H SSC Guru Points: 90861 More actions · Answer 2

And duh on my part - did you check the SQL Server error logs when this has happened or are they available from the last time it happened? That's usually the first thing you want to check when you have problems with SQL Server.

Sue

bvanhorn SSC Veteran Points: 202 More actions · Answer 3

I figured out it was the reorganize index task causing the issue by right clicking on the maintenance plan and looking at the history, which I believe is just another way of viewing logs. You can drill down on the history to see the subtasks, and the reorganize index is the subtask that took hours on all three days the maintenance plan finished but ran very slow.

The reorganize index task covers tables and views for one specific database and compact large objects is selected.

I turned on the option to log extended information as you suggested, but it was not on previously.

The issue we have had when the reorganize index has been hung is that heavy MS Access Queries to the database error off, while light queries continue to run, so I think it is more of an issue of slowing down the database than locks.

I did not see any errors in the SQL Server error logs or application or system logs.

I am wondering if I should try a one-time rebuild of the indexes instead of just using reorganize indexes to see if that might resolve the problem?

Sue_H SSC Guru Points: 90861 More actions · Answer 4

bvanhorn - Thursday, November 16, 2017 4:59 PM
I figured out it was the reorganize index task causing the issue by right clicking on the maintenance plan and looking at the history, which I believe is just another way of viewing logs. You can drill down on the history to see the subtasks, and the reorganize index is the subtask that took hours on all three days the maintenance plan finished but ran very slow.
The reorganize index task covers tables and views for one specific database and compact large objects is selected.
I turned on the option to log extended information as you suggested, but it was not on previously.
The issue we have had when the reorganize index has been hung is that heavy MS Access Queries to the database error off, while light queries continue to run, so I think it is more of an issue of slowing down the database than locks.
I did not see any errors in the SQL Server error logs or application or system logs.
I am wondering if I should try a one-time rebuild of the indexes instead of just using reorganize indexes to see if that might resolve the problem?

Thanks for posting that additional information.
The long running query may need to take a lock on the entire table so it waits while other locks are running through the table, etc...that type of thing and the flip side where the queries wait on the long running query to let go of it's locks. And it can depend on what queries are run as there are different types of locks and some work fine with each other, others are considered incompatible so are waiting for other locks to be released.
You could try to rebuild the indexes on the tables that are the largest and most active - it's may not help anything though.
The shrink step is a bit concerning as it's something you don't want to do regularly but I don't want to change anything that could impact things for you and create a new issue you need to try to figure out.

Is there a time when the things in Access aren't as active or aren't running long queries? It may help to change the time of the job if there is a better time window for it to run in

Sue

bvanhorn SSC Veteran Points: 202 More actions · Answer 5

We have light ERP transactional activity all night long, and sporadic heavy MS access queries/reports being run during the night (a heavy query here is one that may take a minute or so to run), so there is not really any better time to run the live DB backup plan. The live DB backup has to run before the file server backup. The file server backup starts at 9 PM, so should not interfere with the live DB backup. There is no other heavy server usage other than the mentioned items. It is possible that an ERP transaction was attempted during the normally brief reorganize index causing the problem, but this has not come up once during the previous 5 years with the database and operations similarly configured. The heavy queries seemed to time-out, and light queries ran fine, so I don't think it was locks affecting the queries, I think the database was running slow while the reorganize index was hung. The first time the slow reorganize came up, it had the same problem 3 days in a row, before the server was restarted, but did actually finish all 3 days, so this also makes me think it is not locks causing the problem. Also, we have no other issues with record locks on this database.

The shrink database step has always been included in our daily maintenance plan, and these plans were set up by the consultant that helped us convert our ERP database to SQL Server 5 years ago.

I will give rebuild indexes a try, and cross my fingers that it resolves the problem. Maybe one of the indexes is corrupt, and a rebuild will resolve it?