Full Backups taking an abnormally long time to complete

  • Hello --

    I have been monitoring the amount of time it takes for a Full backup to complete. The e-mail notification shown below provides an example of the current state of affairs:

    JOB RUN:'Backup Plan.Full Backup' was run on 8/23/2014 at 10:00:00 PM

    DURATION:26 hours, 7 minutes, 44 seconds

    STATUS: Succeeded

    MESSAGES:The job succeeded. The Job was invoked by Schedule 9 (Backup Plan.Subplan_1). The last step to run was step 1 (Full Backup).

    The amount of time that it has taken to complete such a backup has grown over the course of several months from just under an hour to just over a day. There are a total of twenty databases that are part of the maintenance plan, and they range in size from fifty megabytes to 133 gigabytes.

    The mdf files are located on a drive connected to one disk array, and the generated bak files are on a drive on another disk array. The two disk arrays are connected to a private local network that uses a 10-gigabit switch.

    As part of the troubleshooting process, I have replaced several disks in the array that hosts the bak files that were reporting a relatively high number of I/O errors. The disks in the array that contains the mdf files did not report any issues. The Full Backup log file did not have entries that indicated a problem with the actual job.

    Beyond that, what other steps could I take to isolate the problem? For example, the Windows Performance Monitor has SQL counters available. Which of these would be useful in this case?
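
    For reference, the backup-throughput counter that Performance Monitor exposes under the SQLServer:Databases object can also be read from inside SQL Server. The following is a minimal sketch only; the raw value is cumulative, so two samples taken a known interval apart would be needed to derive an actual rate:

    -- Backup/Restore Throughput/sec from the Databases counter object.
    -- instance_name is the database name; cntr_value is a cumulative raw value,
    -- so sample it twice and take the difference to get a per-second rate.
    SELECT object_name,
           counter_name,
           instance_name,
           cntr_value
    FROM sys.dm_os_performance_counters
    WHERE counter_name = 'Backup/Restore Throughput/sec'
    ORDER BY cntr_value DESC;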

  • Is this a single job backing up each database one at a time?

  • Hello --

    The backups are run under one subplan that backs up every database to an individual bak file.

  • kaplan71 (8/25/2014)


    Hello --

    The backups are run under one subplan that backs up every database to an individual bak file.

    So what you are saying is that if that one backup file goes corrupt or missing, you lose the backups for ALL your databases.

    To be honest, your backups should be run separately, with each database backed up to its own unique backup file.

    Also curious if you are using compression for your backups.

  • Hello --

    So what you are saying is that if that one backup file goes corrupt or missing, you lose the backups for ALL your databases.

    That is incorrect.

    To be honest, your backups should be run separately, with each database backed up to its own unique backup file.

    Although the Full Backups are all done under one Subplan, each database has its own unique backup file.

    Also curious if you are using compression for your backups.

    I checked the T-SQL commands, and there was no option for compression with any of the backups.
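
    For reference, a compressed full backup would look roughly like the following; the database name and path are placeholders, not the actual ones from the maintenance plan:

    -- Hedged sketch only: database name and backup path are placeholders.
    BACKUP DATABASE [MyDatabase]
    TO DISK = N'X:\Backups\MyDatabase_Full.bak'
    WITH COMPRESSION,  -- requires 2008 Enterprise, or Standard and above on 2008 R2+
         CHECKSUM,     -- validate page checksums while the backup is written
         STATS = 10;   -- progress message every 10 percent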

  • kaplan71 (8/25/2014)


    Hello --

    So what you are saying is that if that one backup file goes corrupt or missing, you lose the backups for ALL your databases.

    That is incorrect.

    To be honest, your backups should be run separately, with each database backed up to its own unique backup file.

    Although the Full Backups are all done under one Subplan, each database has its own unique backup file.

    Also curious if you are using compression for your backups.

    I checked the T-SQL commands, and there was no option for compression with any of the backups.

    Regarding item 2, this is what you said:

    The backups are run under one subplan that backs up every database to an individual bak file.

    Based on this, it was easy to assume that you were backing up to a single file.

    You might check the backup history to see what your throughput is for each database, find the worst one, and work from there. I modified this from Brent Ozar's script:

    SELECT bset.database_name ,
           YEAR(backup_finish_date) AS backup_year ,
           MONTH(backup_finish_date) AS backup_month ,
           CAST(AVG(( backup_size / ( DATEDIFF(ss, bset.backup_start_date, bset.backup_finish_date) ) / 1048576 )) AS INT) AS throughput_MB_sec_avg ,
           CAST(MIN(( backup_size / ( DATEDIFF(ss, bset.backup_start_date, bset.backup_finish_date) ) / 1048576 )) AS INT) AS throughput_MB_sec_min ,
           CAST(MAX(( backup_size / ( DATEDIFF(ss, bset.backup_start_date, bset.backup_finish_date) ) / 1048576 )) AS INT) AS throughput_MB_sec_max
    FROM msdb.dbo.backupset bset
    WHERE bset.type = 'D' /* full backups only */
          AND bset.backup_size > 5368709120 /* 5GB or larger */
          AND DATEDIFF(ss, bset.backup_start_date, bset.backup_finish_date) > 1 /* backups lasting over a second */
          --AND bset.database_name = 'MyDatabase'
    GROUP BY YEAR(backup_finish_date) ,
             MONTH(backup_finish_date) ,
             bset.database_name
    ORDER BY YEAR(backup_finish_date) DESC ,
             MONTH(backup_finish_date) DESC;

    Regarding compression, you might want to check what build you are on for your instance. If you are running SQL Server 2008 (10.00.xxxx), backup compression is not supported unless you are on Enterprise Edition; SQL Server 2008 R2 made the feature available in Standard Edition as well. If that is the case, you might also note that SQL Server 2008 has fallen out of mainstream support.
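
    A quick way to confirm the build, the edition, and the instance-wide compression default is with the standard system functions and catalog views, for example:

    -- Build and edition determine whether backup compression is available.
    SELECT SERVERPROPERTY('ProductVersion') AS product_version,
           SERVERPROPERTY('Edition')        AS edition;

    -- Instance-wide default for backup compression (0 = off, 1 = on).
    SELECT value_in_use AS backup_compression_default
    FROM sys.configurations
    WHERE name = 'backup compression default';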

    Microsoft has an article on Optimizing Backup and Restore Performance in SQL Server that might be worth reading as well.

    Shawn Melton
    Twitter: @wsmelton
    Blog: wsmelton.github.com
    Github: wsmelton

  • Hello --

    I checked the drive where the bak files are located, and the attachment to this e-mail contains a sample of the bak files.

    I'd suggest monitoring the backup process and seeing what the wait statistics are while the backups are running. Understanding what the backups are waiting on will let you know where to focus your efforts.
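
    For example, something along these lines, run while the job is active, will show what the backup request is currently waiting on (typical backup waits include BACKUPIO, BACKUPBUFFER, and ASYNC_IO_COMPLETION):

    -- Rough sketch: watch the active backup request and its current wait.
    SELECT r.session_id,
           r.command,           -- shows BACKUP DATABASE while the backup runs
           r.wait_type,
           r.wait_time,         -- current wait in milliseconds
           r.last_wait_type,
           r.percent_complete
    FROM sys.dm_exec_requests AS r
    WHERE r.command LIKE 'BACKUP%';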

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • kaplan71 (8/25/2014)


    Hello --

    I have been monitoring the amount of time it takes for a Full backup to complete. The e-mail notification shown below provides an example of the current state of affairs:

    JOB RUN:'Backup Plan.Full Backup' was run on 8/23/2014 at 10:00:00 PM

    DURATION:26 hours, 7 minutes, 44 seconds

    STATUS: Succeeded

    MESSAGES:The job succeeded. The Job was invoked by Schedule 9 (Backup Plan.Subplan_1). The last step to run was step 1 (Full Backup).

    The amount of time that it has taken to complete such a backup has grown over the course of several months from just under an hour to just over a day. There are a total of twenty databases that are part of the maintenance plan, and they range in size from fifty megabytes to 133 gigabytes.

    The mdf files are located on a drive connected to one disk array, and the generated bak files are on a drive on another disk array. The two disk arrays are connected to a private local network that uses a 10-gigabit switch.

    As part of the troubleshooting process, I have replaced several disks in the array that hosts the bak files that were reporting a relatively high number of I/O errors. The disks in the array that contains the mdf files did not report any issues. The Full Backup log file did not have entries that indicated a problem with the actual job.

    Beyond that, what other steps could I take to isolate the problem? For example, the Windows Performance Monitor has SQL counters available. Which of these would be useful in this case?

    Quick thought: this sounds like a serious bottleneck somewhere. Have you checked the health status of the two file systems, fragmentation, collision rates on the switch, etc.?

    😎
