Backup taking too long - SQL Server 2016

  • Backup is taking too long to complete all of a sudden, this is the wait_type and time shown for the backup.

    wait_type :ASYNC_IO_COMPLETION

    wait_time : 53127671

    This DB server has more than 10 CPU and 96 GB RAM, SQL is using max memory of 92 GB.

     

  • If memory serves, ASYNC_IO_COMPLETION wait type is related to writing to disk.  Quick google tells me that is correct:

    Occurs while waiting for non-data-file disk I/O to complete. For example while growing a transaction log disk I/O is required to zero-out the file. Backup and restore operations can also accumulate this wait counter. As this wait accumulates during backups moderate values are expected

    https://www.sqlshack.com/sql-server-wait-type-async-io-completion/#:~:text=ASYNC_IO_COMPLETION%201%20Description.%20Occurs%20while%20waiting%20for%20non-data-file,restarting%20your%20device.%20Full%20screen%20is%20unavailable.%20

     

    So basically, you have something causing the disk to be slow.  This may be something on the OS side causing slow writes to disk (antivirus for example), or it could be on the database side (a 10 PB backup file will take time to write to disk for example) or could be SAN related if the backup disk was moved from SSD to HDD.

    The slowness is not (likely) due to CPU or RAM, but due to your disk.  If backup compression is turned off, turning that on should help performance.

     

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Backup compress is performed in the backup script, size of DB is just 400 GB and is taking almost 10-12 hours to complete backup.

     

  • How busy is the disk when you go to do the backup?  Is it an SSD or HDD?

    I have a database that is roughly 400 GB and it can back up in 30 minutes easy.  Which I find to be slow, but it is as fast as our slow disk will allow (10,000 RPM HDD in our SAN).

    My guess is something is up with the disk, something is using the disk, or something other than SQL Server is touching the backup file during the backup operation causing SQL to wait longer than it should be between writes.

    Is the disk local to the machine or out on the network/internet?  If it is local to the machine, I would try copying a large file from the non-backup disk to the backup disk and see how long it takes.  If it is going out over the internet or across the network, that is a different beast as something may be throttling your connection or you may be having network congestion.  Or it could be a sign of hardware failure.

    I would try copying a large file (10+ GB) to the backup disk from the SQL Server server to get a baseline for how long it should take and you can extrapolate that data to get a rough idea of how long it should take for the database backup to complete.  With compression, I would expect roughly 50% compression, so if a 10 GB file takes 1 minute to copy over, a 200 GB file should take roughly 20 minutes to complete.  Add a little overhead for the backup process itself and the compression and other things, I would expect it to complete within 30 minutes.  Now if a 10 GB file takes half an hour to copy over, I would expect a 200 GB backup to take at least 10 hours to complete.

    I am expecting this not to be a SQL Server problem but more a problem with either the existing hardware, some hardware setup/configuration, or failing that, a problem with some software on the SQL Server that is slowing down writes to disk for some reason (antivirus scans for example can be horrid performance killers for large files).  Now, if you are backing up 100 databases that are all 400 GB in size to the same physical disk at the same time AND it is an HDD, you are going to get performance issues with that backup as HDD's do not handle synchronous writes or synchronous reads.  So things get queued up and you may end up waiting for some other process to complete.

    If you are defragging or chkdsk-ing the drive while doing a backup, that will slow things down too.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Are you backing up to the same disk where all the data is? Are you backing up to the disk where the OS or is? Are you backing up across a network or through some type of attached storage?

    I'm with @mr. brian gale. Sounds like some kind of contention, but it could also be a failure of something mechanical somewhere along the way.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Is the backup broken into several files? The backup database command has options

    , Buffer count =2200, blocksize = 65536, maxtransfersize = 2097152

    If that fails, use

    , Buffer size =1200, blocksize = 32768

  • We are backing up to the network share and this slowness started just a week before otherwise within 4-5 hrs and it would have been done.

    I am planning to do manual copy of the 10 Gb file as suggested and see how long it would take.

  • No, the backup is not broken into several files, we can try to do that.

  • Since it is going out to a network share, it could be that someone else is doing something on that share at the same time as your backup.  I am expecting it to be contention OR throttling.  Could be your network team introduced something new into the network that is causing the traffic to that disk to be slower such as throttling the connections per machine.

    My guess - someone changed something outside the SQL Server causing your writes to be slower.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • We had the experience once where backups slowed down and the problem was a switch going bad. It didn't evidence itself for any other process except our SQL Server backups. All sorts of stuff comes into play once you're hopping the network with the backup process.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • PJ_SQL wrote:

    No, the backup is not broken into several files, we can try to do that.

    As you said, you didn't have this problem just a little bit ago.  I'm with Grant in that it could be a simple but not so easy to find hardware failure anywhere from the NIC in the server to and thru the NAS.  Don't discount things like a cable connection going bad over time... I've had that happen more than once.

    I'd concentrate on finding the original problem rather than trying multi-backups.  Save that for when you know the "pipe" is working up to snuff for sure.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • We had the switch upgrade right before the backup started taking long time to finish.

  • So, a hardware change and then behavior changed? When you say right before, you mean several days, several hours, or they pretty much coincide?

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • On the day of the upgrade, starting that night backups started taking long time for just few servers.

  • PJ_SQL wrote:

    We had the switch upgrade right before the backup started taking long time to finish.

    I'm thinking that you've just identified your problem with the backups. 😉

    Have the check the settings.  The use of Auto-Negotiate will frequently cause the switch to spend more time "negotiating" than it does passing data.  It's been a long time since I've worked with that part of the hardware but simple mistakes like that and the old mistake of (and I don't know if it still applies) using Half-Duplex instead of Full-Duplex (and I imagine, other new types of settings that I don't know about) used to cause serious performance issues.  I've also seen it where they installed a "lesser" switch by mistake or didn't test their connections and have a weak or "chattering" connection in the actual cable terminations.  I've even seen partially broken brand new cable that has a flaw in it and the concept of a "minimum bend radius" is still an issue in a lot of shops.  They may have even used the wrong kind of preterminated cables that don't have the correct "crossover" of the wires themselves.

    Again, it's been a long time since I've worked that kind of hardware but my understanding is that they now have cable testers that "operate at frequency" and can measure for any "echoes" produced by bad cables and poor termination or even using incorrectly rated cable.

    Of course, the brand spankin' new switch may also be bad.  I've had that happen a couple of times even in the last couple of years (as well as all the other stuff  I've mentioned).  There's also the "coffee pot" syndrome where the switch is plugged into a real noisy circuit.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply