SQL Backup running long

  • Hi Guys,

    Can you give me some guidelines on what to check to find out what caused my backup to run for so long? We have a backup job that ran for 10 hours last night. It usually runs for 5 hours.

    The database is 8 TB in size. I spoke to the network guys and there was no network contention between the database server and the backup server. I also checked the database server and the backup server, and there was no contention on either.

    Can you tell me whether I am missing something?

    Regards

    IC

  • In general, I/O is the limiting factor in backup operations. Does the size of the backup file differ much between the last run and the previous one? Did you specify different parameters in the backup command (such as the number of stripes or BUFFERCOUNT)?

    ** Don't mistake the ‘stupidity of the crowd’ for the ‘wisdom of the group’! **
  • Yes, the backup size increased by 30 GB since the previous full.

    No, we don't set BUFFERCOUNT when we back up the database.

  • What is logged in the SQL Server error log for this run and for previous backup runs? Look for a message similar to "BACKUP DATABASE successfully processed 3824 pages in 0.376 seconds (79.440 MB/sec)." Is the throughput shown in parentheses approximately the same?

    What kind of storage is used on the backup server? Can the storage administrators tell whether it performed well during the backup?
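    A minimal Python sketch of that comparison step, assuming the completion messages have already been collected as plain text from each run (how they are exported is left open here). The regular expression simply mirrors the message format quoted above; it is an assumption, not an official parser:

```python
import re

# Mirrors the completion message quoted above, e.g.
# "BACKUP DATABASE successfully processed 3824 pages in 0.376 seconds (79.440 MB/sec)."
# Assumption: the message is in English and uses exactly this wording.
BACKUP_MSG = re.compile(
    r"BACKUP DATABASE successfully processed (?P<pages>\d+) pages "
    r"in (?P<seconds>[\d.]+) seconds \((?P<rate>[\d.]+) MB/sec\)"
)


def parse_backup_message(line):
    """Return (pages, seconds, mb_per_sec) for a matching line, or None."""
    match = BACKUP_MSG.search(line)
    if match is None:
        return None
    return (
        int(match.group("pages")),
        float(match.group("seconds")),
        float(match.group("rate")),
    )


if __name__ == "__main__":
    sample = ("BACKUP DATABASE successfully processed 3824 pages "
              "in 0.376 seconds (79.440 MB/sec).")
    print(parse_backup_message(sample))  # (3824, 0.376, 79.44)
```

    Feeding every run's message through the same function gives comparable (pages, seconds, MB/sec) tuples, which makes a sudden drop in the rate easy to spot.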

  • BACKUP DATABASE successfully processed 849077049 pages in 35673.692 seconds (185.946 MB/sec).

    Disk: SAS and SAS flash, on a tiered storage system.
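    A quick sanity check on those numbers, sketched in Python: SQL Server data pages are 8 KB, so the page count converts to megabytes as pages times 8 / 1024, and dividing by the elapsed seconds gives the rate. The helper name is illustrative only.

```python
PAGE_SIZE_KB = 8  # SQL Server data pages are 8 KB


def backup_stats(pages, seconds):
    """Return (megabytes, mb_per_sec, hours) for a backup of `pages` pages."""
    megabytes = pages * PAGE_SIZE_KB / 1024
    return megabytes, megabytes / seconds, seconds / 3600


# Figures taken from the posted completion message.
mb, rate, hours = backup_stats(849_077_049, 35_673.692)
print(f"{mb:,.0f} MB at {rate:.2f} MB/sec over {hours:.1f} hours")
```

    With the posted figures this works out to roughly 6.6 million MB at about 185.95 MB/sec over about 9.9 hours, which agrees with the reported rate to within rounding and with the 10-hour run.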

  • Quoting Imke Cronje (10/25/2016):

    "BACKUP DATABASE successfully processed 849077049 pages in 35673.692 seconds (185.946 MB/sec)."

    You only posted information about a single backup run (is this the slow run or a typical one?). With information from only one run we cannot tell you much. We could compare it against results from our own systems, but that won't tell us anything about why your situation has changed.

    How do your posted values compare to your other runs? Compare multiple backup runs from last week (or from an even longer period) and see whether there is a large deviation or a visible trend. Is the number of processed pages approximately the same across all backups? Does the throughput vary a lot? Also take the time of each run into consideration: if a run is executed at a completely different time of day, the network and/or backup server could have other processes to handle.
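    A rough Python sketch of that comparison, assuming the MB/sec figures from several runs have been collected into a dictionary. The run labels, the rates and the 25% tolerance are hypothetical placeholders for illustration, not data from this thread:

```python
from statistics import median


def flag_slow_runs(rates, tolerance=0.25):
    """Return labels of runs whose MB/sec is more than `tolerance`
    (a fraction) below the median rate of all runs."""
    typical = median(rates.values())
    return [label for label, rate in rates.items()
            if rate < typical * (1 - tolerance)]


# Hypothetical throughput figures (MB/sec), for illustration only.
runs = {
    "run 1": 298.0,
    "run 2": 305.5,
    "run 3": 301.2,
    "run 4": 185.9,
}
print(flag_slow_runs(runs))  # ['run 4']
```

    The median is used as the baseline so that one slow run does not drag the reference value down with it. Storing each run's start time next to its rate would also show whether slow runs cluster at a particular time of day.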

  • Thanks! There is a difference with the backup times.

    Last night's speed was slower. The average speed is 300 MB/sec.
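    To put the two rates side by side, a minimal Python sketch that estimates how long the same page count would take at a constant rate (assuming 8 KB pages and treating the 300 MB/sec average as flat):

```python
PAGE_SIZE_KB = 8  # SQL Server data pages are 8 KB


def hours_at_rate(pages, mb_per_sec):
    """Estimated hours to back up `pages` pages at a constant MB/sec rate."""
    megabytes = pages * PAGE_SIZE_KB / 1024
    return megabytes / mb_per_sec / 3600


pages = 849_077_049  # page count from the posted slow run
for rate in (300.0, 185.946):  # stated average vs. last night's rate
    print(f"{rate:>8.3f} MB/sec -> {hours_at_rate(pages, rate):.1f} hours")
```

    At 300 MB/sec the posted page count works out to about 6.1 hours, versus about 9.9 hours at last night's 185.946 MB/sec, so the drop in throughput alone accounts for most of the extra time.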

  • Have you considered differential backups for daily, with perhaps a full backup only weekly (at whatever your least active time during the week is)?

    Does your version of SQL allow you to specify COMPRESSION?
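    A sketch of the schedule suggested above, in Python, that only generates the T-SQL text for a given night: a full backup on one chosen weekday and a differential on the others, both WITH COMPRESSION. The database name, backup share, file naming and the Sunday full are hypothetical, and COMPRESSION assumes an edition and version that support backup compression:

```python
from datetime import date

# Hypothetical: full backup on Sunday (date.weekday(): Monday=0 ... Sunday=6).
FULL_BACKUP_WEEKDAY = 6


def backup_statement(db_name, backup_dir, day):
    """Build a BACKUP DATABASE statement: a full backup on FULL_BACKUP_WEEKDAY,
    a differential on every other day. COMPRESSION assumes edition support."""
    is_full = day.weekday() == FULL_BACKUP_WEEKDAY
    kind = "FULL" if is_full else "DIFF"
    options = ["COMPRESSION", "STATS = 10"]
    if not is_full:
        options.insert(0, "DIFFERENTIAL")
    path = f"{backup_dir}\\{db_name}_{kind}_{day:%Y%m%d}.bak"
    return (f"BACKUP DATABASE [{db_name}] TO DISK = N'{path}' "
            f"WITH {', '.join(options)};")


# Hypothetical database name and backup share.
print(backup_statement("BigDB", r"\\backupserver\sql", date(2016, 10, 23)))  # Sunday
print(backup_statement("BigDB", r"\\backupserver\sql", date(2016, 10, 24)))  # Monday
```

    Running the generated statements (from a SQL Agent job or similar) is left out of the sketch.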

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • If you have a monitoring tool like Redgate or SQLSentry, then go back and check for any extended periods of blocking on the backup process that may have occurred during that time.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • We do fulls every 4 days and diffs in between.

  • We have Sentry and there was no blocking during the time when the backup took place.

  • Quoting ScottPletcher (10/25/2016):

    "Have you considered differential backups for daily, with perhaps a full backup only weekly (at whatever your least active time during the week is)?

    Does your version of SQL allow you to specify COMPRESSION?"

    The problem is that what used to take 5 hours is suddenly taking 10 hours even with just a 30 GB change. Something changed or went bad. DIFs, compression, and all that are excellent ideas, but that's not the problem.
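    The arithmetic behind that point, as a minimal Python sketch (assuming 1 GB = 1024 MB and a constant 300 MB/sec, the average reported earlier in the thread):

```python
def extra_seconds(growth_gb, mb_per_sec):
    """Seconds added to a backup by `growth_gb` more data at a constant rate."""
    return growth_gb * 1024 / mb_per_sec


added = extra_seconds(30, 300.0)  # 30 GB of growth reported in the thread
observed = (10 - 5) * 3600        # roughly 10 hours vs. the usual 5
print(f"growth adds ~{added:.0f} s; the run was ~{observed} s longer")
```

    The 30 GB of growth accounts for roughly 100 seconds of an extra 18,000 or so, which is why the cause has to be something that changed the throughput rather than the size.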

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Quoting Imke Cronje (10/25/2016):

    "Thanks! There is a difference with the backup times.

    Last night's speed was slower. The average speed is 300 MB/sec."

    I just went through that over the weekend and yesterday.

    First, the weekend problem was caused by a card in a network switch that had started to go bad. They replaced it, which fixed all but one of the systems that were experiencing the slowdown. It took our good folks in NetOps a bit to figure it out, but they found that a cable was "broken". It appears to have had a bad connector that was just waiting for someone to touch it. They had to pull the cable (and all the others) to remove the switch and replace the card. There was no external sign that the connector had failed internally, but the transfer rates plummeted to 0.043 MB per second.

    In the past, we also had folks who thought it would be a good idea to do multi-file backups. That doesn't work well unless you can control which spindles each file is going to use. With today's SAN setups, that's nearly impossible, and the head thrashing between two files causes things to run much slower.

    We also had folks that messed with backup buffer and packet sizes. We found the "sweet spot" for that for our particular system but they originally used "rote settings" from some article and never checked to see what the impact was.
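    One way to look for that sweet spot, sketched in Python: generate a matrix of test backups with different BUFFERCOUNT and MAXTRANSFERSIZE values and time each one. The sketch writes to DISK = 'NUL' so only the read side is measured, and uses COPY_ONLY so the tests do not reset the differential base that the regular diffs depend on. The database name and candidate values are hypothetical; MAXTRANSFERSIZE must be a multiple of 64 KB, up to 4 MB.

```python
from itertools import product

KB = 1024
MAX_TRANSFER_LIMIT = 4 * KB * KB  # MAXTRANSFERSIZE upper limit: 4 MB


def buffer_test_statements(db_name, buffer_counts, transfer_sizes):
    """Build copy-only test backups to the NUL device for each combination.

    COPY_ONLY keeps these tests from resetting the differential base;
    writing to NUL measures the read side only, not the network or target.
    """
    statements = []
    for count, size in product(buffer_counts, transfer_sizes):
        # Reject sizes that are not a positive multiple of 64 KB up to 4 MB.
        if size % (64 * KB) or not 0 < size <= MAX_TRANSFER_LIMIT:
            raise ValueError(f"invalid MAXTRANSFERSIZE: {size}")
        statements.append(
            f"BACKUP DATABASE [{db_name}] TO DISK = N'NUL' "
            f"WITH COPY_ONLY, BUFFERCOUNT = {count}, "
            f"MAXTRANSFERSIZE = {size}, STATS = 10;"
        )
    return statements


# Hypothetical database name and candidate values; time each statement
# (for example via its completion message) and compare the MB/sec.
for stmt in buffer_test_statements("BigDB", [16, 32, 64], [1 * KB * KB, 4 * KB * KB]):
    print(stmt)
```

    On an 8 TB database each test reads the entire database, so every combination costs hours; it may be more practical to run the matrix during a quiet window.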

    I've also seen changes to switch settings really mess things up. Having a switch change from FULL DUPLEX to HALF DUPLEX will cause things to run about half as fast in a lot of cases. Setting NICs and switches/routers to AUTO NEGOTIATE can cause similar problems.

    Some of the tools already mentioned can help you track these things down, but just because such a tool doesn't find something like what I've described doesn't mean that such a problem doesn't exist... especially the slowly-going-bad card that acts up only when it heats up, or the intermittent cable that handles single requests just fine but not a stream of data.

    And then there's the addition of jobs or someone running something new. It doesn't even have to be on the SQL Server itself. We had a Web Server go bonkers and start sending a bazillion requests that flooded the proverbial pipe. It was almost like a denial of service attack. It was just some bad code that someone promoted.

    I've also seen Windows updates mess things up and need to be rolled back, and I've seen someone update the virus protection in a way that overrode the exclusions for SQL Server files.

    Last but not least, I've also seen well-meaning people set the server's power management to scale the CPU clock frequency automatically to "save energy". That doesn't work so well with SQL Server.

    "Fun" stuff. I wish you luck.

    --Jeff Moden
