150 GB Backup (compressed) taking over 1 day

  • We have 75 databases being backed up using simple recovery model on a nightly basis. We have no custom applications, but we have the following 3rd party apps installed and using the SQL server: SharePoint 2013, VMWare, SCCM, SolarWinds, Microsoft BI including SSAS, SSIS, and SSRS.

    Backups have not always taken this long and we are struggling to find out what is causing the delay. Any recommendations?

  • Are the backups being taken locally?

    Are there any other services taking up resources while the backup is running, e.g. SSAS or SSIS?

    Do you have a dedicated drive for backups?

    Is it a physical server or a VM?

    Do your databases have a lot of VLFs?

    Is your drive suffering from I/O issues?

    What are the server spec and the SQL Server version?

  • Backup time is mostly determined by the IO subsystem: reads from the source DB and writes to the backup device.

    Check PerfMon for the latency on the drives involved, and speak with your storage admin about the throughput and poor performance.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
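
As a rough illustration of the latency check suggested above, the sketch below summarizes samples of the PerfMon disk counters "Avg. Disk sec/Read" or "Avg. Disk sec/Write", which report seconds per operation. The 20 ms threshold is an assumed rule of thumb and the sample values are hypothetical; neither comes from this thread.

```python
from statistics import mean

# Assumed rule-of-thumb threshold in milliseconds; not a figure from the thread.
SLOW_MS = 20.0


def summarize_latency(samples_sec, slow_ms=SLOW_MS):
    """Summarize 'Avg. Disk sec/Read' or 'Avg. Disk sec/Write' samples.

    samples_sec: counter values in seconds per operation.
    Returns the average and worst latency in ms and the fraction of
    samples slower than slow_ms.
    """
    if not samples_sec:
        raise ValueError("no samples supplied")
    ms = [s * 1000.0 for s in samples_sec]
    slow = sum(1 for v in ms if v > slow_ms)
    return {
        "avg_ms": mean(ms),
        "max_ms": max(ms),
        "slow_fraction": slow / len(ms),
    }


# Hypothetical samples for one drive, in seconds.
example = summarize_latency([0.004, 0.012, 0.055, 0.130])
```

Running this for both the source data drives and the backup target during the backup window would show which side of the copy is waiting on storage.
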
  • jbrandenburg (11/16/2015)


    We have 75 databases being backed up using simple recovery model on a nightly basis. We have no custom applications, but we have the following 3rd party apps installed and using the SQL server: SharePoint 2013, VMWare, SCCM, SolarWinds, Microsoft BI including SSAS, SSIS, and SSRS.

    Backups have not always taken this long and we are struggling to find out what is causing the delay. Any recommendations?

    These types of problems are always tough to find the cause of. It usually boils down to someone setting up a new job somewhere that "fills the pipe" or beats the hell out of the related hard drives (BERemote {Backup Exec} was our nemesis for a while). On rare occasions, a bad network card can be to blame, and on even rarer occasions someone has done something as simple as rerouting a cable and putting too tight a bend in it, or there is a bad cable end that's intermittent. I've also seen people use the wrong type of cable (straight-through) instead of the proper Cat-5 cable with the internal crossover.

    Last but not least, there may have been another borderline equipment failure, such as a switch or router going bad. I've also seen it where someone bounced such a piece of equipment and, when it came back up, it was set to half-duplex and auto-negotiate instead of being hard-set to full duplex and a fixed speed.

    Like I said, not going to be easy to find but now you might know a couple of other things that it might be.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.



  • Jeff Moden (11/16/2015)


    Last but not least, there may have been another borderline equipment failure, such as a switch or router going bad.

    Or a disk controller is going. Or someone messed with the settings on the backup jobs and has added more DBs to them.

    I'd start with talking to the storage admin and the server admin to see if someone can check out the usual hardware suspects. Once someone tripped over a cord in our server center. It was just plugged in enough to occasionally work and just loosened enough to have intermittent outages. Took us 2 weeks to find the culprit (the wire) for our problems and when we did, the server tech did a blush and an apology as he pushed it back in the slot.

    Lo and behold, stuff started working again like magic.

    Brandie Tarvin, MCITP Database Administrator
    LiveJournal Blog: http://brandietarvin.livejournal.com/
    On LinkedIn!, Google+, and Twitter.
    Freelance Writer: Shadowrun
    Latchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Almost impossible to tell with so little information.

    What are you backing up to? What is the connection between your SQL server and backup device? What else might be accessing the backup device at the same time? What else is happening in SQL during your backups? If your servers are virtual, what else is happening on the host? Etc.

  • More information:

    The backups are being taken to another virtual server on the SAN.

    We have dedicated drives for backups.

    It is a virtual server running on VMware.

    VLF Count ranges from 4-389 (across all databases).

    It appears we have I/O issues: the "BACKUP DATABASE successfully processed pages" messages in the SQL Server log report between 2 and 13 MB per second.

    We have a 10 Gb connection between the SQL Server and the backup server.

    What information should I collect from my storage admin? We have an EMC SAN holding all of our data.
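
To put the logged rates in perspective, here is a back-of-the-envelope sketch of how long 150 GB takes at 2 and at 13 MB per second. It assumes the databases are backed up one after another and that 1 GB = 1024 MB. The 4-hour window is a hypothetical target, not something stated in the thread. Since 150 GB is the compressed size and the logged rate is most likely based on the uncompressed pages read, the real elapsed times would be longer than these figures.

```python
def backup_hours(size_gb, mb_per_sec):
    """Hours needed to move size_gb at a sustained mb_per_sec (1 GB = 1024 MB)."""
    if mb_per_sec <= 0:
        raise ValueError("rate must be positive")
    return size_gb * 1024 / mb_per_sec / 3600


def required_mb_per_sec(size_gb, window_hours):
    """Sustained rate needed to move size_gb within window_hours."""
    if window_hours <= 0:
        raise ValueError("window must be positive")
    return size_gb * 1024 / (window_hours * 3600)


slow = backup_hours(150, 2)            # roughly 21.3 hours
fast = backup_hours(150, 13)           # roughly 3.3 hours
target = required_mb_per_sec(150, 4)   # roughly 10.7 MB/s for a hypothetical 4-hour window
```

Even the best logged rate is far below what a SAN and a fast network link should sustain, which is consistent with the advice in the thread to look at the storage path end to end.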

  • Maybe change the storage unit from an OpenStorage disk device to an advanced disk device?

  • Go sit with your storage admin and see if the two of you can work out why you're getting such slow throughput on a SAN. Work with him, don't just tell him you need information.

    Gail Shaw

  • What BUFFERCOUNT, MAXTRANSFERSIZE and BLOCKSIZE values do you use when doing backups?

    Maybe link the script that is run.
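
For context on the three options asked about, the sketch below builds a BACKUP DATABASE statement with explicit BUFFERCOUNT, MAXTRANSFERSIZE and BLOCKSIZE values and estimates the buffer memory as BUFFERCOUNT * MAXTRANSFERSIZE. The database name, path and option values are hypothetical, and the helper `backup_statement` is an illustration only. It is not the script used by the original poster, and the naive string formatting is not safe for untrusted input.

```python
# Limits as I understand the BACKUP options; check the documentation for your version.
VALID_BLOCK_SIZES = {512, 1024, 2048, 4096, 8192, 16384, 32768, 65536}
KB64 = 64 * 1024
MAX_TRANSFER = 4 * 1024 * 1024


def backup_statement(database, path, buffercount, maxtransfersize, blocksize):
    """Return (T-SQL text, approximate buffer memory in MB) for a compressed backup."""
    if buffercount < 1:
        raise ValueError("BUFFERCOUNT must be at least 1")
    if maxtransfersize % KB64 != 0 or not (KB64 <= maxtransfersize <= MAX_TRANSFER):
        raise ValueError("MAXTRANSFERSIZE must be a multiple of 64 KB, up to 4 MB")
    if blocksize not in VALID_BLOCK_SIZES:
        raise ValueError("BLOCKSIZE must be a power of two from 512 to 65536")
    # Memory taken by the backup buffers is roughly count * transfer size.
    buffer_mb = buffercount * maxtransfersize / (1024 * 1024)
    sql = (
        f"BACKUP DATABASE [{database}] TO DISK = N'{path}' "
        f"WITH COMPRESSION, BUFFERCOUNT = {buffercount}, "
        f"MAXTRANSFERSIZE = {maxtransfersize}, BLOCKSIZE = {blocksize};"
    )
    return sql, buffer_mb


# Hypothetical database name, path and values.
sql, buffer_mb = backup_statement("SomeDb", r"X:\Backups\SomeDb.bak", 50, 4194304, 65536)
```

Larger buffers can raise throughput when the storage can absorb it, but they also take memory outside the buffer pool, so any change is best tested on one database first.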


  • GilaMonster (11/17/2015)


    Go sit with your storage admin and see if the two of you can work out why you're getting such slow throughput on a SAN. Work with him, don't just tell him you need information.

    This. And make sure he helps you check from end to end, start to finish. Don't let him run a "quick diagnostic" and say everything is fine.

    Brandie Tarvin

  • I have been working very closely with the storage admin. He has confirmed things appear to be set up properly. The SQL server and the backup server are on the fastest disks we have.

  • How is this configuration option completed? What is the impact of doing so?

  • jbrandenburg (11/17/2015)


    How is this configuration option completed? What is the impact of doing so?

    Which configuration option are you talking about?

    Have you gotten with the server admin to run a PerfMon to check the IO counters for issues? Just because your storage admin says everything is on the fastest disks doesn't mean there isn't a problem with the disks now. Did he actually do diagnostics or just check the setup of the backups?

    Brandie Tarvin

  • Brandie Tarvin (11/18/2015)


    jbrandenburg (11/17/2015)


    How is this configuration option completed? What is the impact of doing so?

    Which configuration option are you talking about?

    Have you gotten with the server admin to run a PerfMon to check the IO counters for issues? Just because your storage admin says everything is on the fastest disks doesn't mean there isn't a problem with the disks now. Did he actually do diagnostics or just check the setup of the backups?

    +1

    Sounds like you are disk bound.

    _________________________________________________________________

    "The problem with internet quotes is that you cant always depend on their accuracy" -Abraham Lincoln, 1864

