*MAY* need to switch DBs from Full Recovery to Simple, due to backup space issues...

  • First and foremost, I'm working with our backup admin to find a better solution to the problem, one that won't require doing this.

    Second, adding capacity to the backup system would be nice, but by the time the request made it through the bureaucracy here, we'd be migrating to SQL3010... :hehe:

    So the scoop is, the backup system here ran out of room for the SharePoint SQL DBs, and now the backup admin and I are trying to get it working again. She's limited in what she can do by the retention policies; I'm limited by the fact that it's SharePoint, so there's no going in and dropping rows from big tables...

    (The SharePoint admin is on vacation, to boot...)

    So I think what may have set the problem off is that the SP admin, after realizing that the previous admin had enabled auditing of EVERYTHING and had not set up the "trim" option to keep only a month or so worth of entries, tried to delete the old records. With the DBs being in Full recovery, the TLogs got rather full.

    Then the backup ran out of space because of this (I think the retention is 30 days, then it goes to tape).

    So my worst-case solution to reduce the size of data in the TLogs is to do the following (a rough T-SQL sketch follows further down):

    1. After-hours, switch all DBs from Full Recovery to Simple, sacrificing point-in-time recovery

    2. After SQL has "truncated" the logs, switch back to Full Recovery

    3. Let the early AM Full DB backup run, and get back on our bi-hourly TLog backups

    Yes, we lose PiT recovery, but right now we have NO recovery since yesterday...

    Yes, I should have caught this sooner.

    So, a question:

    About how long might it be before SQL auto-truncates the logs? Can I "force" it to happen sooner by issuing a "Checkpoint" on the DBs, and could it take more than one checkpoint?

    I'm not currently planning to shrink the physical files, so that SQL doesn't need to "grow" the file later.
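
    In T-SQL terms, the flip would be roughly the following (a minimal sketch only; [WSS_Content_Portal] is a placeholder database name, and the CHECKPOINT is the "force it sooner" idea from my question above):

        -- Placeholder database name; repeat per SharePoint database
        ALTER DATABASE [WSS_Content_Portal] SET RECOVERY SIMPLE;
        GO
        -- In SIMPLE recovery a checkpoint marks the inactive part of the log as reusable
        USE [WSS_Content_Portal];
        CHECKPOINT;
        GO
        -- Switch back once the log space has been reclaimed
        ALTER DATABASE [WSS_Content_Portal] SET RECOVERY FULL;
        GO
        -- A full backup afterwards restarts the log chain so TLog backups are useful again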

    Thanks,

    Jason

    (As an example, one of the log files is showing 18,655.19 MB used in the log... I had been monitoring the sizes of the DBs and TLogs for a couple of weeks, looking to free up some disk, and this DB averaged 402 MB...)
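
    (For anyone following along, a quick way to see current log size and percent used is DBCC SQLPERF; a minimal sketch, not necessarily how the numbers above were gathered:)

        -- Reports log size (MB) and log space used (%) for every database on the instance
        DBCC SQLPERF(LOGSPACE);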

  • jasona.work (4/3/2013)


    So my worst-case solution to reduce the size of data in the TLogs is to do the following:

    1. After-hours, switch all DBs from Full Recovery to Simple, sacrificing point-in-time recovery

    2. After SQL has "truncated" the logs, switch back to Full Recovery

    3. Let the early AM Full DB backup run, and get back on our bi-hourly TLog backups

    What does this gain you that a single log backup (which truncates the log in full and bulk-logged recovery) doesn't, other than extra work?
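
    For illustration, a single log backup is one statement; the database name and path below are placeholders, not a recommendation of where to put it:

        -- A log backup truncates the inactive portion of the log in FULL/BULK_LOGGED recovery
        BACKUP LOG [WSS_Content_Portal]
            TO DISK = N'D:\Backups\WSS_Content_Portal_log.trn'
            WITH INIT;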

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (4/3/2013)


    jasona.work (4/3/2013)


    So my worst-case solution to reduce the size of data in the TLogs is to do the following:

    1. After-hours, switch all DBs from Full Recovery to Simple, sacrificing point-in-time recovery

    2. After SQL has "truncated" the logs, switch back to Full Recovery

    3. Let the early AM Full DB backup run, and get back on our bi-hourly TLog backups

    What does this gain you that a single log backup (which truncates the log in full and bulk-logged recovery) doesn't, other than extra work?

    My guess, and Jason can confirm, is that they don't have the space to complete a t-log backup.

  • Overall that sounds reasonable, given that you are in an emergency situation.

    About how long might it be before SQL auto-truncates the logs? Can I "force" it to happen sooner by issuing a "Checkpoint" on the DBs, and could it take more than one checkpoint?

    It shouldn't take long at all. Checkpoints automatically occur fairly frequently. I'd make the change to Simple ~20 mins before the full backups run, then switch back to Full ~5 mins before the full backup (which probably won't really take effect until after the full backup anyway). That leaves only a very small vulnerability window.

    I'm not currently planning to shrink the physical files, so that SQL doesn't need to "grow" the file later. (As an example, one of the Log files is showing 18655.19MB used in the log... I had been monitoring the sizes of the DBs and TLogs for a couple weeks, looking to free up some disk, and this DB averaged 402MB...)

    You should strongly consider shrinking log files that need it, such as ones this overgrown and/or ones where the number of VLFs is greater than ~200 (the command "DBCC LOGINFO" can show you the VLFs). You can sometimes gain serious performance by shrinking the log file and then re-growing it yourself all at once, giving you fewer VLFs and (almost always) a more contiguous log file.
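
    A rough sketch of that check-and-resize approach (database name, logical file name, and target sizes are placeholders; do this during a quiet window):

        USE [WSS_Content_Portal];
        GO
        -- One row per VLF; a few dozen rows is typical, hundreds or thousands suggests a resize
        DBCC LOGINFO;
        GO
        -- Shrink the log file once the active portion of the log is small
        DBCC SHRINKFILE (N'WSS_Content_Portal_log', 512);   -- target size in MB
        GO
        -- Re-grow it in one step to a sensible working size, with a fixed growth increment
        ALTER DATABASE [WSS_Content_Portal]
            MODIFY FILE (NAME = N'WSS_Content_Portal_log', SIZE = 4096MB, FILEGROWTH = 512MB);
        GO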

    SQL DBA, SQL Server MVP (07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Lynn Pettis (4/3/2013)


    GilaMonster (4/3/2013)


    jasona.work (4/3/2013)


    So my worst-case solution to reduce the size of data in the TLogs is to do the following:

    1. After-hours, switch all DBs from Full Recovery to Simple, sacrificing point-in-time recovery

    2. After SQL has "truncated" the logs, switch back to Full Recovery

    3. Let the early AM Full DB backup run, and get back on our bi-hourly TLog backups

    What does this gain you that a single log backup (which truncates the log in full and bulk-logged recovery) doesn't, other than extra work?

    My guess, and Jason can confirm, is that they don't have the space to complete a t-log backup.

    *DING DING DING*

    Yes, there's currently not enough room on the backup system's storage (Commvault) to even run a TLog backup to free up the log space. There is enough room on disk to do this, but I don't believe the filesystem is backed up on these servers (and that goes right back to them not having enough room). Although if the filesystem does get backed up, it's likely to a different storage "bucket." I've asked the Commvault admin about this.

    ScottPletcher: once the SP admin is done clearing out the old audit information, actually reducing the on-disk size of the logs (and, as you suggest, sizing them to have a "reasonable" number of VLFs) is next on the plate.

    Thanks,

    Jason

  • Occasionally I've seen stuck open transactions lead to unchecked log file growth. Try DBCC OPENTRAN on the DBs with oversized log files and see if there is an active open transaction that has been there a long time.

    If there is, and you can trace it to a process that should have closed a long time ago, you can kill the process. Run DBCC OPENTRAN again, and once the stuck transactions are gone (know what they are before you kill them, of course), the log file should start to reuse space and further growth will stop.
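
    Roughly, the check looks like this (a sketch; the database name and session id are placeholders):

        USE [WSS_Content_Portal];
        GO
        -- Reports the oldest active transaction in the current database, with its SPID and start time
        DBCC OPENTRAN;
        GO
        -- Only after confirming the transaction belongs to a dead or abandoned process:
        -- KILL 57;   -- placeholder session id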

    The log file will not shrink by itself, though; if you want that done you'll need to run the DBCC SHRINKFILE command. That may be a good idea, because if your log file is 3 times larger than the data file, it is likely you have an excessive number of VLFs (Virtual Log Files), and that won't help anything.

    I've addressed log file issues on my instances a number of times without changing the Recovery Model, which carries its own risks.

    Naturally you should know what you're doing when running these commands on a live production database, so please do some research on DBCC OPENTRAN, SHRINKFILE and VLFs. Hopefully that will point you in the right direction.

  • Then just switch to simple recovery, switch back to full immediately, run a full backup, and restart the log backups after that. Provided there's nothing else preventing log reuse, that will leave most of the log inactive; the log backup right after the full will just have the log records since the full backup, and after that they'll, as normal, have all log records since the previous log backup.
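
    To check whether anything else is preventing log reuse, something along these lines works (a sketch; the name filter is a placeholder for the SharePoint databases):

        -- NOTHING or LOG_BACKUP are the usual values; anything else (ACTIVE_TRANSACTION,
        -- REPLICATION, etc.) points to a cause that the recovery-model flip won't fix
        SELECT name, recovery_model_desc, log_reuse_wait_desc
        FROM sys.databases
        WHERE name LIKE N'WSS%';   -- placeholder filter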

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • jasona.work (4/3/2013)

    ScottPletcher, Once the SP Admin gets done with clearing out the old audit information, actually reducing the on-disk size of the logs (and as you suggest, setting their size to have a "reasonable" number of VLFs) is next on the plate. Thanks,

    Jason

    Personally, if I were you, I'd consider leaving the db in simple mode while clearing out the old audit information, assuming you are deleting in batches. Then you wouldn't have to do a tlog backup in between each audit batch.

    Do a differential ("incremental") backup when you're done with the deletes, then go back to FULL recovery.
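
    As a sketch, that closing differential would be along these lines (database name and path are placeholders):

        -- Differential backup: everything changed since the last full backup
        BACKUP DATABASE [WSS_Content_Portal]
            TO DISK = N'D:\Backups\WSS_Content_Portal_diff.bak'
            WITH DIFFERENTIAL, INIT;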

    Btw, I suggested clearing the logs immediately also because it will give you space to back up other logs to, even if that's not the drive where you normally store log backups. If the log backup is just an intermediate file, it doesn't really matter where it goes on disk (other than overall performance, but that is secondary right now).

    SQL DBA, SQL Server MVP (07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Everyone: Thank you for the ideas / suggestions

    dan-572483 - Right now I'm not worrying about shrinking the files; that will come later. We're OK on disk space on the log drive, unless I back the logs up to that drive to try to fix this...

    I did give DBCC OPENTRAN a try, and there were no open transactions, so that's a good thing...

    Gail - That's probably what I'm going to end up doing, although I'm still trying to find a better solution with the backup Admin.

    Scott - That's what I'm planning at this point, for when the SP admin comes back from his vacation and is ready to try again. He'd most likely be doing this on a weekend, so after our "end of the day" TLog backup I'd kick the DBs to Simple, he'd fire off his PowerShell script (provided by MS for just this purpose) to clear out the audit entries in batches, and then when he's done we'd kick the DBs back to Full recovery in plenty of time for our start-of-the-week full backup.

    Once again, thanks everyone!

    Jason

  • Once again, thank you...

    In the interests of paranoia, I ran log backups of the 6 biggest log files to disk on the server. Thanks to Enterprise Edition, the backups used about 10GB total (compression turned on for the backups).
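
    (A sketch of the kind of statement involved; the database name and path are placeholders, not the actual ones used:)

        -- Compressed log backup to local disk, just to get the log truncating again
        BACKUP LOG [WSS_Content_Portal]
            TO DISK = N'E:\TempBackups\WSS_Content_Portal_log.trn'
            WITH COMPRESSION, INIT;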

    The data space used by all the log files?

    Went from ~46GB to ~5GB...

    So yeah, when the SP admin is ready to purge his audit tables, I'm going to kick the DBs to Simple, let him do his thing, then kick them back to Full (making sure we've got a backup from before he starts, just in case...)

    Once more, thank you!

    Jason

  • What are your SLAs for RTO and RPO? If you do not have SLAs for these items, you are most likely either spending more money than your business wants you to, or risking business continuity by not being able to recover essential data.

    Dealing with lack of disk space for backups may be this morning's problem, but you really need to deal with aligning your SLAs with what can be delivered.

    If your business is happy to have 12 hours data loss on SharePoint, then there is no need for these databases to ever have been in Full recovery mode.

    If your business reckons it will cost $1m for every minute of data lost in a recovery, then you need to tell your boss that the company is currently exposed to $720m of costs until he can get you $20k more disk space. That should get their attention.

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017, 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara
