Database goes into "In Recovery" due to drive full

  • DB goes into "in recovery" when reindexing runs and log drive becomes full.

    DB size - 300GB

    Log drive size 100GB.

    VLF count 750.

    growth rate 1000MB

    And take 2-3 hours to come back online. Is this expected behavior ? If no what can be the reason?

    Cheers

  • The best approach would be to prevent the transaction log of the database becoming full.

    Are you running regular transaction log backups?

  • Yes I understand that... Normally we take log backup frequently...

    Some time reindexing fills up the entire 100gb drive...

    But normally if Tlog is full, process should fail with Tlog full error.

    The question is why does it go in recovery for hours... Even when we create some space in drive for undoing operation to complete...

    Cheers

  • The database is rolling back any active transactions by going through the VLFs in the transaction log, the time taken is based on the operations that have to be performed in order to get the database back into a consistent state.

    If you have sp_whoisactive installed on your instance you will be able to see what is being executed.

    How are you performing maintenance on your indexes? Are you rebuilding all of them or reorganising/rebuiding based on fragmentation?

    Also, you can see why the existing log space is not being re-used by running the following query:-

    SELECT name, log_reuse_wait_desc

    FROM sys.databases

  • Your database is going into recovery mode probably because of some form of corruption.

    Have you run DBCC CHECKDB recently?

  • DBA From The Cold (9/3/2014)


    Your database is going into recovery mode probably because of some form of corruption.

    Really? What form of corruption sends a DB into the recovering state?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I run checkdb every weekend, so corruption can be ruled out.

    My maintenance plan follows all MS recommendations.

    since the issue is not happening right now I cant check log reuse status.

    Cheers

  • @Gail, Do you think high VLF count (~750) can be reason for this delay?

    I am not fully convinced because I have seen databases with couple of thousands of VLFs without any such issue

    Cheers

  • This was removed by the editor as SPAM

  • jacksonandrew321 (9/3/2014)


    U can shrink the log and use simple recovery mode or shrink database files.

    No, please don't recommend shrinks. They really shouldn't be run without a good reason. Simple recovery prevents point-in-time restores (not rollbacks), which may be unacceptable to the business. Shrinking the data files will cause severe index fragmentation.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • AppSup_dba (9/2/2014)


    DB goes into "in recovery" when reindexing runs and log drive becomes full.

    DB size - 300GB

    Log drive size 100GB.

    VLF count 750.

    growth rate 1000MB

    And take 2-3 hours to come back online. Is this expected behavior ? If no what can be the reason?

    It shouldn't go into the recovering state unless something took the DB offline/online or restarted SQL. The long recovery is expected, the index rebuild has to roll back.

    I would suggest making sure you have sufficient space on the log drive, spreading the rebuilds out so that there's chance for the log backups to run between them and, if you're using maintenance plans, switching to Ola's index maintenance tool which doesn't rebuild everything every time it runs (http://ola.hallengren.com/sql-server-index-and-statistics-maintenance.html). Possibly also make your log backups more frequent.

    Corruption wouldn't cause this, that would send a DB into the SUSPECT state, possibly RECOVERY_PENDING.

    Don't shrink the logs, they'll just regrow if they need to be the size that they are

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (9/3/2014)


    AppSup_dba (9/2/2014)


    DB goes into "in recovery" when reindexing runs and log drive becomes full.

    DB size - 300GB

    Log drive size 100GB.

    VLF count 750.

    growth rate 1000MB

    And take 2-3 hours to come back online. Is this expected behavior ? If no what can be the reason?

    It shouldn't go into the recovering state unless something took the DB offline/online or restarted SQL. The long recovery is expected, the index rebuild has to roll back.

    I would suggest making sure you have sufficient space on the log drive, spreading the rebuilds out so that there's chance for the log backups to run between them and, if you're using maintenance plans, switching to Ola's index maintenance tool which doesn't rebuild everything every time it runs (http://ola.hallengren.com/sql-server-index-and-statistics-maintenance.html). Possibly also make your log backups more frequent.

    Corruption wouldn't cause this, that would send a DB into the SUSPECT state, possibly RECOVERY_PENDING.

    Don't shrink the logs, they'll just regrow if they need to be the size that they are

    Ah, sorry yes the database would go into SUSPECT not recovery.

    Completely agree, spacing out the rebuilds and regular backups will stop the log filling up

  • Gail,

    we use Ola's scripts for maintenance.

    The database doesnt go offline and nor sql service is restarted.

    One ques: I understand that Long running rebuild index will take time to roll back. But why the database get stuck "in recovery" is what I can not able to interpret. The database should be in usable state even when rebuild index is being rolled back.

    Just in case it helps, this database is also a principal.

    Cheers

  • It shouldn't be, I did say that.

    Is the mirroring failing over?

    Are there any useful messages in the error log?

    Is the log on the mirror side also filling up?

    Are you sure the DB isn't going offline before going into recovery?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Is the mirroring failing over? No

    Are there any useful messages in the error log?

    Error log:

    4078 transactions rolled forward in database 'database' (9). This is an informational message only. No user action is required.

    Recovery of database 'database' (9) is 4% complete (approximately 146199 seconds remain). Phase 3 of 3. This is an informational message only. No user action is required.

    1 transactions rolled back in database 'database' (9). This is an informational message only. No user action is required.

    Recovery is writing a checkpoint in database 'database' (9). This is an informational message only. No user action is required.

    Recovery completed for database database (database ID 9) in 128 mins (s) (analysis xx ms, redo xx ms, undo xx ms.) This is an informational message only. No user action is required.

    Is the log on the mirror side also filling up? No

    Are you sure the DB isn't going offline before going into recovery? Yes database goes offline and restarted due to rollback process failed (I stand corrected on this)

    Looks like: Rebuild index -> log full -> rollback -> rollback failure -> database offline -> database restarted on its own -> database "in recovery" -> database online

    Do you think long recovery time is normal (as index rebuild is also long IO intensive operation) or we can reduce this somehow?

    PS: I am already ordered extra drive space, so above question is for understanding only.

    Cheers

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply