Database stuck in RECOVERING state

  • Hello,

    I have a big issue with an SQL Server. Several databases are stuck in RECOVERING state. I restarted the server and now some other databases are stuck (msdb is one of them).

    Any ideas on how I can troubleshoot or manage this issue is more than welcomed.

    Thanks,

    Ioana

  • Check if the the disks with the db files are accessible or if you have any network issues with those drives if there are on SAN.

    Thank You,

    Best Regards,

    SQLBuddy

  • Before you do anything else....

    Query sys.databases. What's the exact state (state_desc) for these databases?

    Check the SQL error log. Are there any entries in there relating to these databases? If so, what are they?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hello,

    Thank you for your replies.

    The disks are accessible, the db files are accesible. I even copied some of them (while the server was stoped) and attached them to another SLQ Server and they seem to work.

    Still, when starting again the SQL server, new databases seem to be in recovering mode.

    @Gail

    The exact state description is: RECOVERING

    In Error Log are entries related to these databases saying that they are starting: "Starting up database '[database name]'" and that is all there is

    Thank you,

    Voichita

  • If they are recovering, then you just need to wait. Long recovery times could be caused by poor IO performance or log fragmentation. Regardless, they should come on line once SQL has finished the recovery process. Don't be tempted to do silly things like detach and delete log files.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hello again,

    Problem partially solved. Here is what we did:

    - determined which was the blocking recovering database

    - stopped SQL Services

    - renamed mdf and log files for the database above

    - restarted SQL Server

    - all current databases recovered successfully in a couple of minutes

    - deleted the database with the issues and created a restore based on the latest backup (lucky, there were no data loses)

    Still, I am wondering why the recovering of the database took so long. Next I shortly describe the cycle of events that led to this situation, maybe one of you guys come with ideas on how something like this can be prevented:

    - The hard drive on which the database files were stored performed very badly lately (it is an flash USB drive) and was short on free space; our DBA decided to defragment it, but before that he detached the the database

    - After hard drive fragmentation was over, he attached back the DB

    - By default, the DB was set into READONLY mode, so he tried to update it to READ/WRITE. At this point the DB was blocked; the update ran for 15 min and did not complete. He tried to kill the process and ... yeap, it started an ever lasting ROLLBACK.

    - After an hour during which we waited for the ROLLBACK process to end, we decided to restart SQL Server. On restart, normally, it entered in recovery mode, only that it did not finish and was also blocking other databases to recover.

    Si, I think you may be right about log fragmentation, I suspect this was the cause for the long recovering. Please let me know if you have any other ideas on what might have happened or any advices on how this can be prevented.

    Thank you for your support,

    Ioana

  • As I said, poor IO performance (which you confirmed was the case) and log fragmentation are the main causes of long database recovery.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (9/8/2010)


    As I said, poor IO performance (which you confirmed was the case) and log fragmentation are the main causes of long database recovery.

    I'm speculating here... if the database was in full recovery model, and they weren't doing t-log backups, the log file will grow to be really large over time. Would this affect the recovery time of the database? Would it affect it to this extent?

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • WayneS (9/8/2010)


    Would this affect the recovery time of the database? Would it affect it to this extent?

    Shouldn't and no.

    Recovery just needs the active portion of the log, the portion that contains info on uncommitted transactions and transactions that committed but hadn't been written to disk. A DB in full recovery with no log backups will have a massive log, but the vast majority (the portion that log backups would have removed) will be inactive and hence of no interest to the recovery.

    Recovery will start reading the log at the beginning of the oldest active VLF and will keep reading until it finds the end of the log. That's the only portion that it's interested in, the rest is unnecessary.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hi Gail, Sorry for hopping in but would this be a typical case of VLF fragmentation ?

    ref: Monitoring SQL Server Virtual Log File Fragmentation

    http://www.simple-talk.com/sql/database-administration/monitoring-sql-server-virtual-log-file-fragmentation/

    Performance impact: a large number of virtual log files – Part I ( Linchi Shea )

    http://sqlblog.com/blogs/linchi_shea/archive/2009/02/09/performance-impact-a-large-number-of-virtual-log-files-part-i.aspx

    Performance impact: a large number of virtual log files – Part II ( Linchi Shea )

    http://sqlblog.com/blogs/linchi_shea/archive/2009/02/12/performance-impact-a-large-number-of-virtual-log-files-part-ii.aspx

    VLFs and transaction log throughput ( Kimberly L. Tripp ) http://www.sqlskills.com/BLOGS/KIMBERLY/post/8-Steps-to-better-Transaction-Log-throughput.aspx.

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • ALZDBA (9/8/2010)


    Hi Gail, Sorry for hopping in but would this be a typical case of VLF fragmentation ?

    That is what I said earlier.

    GilaMonster (9/8/2010)


    As I said, poor IO performance (which you confirmed was the case) and log fragmentation are the main causes of long database recovery.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks. Just making sure I didn't misinterprete it.

    So, Ioana there is still a aftermath to do after the db got back online.

    The provided urls should get you on track.

    Simplefied steps:

    -- 0 = db file info check # VLFs

    -- 1 = backup log before operation

    -- 2 = Shrink log file

    -- 3 = extend shrinked log file to a suitable size

    -- 4 = Diff backup

    -- 5 = backup log after diff

    TEST IT - TEST IT -

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • GilaMonster (9/8/2010)


    WayneS (9/8/2010)


    Would this affect the recovery time of the database? Would it affect it to this extent?

    Shouldn't and no.

    Recovery just needs the active portion of the log, the portion that contains info on uncommitted transactions and transactions that committed but hadn't been written to disk. A DB in full recovery with no log backups will have a massive log, but the vast majority (the portion that log backups would have removed) will be inactive and hence of no interest to the recovery.

    Recovery will start reading the log at the beginning of the oldest active VLF and will keep reading until it finds the end of the log. That's the only portion that it's interested in, the rest is unnecessary.

    Thanks Gail!

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • If the DB was on a USB Flash drive (which sounds like a dodgy idea to me....) then you don't need to defragment it. Fragmentation is an issue for spinning disks because they have to seek all over the place to read a file. Solid state disks & flash drives don't have any seek latentcy, so they aren't bothered by fragmentation.

    Also, flash drives don't last forever. Each block can only be erased and re-written a certain number of times. By defragging the drive you are erasing and re-writing a decent fraction of the blocks, using up your precious limited erase cycles.

    When a flash drive starts wearing out, you get blocks that can't be written to. Usually the existing data can still be read. Sounds like this might be what happened here.

  • John Sinclair (9/9/2010)


    If the DB was on a USB Flash drive (which sounds like a dodgy idea to me....) then you don't need to defragment it. Fragmentation is an issue for spinning disks because they have to seek all over the place to read a file.

    No one mentioned defragging the drive. Log fragmentation is internal in the log file - lots and lots of very small VLFs within the transaction log. A very large number of these VLFs slow down database recovery.

    Refer to the links that ALZDBA posted.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply