Restore WITH RECOVERY freezing on source DB after tail of log backup

  • Hello everyone,

    I've encountered this scenario twice so far and I still don't understand what's happening.

    I'm doing a database migration following roughly these steps.

    1. Restore my FULL/DIFF/Logs WITH NORECOVERY on the new server
    2. Take tail of log backup on old server (usually requires a SET SINGLE_USER also)
    3. Restore tail of log on new server WITH RECOVERY

    So far so good, the new DB is available and the old one is in "Recovering" state so no longer available. Exactly what I want.

    Now for whatever reason there is an issue and we need to rollback to the old server. I'm expecting to simply need to run "RESTORE DATABASE MyDB WITH RECOVERY" but twice now running this command just completely froze the connection. I let it run for 15-30 minutes and absolutely nothing is happening, I see it in sp_whoisactive, it's not blocked by anything and I'm running it in the same session as the SET SINGLE_USER / Tail of Log. Both time I ended up restarting the SQL Server service and the databases come back online instantly, no issues. But I obviously don't want this to my go-to solution.

    I should also mention that both time this happened was on SQL Server 2012, two different instance but both fully patched. We are not talking about huge databases either, both time I encountered the issue was with DBs <5GB.

    Am I doing something wrong?

    To clarify in case I wasn't clear, I'm basically doing the following code (but of course this worked without issue when I tested it so it's probably not that simple)

    CREATE DATABASE TestRestore
    BACKUP DATABASE TestRestore TO DISK = N'C:\Temp\TestRestore_FULL.BAK'
    ALTER DATABASE TestRestore SET SINGLE_USER WITH ROLLBACK IMMEDIATE
    BACKUP LOG TestRestore TO DISK = N'C:\Temp\TestRestore_Tail.trn' WITH NORECOVERY
    RESTORE DATABASE TestRestore WITH RECOVERY /* Freeze here */
  • Anything interesting in the logs?  That is usually where I start all of my troubleshooting.  SQL Server logs as well as the Windows logs.  Never know where it may be logged!

    I am not able to reproduce the problem (similar to you), so I am wondering if MAYBE it is related to having it in the recovery state for a long period of time?

    On a side note - I've never left a database in RECOVERY state intentionally for extended periods of time, nor am I certain what happens after a reboot or service restart.  Does the database remain in RECOVERY state or does it come out of it into normal state after a service restart?

    The reason I ask is I am wondering if a better approach than putting it into recovery MAY be to take that tail log backup and set the database into OFFLINE mode?  This is the approach I usually take when migrating databases as it has a similar effect as the process that you are describing.  Here, if I want to bring the database back, I just set it to ONLINE and it is good to go again.  Downside to this approach is while it is in offline mode, the data file and log file MAY be moved which would cause issues bringing it into an ONLINE state.

    Hopefully someone else chimes in, but I am thinking it MAY be related to it sitting in RECOVERY for an extended period of time that it might be doing something funky under the hood to bring it back into a regular state when you run the "RESTORE WITH RECOVERY" command.  Some sort of sanity checks when it has been IN RECOVERY for extended periods of time?  Also, you say it was in a working state after a service restart - this is why I am wondering if it went into a working state from the RESTORE WITH RECOVERY followed by service restart OR if it was JUST the service restart and the RESTORE WITH RECOVERY had no effect.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Hello Brian and thank you for your response.

    I should have mentioned it in my initial post but there is indeed nothing in the log. Although your post made me think and it's actually interesting that there is absolutely nothing in the log. Usually when using WITH RECOVERY you get the following 3 lines:

    • Starting up database xxxxx
    • CHECKDB for database xxxxx finished [...]
    • Restore is complete on database xxxxx. The database is now available.

    But here I have nothing, not even the "Starting up". The funny thing here is that in this case I was actually doing the Recovery on 4 databases, the first 3 worked fine and I see these messages but it's on the 4th that it froze. (I don't think the number of databases is related, the first time I encountered this issue was with a single DB).

    To answer one of your question I have tested restarting the service with a database set intentionally in "Restoring". It stays "Restoring" after the restart which is interesting because it's a different behavior than my issue (mine restarts normally). I also managed to run "WITH RECOVERY" without issue after the restart.

    You might be right that there might be an issue with leaving a database in "Recovering" for a while. I would rather the database be offline as well but I've avoided doing it because as far as I know you can't go from "Recovering" straight to "Offline" (ALTER DATABASE is not permitted while a database is in the Restoring state) and I assumed there "could" be new connections in that small gap between the RECOVERY and OFFLINE.

    But now that I think about it, I'm usually in SINGLE_USER mode at this point anyway so it's probably safe. Besides, my databases are not "that" critical so even that very small risk is definitely acceptable, especially in a planned maintenance window. I will add this to my future deployment and hopefully I will have more luck.

    Thanks again!

  • The recovery process can be extremely long, especially if there were long running transactions that it has to clean up. It's not unheard of for recover to take 2-3 or even 5-6 times as long as the actual restore process. So, a large database with lots of transactions, or even only a few heavy transactions, might take 4 or 5 hours to restore and then 12 or more hours to recover. This is a big part of why lots and lots of testing is advised when figuring out your data recovery strategy.

    Also, if you reboot, the database still has to go through recovery... from the beginning. So, unless you have a good reason, I wouldn't recommend rebooting the machine.

    What you can do is look at the wait statistics, blocking processes, and even the status in sys.dm_exec_requests (it can show a percentage of a restore process, might be meaningless, might let you know what's going on). Understand if there are external processes slowing things down, or, if it's just a funky recovery process (which does happen, again, totally depending on the transactions in question).

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Hello Grant and thank you for your response,

    I understand the part about crash recovery (being single threaded and all) but I'm pretty sure it's not what is happening in my case. As mentioned these are very small databases and already in SINGLE_USER at this point (so no long running transactions outside my control). When restarting the service they come back online (not "Restoring") instantly (or close enough). But when I tried "WITH RECOVERY" (before the restart) it just froze and I don't even have the "Starting up database xxxxx" line in the log. If it was a case of long recovery process I would assume that the restart would have taken a long time.

    I did check for blocking process (there was none) and there was no wait_info on the query (based on sp_whoisactive, but yes I realize it highly depends on the moment I ran the stored proc). I didn't think to check wait times further or the restore process % but that's certainly something I can investigate further if it ever happens again.

    Thanks!

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply