Database Restore Hangs

  • Wondering if anyone has seen the following:

    We have a sql server 2008 R2 SP2 running on a virtual server (vmware) which runs windows server 2008 R2 sp1. This server is for training and on it we have a sql agent job that runs a powershell script to refresh a training environment back to a baseline. The basic steps in this script are

    -Stop the application services

    -Backup the current database

    -Restore over the top of the current database with a backup (.bak) file on disk which is the baseline

    -Start the application services

    This works the majority of the time, however I have seen on the odd occasion it fails on the restore. When invoking the restore command, it puts the database in to a restoring state and then does nothing (does not appear to begin the restore). Eventually the invoke command times out and the job is marked as failed. A few things to note:

    -The backup command was successful (verbose output confirms this)

    -Immediately prior to invoking the restore command, the script runs ALTER DATABASE MyDatabase SET SINGLE_USER WITH ROLLBACK IMMEDIATE which was successful (confirmed in errorlog)

    -The restore command is RESTORE DATABASE MyDatabase FROM disk = ‘<pathtobackup>\backup.bak’ WITH REPLACE, STATS

    -The database goes in to a restoring state however no verbose output ever comes (id expect to see 10% processed, 20% etc) so it’s not like its simply been left in a non recovered state

    -There is nothing in the errorlog in relation to the restore command

    -There is nothing in the windows event logs in regards to the restore or any log entries around that time

    -There is no blocking in sql ( the single user command with rollback was successful)

    -There are no disk capacity issues nor would be once the database is restored. It’s a small database (500MB mdf, 70MB ldf)

    -When it had timed out and I got the alert, I checked the processes and there was nothing for the restore command nor was there any other process that had a hold of that particular database

    -This whole process works most of the time so there is no typos with the scripts

    -No other backups or restores are happening at the time the problem occurred

    -I have seen this on occasion when doing a restore via ssms it just sits there executing but never shows any verbose stats. I have to cancel the restore and run it again. Its small databases too so its not like I am just being impatient. I cannot recall if it was this server or not as it was quite a while ago

    -If I rerun the exact restore command it works (which is what I end up doing but I’d like to know the cause)

    -Timeout error is …. Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.

    -The sql job runs on the same instance where the database resides

    Seems very adhoc but there is no evidence to indicate what the issue might be. Perhaps it’s the setting it to single user however I would expect to see errors in the output or logs indicating this. Alternatively perhaps something with it being a vm.

  • since this is occasionally, I would suspect the restore process not being the 'single user'.

    Can you double check there isn't anything between the 'set single user' and the 'restore database'.

    That's the reason I don't use "set single user", but put it offline, right before the restore.

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • It sounds very much like the database isn't quite in single user mode.

    If users were connected, I would expect an error message from the restore script.

    Is it worth putting a check in prior to the restore script running to ensure the database is in single user mode and there are no users hitting that database (I know you shut down the application service, but I am wondering about the possibility of spids from alternative connections such as SSMS).

    Any disk IO issues noted? I've seen a couple of cases way back where the disk was exceptionally busy causing the restore to hang - Retried when the queue was back to normal and the backup worked.

  • ALZDBA (1/8/2015)


    ...That's the reason I don't use "set single user", but put it offline, right before the restore.

    +1 - I always SET OFFLINE, never SINGLE_USER, as you can't be certain that your session will be the one that gets the lone connection, esp. when your applications / users are still connecting. You don't need to ONLINE the database after the restore, so you're not adding an additional step 🙂

    Oh, and I always use ROLLBACK IMMEDIATE as well. I figure that if I need to restore a database, it's probably pretty urgent...:-P

    [font="Courier New"]sqlmunkee[/font]
    [font="Courier New"]Bringing joy and happiness via SQL Server since 1998[/font]

  • sqlmunkee (1/8/2015)


    ALZDBA (1/8/2015)


    ...That's the reason I don't use "set single user", but put it offline, right before the restore.

    +1 - I always SET OFFLINE, never SINGLE_USER, as you can't be certain that your session will be the one that gets the lone connection, esp. when your applications / users are still connecting. You don't need to ONLINE the database after the restore, so you're not adding an additional step 🙂

    Oh, and I always use ROLLBACK IMMEDIATE as well. I figure that if I need to restore a database, it's probably pretty urgent...:-P

    I'm not a fan of ROLLBACK IMMEDIATE. I had a coworker who put it in all his restore scripts too. Then one day when he was restoring to the QA server... oops, that was the Production server. Hilarity (aka PAIN) ensues.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thanks all for the replies. Very helpful. I am definitely running set single user and then immediately running the restore command so nothing in between. There should be no other connections when the application stops but I cannot be 100% sure. I will definitely try with the set offline.

    I would most likely use rollback immediate to ensure there are no connections blocking it, and since it is scripted only against this training instance, its fairly safe to have that command in there, but I do understand nothing in life is certain except death and taxes. The only thing that makes me think that this would not be a cause though is that I have seen the exact same thing happen when I have done a restore via ssms. It just sits there in the executing state but I get no progress indication and it will sit there until I cancel it. From memory I think in those cases I see the restore process in sp_who2 as runnable with no blocking but just sits there. I will have to investigate more to be sure though next time it happens.

    Disk IO.. yes potentially and imo this would make more sense. Should I see any entries in sql errorlog or event viewer though, as I see no entries about anything at that point in time. Perhaps I need to look at vmware logs?

  • FWIW I hope you enabled Instant File Initialization as it also works for restore operations !

    Instant File Initialization and restores[/url]

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • ALZDBA (1/8/2015)


    FWIW I hope you enabled Instant File Initialization as it also works for restore operations !

    Instant File Initialization and restores[/url]

    I haven't on this particular server as the database is small and restore takes a matter of seconds, so the duration is of no concern for me. However are you suggesting that turning on this setting may assist with the occasional restore hang?

    Thanks

  • burfos (1/9/2015)


    ALZDBA (1/8/2015)


    FWIW I hope you enabled Instant File Initialization as it also works for restore operations !

    Instant File Initialization and restores[/url]

    I haven't on this particular server as the database is small and restore takes a matter of seconds, so the duration is of no concern for me. However are you suggesting that turning on this setting may assist with the occasional restore hang?

    Thanks

    As the zeroing out of the data file will also need time, that can be a point of concern.

    I have never encountered your situation myself !

    Maybe the trace flags in the article can help you debunk this issue: Instant File Initialization and restores[/url]

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • ALZDBA (1/9/2015)


    burfos (1/9/2015)


    ALZDBA (1/8/2015)


    FWIW I hope you enabled Instant File Initialization as it also works for restore operations !

    Instant File Initialization and restores[/url]

    I haven't on this particular server as the database is small and restore takes a matter of seconds, so the duration is of no concern for me. However are you suggesting that turning on this setting may assist with the occasional restore hang?

    Thanks

    As the zeroing out of the data file will also need time, that can be a point of concern.

    I have never encountered your situation myself !

    Maybe the trace flags in the article can help you debunk this issue: Instant File Initialization and restores[/url]

    Yeah I think the best bet is the trace flags. Will look in to that. Thanks!

  • Just remember that Instant File Initialization doesn't do squat for log files.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply