Errors discovered in event log during log shipping process

  • Hi,

    Since I set up log shipping I have been getting errors in the event log at around 4 AM every morning, to the point where I am now checking which other services are writing to the SAN or running at that time. The errors relate to the transaction log backup to my SAN. The MSSQL log shipping setup has carried on as normal and has not stopped because of the errors below. However, I have had one episode where I had to rescue the log shipping secondary database because it was trying to restore a corrupted transaction log file.

    I would like to know whether there is any chance that I am losing data because of these errors, with that data then missing from the secondary database. Here are the errors. They appear in this sequence every time, except on the odd occasion when error 5 is missing. Many Thanks

    1)

    BackupIoRequest::ReportIoError: write failure on backup device '\\192.168.1.100\db01\db01_20110526033011.trn'. Operating system error 2(failed to retrieve text for this error. Reason: 15105).

    2)

    BACKUP failed to complete the command BACKUP LOG db01. Check the backup application log for detailed messages.

    3)

    The operating system returned the error '64(failed to retrieve text for this error. Reason: 15105)' while attempting 'SetEndOfFile' on '\\192.168.1.100\db01\db01_20110526033011.trn'.

    4)

    The operating system returned the error '64(failed to retrieve text for this error. Reason: 15105)' while attempting 'FlushFileBuffers' on '\\192.168.1.100\db01\db01_20110526033011.trn'.

    5)

    BackupIoRequest::ReportIoError: write failure on backup device '\\192.168.1.100\db01\db01_20110528033010.trn'. Operating system error 1236(failed to retrieve text for this error. Reason: 15105).

    If you would like me to provide any more information, please let me know.

    Many Thanks

    Natty

  • OK - I can't draw anything from the errors, but I have worked with SAN storage from way back when. The real problem with storage set up this way is that vendors often tell you all sorts of wonderful things about what you can do with it (but most of them don't work with SQL Server, or probably other RDBMSs).

    So find out whether your SAN is being backed up at this time. Make sure you mention replication and snapshots, and specify "absolutely anything" when you ask what is happening on the storage (I find storage admins can be very defensive). I'd guess that the SAN is being snapshot backed up or similar. The ops people tend to do what they call a file quiesce (http://en.wikipedia.org/wiki/Quiesce), which has this type of effect because they lock open files in order to back them up (SQL Server doesn't like this). I've encountered this type of problem several times.

    Hope this helps.

    The GrumpyOldDBA
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • I second the storage comments (and demeanour of people administering them).

    Carlton.

  • All I can say is that it is in the errors.

    If I hadn't opened a command prompt and looked up the error codes, I wouldn't have looked at the situation in more detail.

    Using net helpmsg 2 or net helpmsg 64 helped me identify that the clue is in the errors, so I decided to look a little deeper into the logs.
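
    For readers without a Windows prompt to hand, the three codes from the errors above can be put in a small lookup table. This is only a sketch: the texts are the standard Win32 descriptions that `net helpmsg` prints on an English install, while `describe_os_error` and the "network-related" grouping are made-up names for illustration, not part of Windows or SQL Server.

```python
# Standard Win32 system error codes that appear in the backup errors above.
# The texts are what `net helpmsg <code>` prints on an English Windows install.
WIN32_ERRORS = {
    2: "The system cannot find the file specified.",
    64: "The specified network name is no longer available.",
    1236: "The network connection was aborted by the local system.",
}

# Codes that point at the network path rather than at the file itself
# (an illustrative grouping, not an official classification).
NETWORK_CODES = {64, 1236}


def describe_os_error(code: int) -> str:
    """Return a readable description for an OS error code (hypothetical helper)."""
    text = WIN32_ERRORS.get(code, "Unknown code; run 'net helpmsg' on Windows.")
    hint = " (network-related)" if code in NETWORK_CODES else ""
    return f"{code}: {text}{hint}"


if __name__ == "__main__":
    for code in (2, 64, 1236):
        print(describe_os_error(code))
```

    Two of the three codes describe a dropped network connection rather than a problem with the file itself, which fits with looking at what else is using the storage at that time.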

    Having already applied logging to the agents I was able to get an account of the backup agent's progress.

    2011-06-08 03:00:10.93 ----- START OF TRANSACTION LOG BACKUP -----

    2011-06-08 03:00:10.99 Starting transaction log backup. Primary ID: '48s03d90-83s1-49s8-48s2-ssy389sd7397'

    2011-06-08 03:00:13.37 Retrieving backup settings. Primary ID: '48s03d90-83s1-49s8-48s2-ssy389sd7397'

    2011-06-08 03:00:13.38 Retrieved backup settings. Primary Database: 'db01', Backup Directory: '\\192.168.1.100\db01', Backup Retention Period: 12960 minute(s), Backup Compression: Server Default

    2011-06-08 03:00:13.44 Backing up transaction log. Primary Database: 'db01', Log Backup File: '\\192.168.1.100\db01\db01_20110608020013.trn'

    2011-06-08 03:02:18.79 *** Error: Backup failed for Server 'db01'. (Microsoft.SqlServer.SmoExtended) ***

    2011-06-08 03:02:18.92 *** Error: An exception occurred while executing a Transact-SQL statement or batch.(Microsoft.SqlServer.ConnectionInfo) ***

    2011-06-08 03:02:18.92 *** Error: Write on "\\192.168.1.100\db01\db01_20110608020013.trn" failed: 2(failed to retrieve text for this error. Reason: 15105)

    BACKUP LOG is terminating abnormally.

    100 percent processed.(.Net SqlClient Data Provider) ***

    2011-06-08 03:02:21.07 *** Error: Could not log history/error message.(Microsoft.SqlServer.Management.LogShipping) ***

    2011-06-08 03:02:21.07 *** Error: Procedure or function sp_MSproxylogshippingmonitorerror has too many arguments specified.(.Net SqlClient Data Provider) ***

    2011-06-08 03:02:21.08 ----- END OF TRANSACTION LOG BACKUP -----

    So then I double checked all the files in the SAN and noticed that I was in fact missing logs in the chain although shipping wasn't failing.
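
    A check like this can be roughly automated from the file names alone. The sketch below assumes the `db01_YYYYMMDDHHMMSS.trn` naming seen in the errors and a fixed backup schedule. The 30-minute interval, the 5-minute tolerance, the sample file list, and the function names are all assumptions for illustration, not details of this setup. It only flags suspicious time gaps; the backup history in msdb, which records the LSN range of each log backup, remains the authoritative check.

```python
import re
from datetime import datetime, timedelta

# Matches names such as db01_20110608020013.trn (timestamp part is 14 digits).
NAME_RE = re.compile(r"_(\d{14})\.trn$", re.IGNORECASE)


def backup_times(filenames):
    """Extract and sort the timestamps embedded in log backup file names."""
    times = []
    for name in filenames:
        match = NAME_RE.search(name)
        if match:
            times.append(datetime.strptime(match.group(1), "%Y%m%d%H%M%S"))
    return sorted(times)


def find_gaps(filenames, interval, tolerance=timedelta(minutes=5)):
    """Return (earlier, later) pairs whose spacing exceeds interval + tolerance."""
    times = backup_times(filenames)
    return [
        (earlier, later)
        for earlier, later in zip(times, times[1:])
        if later - earlier > interval + tolerance
    ]


if __name__ == "__main__":
    # Hypothetical directory listing with the 02:00 backup missing.
    files = [
        "db01_20110608010012.trn",
        "db01_20110608013011.trn",
        "db01_20110608023010.trn",
        "db01_20110608030013.trn",
    ]
    for earlier, later in find_gaps(files, interval=timedelta(minutes=30)):
        print(f"Possible missing backup between {earlier} and {later}")
```

    In practice the list of names would come from a directory listing of the backup share, and the interval would be set to whatever the backup job schedule actually is.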

    I then investigated all the other processes that might be writing to the SAN at the time of night the errors were occurring, and located a few more backups running on the same subnet. These could plausibly cause problems for the log shipping backups, either because the other backups consume all the available resources or because the network is too congested for the log backups to be written safely to the SAN.

    I have total control of those other backups, so I made some adjustments to them. When I moved some of them to run a little earlier, the time of the backup error changed as well.

    So if you do get any errors, investigate them, as they can provide at least an initial eye-opener into solving whatever problems you may be having.

    Thanks for all your help

    natty

  • Operating system error 2 means that the specified file cannot be found. I'm guessing from the UNC path that you are backing up to a location that is not local. Try backing up to a local drive and then let the copy job pull the files to the secondary server.

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • I did try backing up to the local disk and letting the secondary agent pull the files from the primary server, but I ran into some permission errors. I can't remember exactly what they were, but I couldn't get the agent to authenticate correctly with the share. Thinking about it again, I could probably manage it now. At the time, though, given where I was with the deployment, I decided it would be best to move the backups to another storage point, where I was lucky enough to get the shares working for both servers and their agents.

    However, to bring all of the above up to date: the SAN was being hammered by some other backups, which were knocking the transaction log backups out of the window and causing data loss. I moved the backup windows of the jobs that were hammering the SAN, and the event log errors went away. I am still a little wary about unleashing the solution just yet, as I am sure this will happen again further down the line.

    I might just change the SAN and keep the other SAN for backups and not share it with Transaction Log backups.

    Thanks again

    natty

  • Well, error 2 indicates you may still be having those permission errors. Whether the files are stored on a share on the primary, the secondary or a file server, the agents on both servers need to be able to read, write and delete from that location. Are the servers on the same domain and network?
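
    One quick way to confirm that an account can read, write and delete in the backup location is to try all three with a throwaway file. This is a minimal sketch: `check_share_access` is a hypothetical helper, and it only tests the account it runs under, so it would need to be run as the relevant agent's service account on each server to say anything useful.

```python
import os
import tempfile


def check_share_access(directory: str) -> dict:
    """Try to write, read back and delete a temporary file in `directory`.

    Returns a dict of booleans; hypothetical helper for illustration only.
    """
    result = {"write": False, "read": False, "delete": False}
    try:
        # Create a uniquely named probe file so nothing real is overwritten.
        fd, path = tempfile.mkstemp(prefix="ls_probe_", suffix=".tmp", dir=directory)
        with os.fdopen(fd, "wb") as handle:
            handle.write(b"probe")
        result["write"] = True

        with open(path, "rb") as handle:
            result["read"] = handle.read() == b"probe"

        os.remove(path)
        result["delete"] = True
    except OSError as exc:
        # Covers permission errors as well as missing or unreachable paths.
        print(f"Access check failed: {exc}")
    return result


if __name__ == "__main__":
    # Replace with the backup share (for example a UNC path) when running for real.
    print(check_share_access(tempfile.gettempdir()))
```

    Any step that stays `False` shows where the account's rights on the location stop.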

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉
