Log shipping failing all of a sudden

  • I have several databases on a 2012 instance that are mirrored to a second server, and log ship to a third server for reporting purposes.

    Recently, for two of the databases, the log shipping has been failing at least once per day, and sometimes more often.

    i have deleted and recreated both the mirroring and log shipped databases on several occasions, but the problem is still happening.

    The log shipping restore jobs don't get marked as failed in the job history, but if you expand the history you can see errors such as this:

    Date21/11/2013 13:00:01

    LogJob History (LSRestore_OAK-DC\SQLDB1_OtherCarrierLive)

    Step ID1

    ServerCHERRY

    Job NameLSRestore_OAK-DC\SQLDB1_OtherCarrierLive

    Step NameLog shipping restore log job step.

    Duration00:00:30

    Sql Severity0

    Sql Message ID0

    Operator Emailed

    Operator Net sent

    Operator Paged

    Retries Attempted0

    Message

    2013-11-21 13:00:31.95*** Error: Could not apply log backup file '\\as2\dbbackups\TransactionLog\ReportOtherCarrierLive\OtherCarrierLive_20131121124501.trn' to secondary database 'ReportOtherCarrierLive'.(Microsoft.SqlServer.Management.LogShipping) ***

    2013-11-21 13:00:31.95*** Error: An error occurred while processing the log for database 'ReportOtherCarrierLive'. If possible, restore from backup. If a backup is not available, it might be necessary to rebuild the log.

    An error occurred during recovery, preventing the database 'ReportOtherCarrierLive' (5:0) from restarting. Diagnose the recovery errors and fix them, or restore from a known good backup. If errors are not corrected or expected, contact Technical Support.

    RESTORE LOG is terminating abnormally.

    Processed 0 pages for database 'ReportOtherCarrierLive', file 'livesystem_Data' on file 1.

    Processed 1 pages for database 'ReportOtherCarrierLive', file 'livesystem_Log' on file 1.(.Net SqlClient Data Provider) ***

    2013-11-21 13:00:31.95*** Error: Could not log history/error message.(Microsoft.SqlServer.Management.LogShipping) ***

    2013-11-21 13:00:31.95*** Error: ExecuteNonQuery requires an open and available Connection. The connection's current state is closed.(System.Data) ***

    2013-11-21 13:00:31.95Skipping log backup file '\\as2\dbbackups\TransactionLog\ReportOtherCarrierLive\OtherCarrierLive_20131121124501.trn' for secondary database 'ReportOtherCarrierLive' because the file could not be verified.

    2013-11-21 13:00:31.97*** Error: Could not log history/error message.(Microsoft.SqlServer.Management.LogShipping) ***

    2013-11-21 13:00:31.97*** Error: ExecuteNonQuery requires an open and available Connection. The connection's current state is closed.(System.Data) ***

    2013-11-21 13:00:31.97The restore operation was successful. Secondary Database: 'ReportOtherCarrierLive', Number of log backup files restored: 0

    2013-11-21 13:00:31.97*** Error: Could not log history/error message.(Microsoft.SqlServer.Management.LogShipping) ***

    2013-11-21 13:00:31.97*** Error: ExecuteNonQuery requires an open and available Connection. The connection's current state is closed.(System.Data) ***

    2013-11-21 13:00:31.97Deleting old log backup files. Primary Database: 'OtherCarrierLive'

    2013-11-21 13:00:31.97*** Error: Could not log history/error message.(Microsoft.SqlServer.Management.LogShipping) ***

    2013-11-21 13:00:31.97*** Error: ExecuteNonQuery requires an open and available Connection. The connection's current state is closed.(System.Data) ***

    Restoring a new backup of the database cures the problem for anywhere between 15minutes and 12 hours, but it always seems to re-occur.

    I have run DBCC on the source databases with no errors reported, and five other databases have the log shipping working without errors.

    Any hints what to look for? Any idea why the history is not showing failures?

  • 2013-11-21 13:00:31.95Skipping log backup file '\\as2\dbbackups\TransactionLog\ReportOtherCarrierLive\OtherCarrierLive_20131121124501.trn' for secondary database 'ReportOtherCarrierLive' because the file could not be verified.

    This line in the error is where I would start. Do you see the same filename (OtherCarrierLive_20131121124501.trn) in every failure of the restore. if so check if the file exists in the specified location. This location should be the destination of the copy job. if the file does exist there check if it exists in your backup location for the primary. If you can find both files compare them(size, modified,etc).

    Bob
    -----------------------------------------------------------------------------
    How to post to get the best help[/url]

  • The files exist and are being copied correctly every 15 minutes

  • possibly the file \\as2\dbbackups\TransactionLog\ReportOtherCarrierLive\OtherCarrierLive_20131121124501.trn is in use and cannot be accessed by SQL server. You can quickly test this by trying to move(not copy) it to a different folder or use Process Explorer or handle to identify which process(if any) is accessing the file.

    You had mention that only two out of several secondaries are failing. Do the one that are working successfully restore thier backups from the same location (\\as2\dbbackups\TransactionLog\ReportOtherCarrierLive\)? I would guess if it is the same it would be here \\as2\dbbackups\TransactionLog\ with ReportOtherCarrierLive being specific to the database. Does the sql agent service account still have permission to this share.

    Bob
    -----------------------------------------------------------------------------
    How to post to get the best help[/url]

  • Each database saves to the same location on a NAS drive, but into a database specific subdirectory, and the SQL service has full permissions to the location

  • you could try restoring the file manually. If it works then that will prove there are no issues with the file. If it fails you may get a different error.

    restore log ReportOtherCarrierLive from '\as2\dbbackups\TransactionLog\ReportOtherCarrierLive\OtherCarrierLive_20131121124501.trn' with norecovery;

    Bob
    -----------------------------------------------------------------------------
    How to post to get the best help[/url]

  • A restore of the offending transaction log worked.

    I have noticed that the database, when the log shipping fails, is shown as "Restoring" in the database list, so has the previous log restore "stalled" in some way?

  • log shipping secondaries will either be in restoring (norecovery) or standby. Do you set your secondaries to standby (thus restoring is an abnormal state)?

    so you can restore the file manually. Does the next trans log backup fail in the same way?

    check msdb.dbo.log_shipping_monitor_error_detail on the secondary server or the monitor server(should be the same info as your error).

    SELECT log_time,message

    FROM log_shipping_monitor_error_detail

    where database_name = '<dbname>'

    Bob
    -----------------------------------------------------------------------------
    How to post to get the best help[/url]

  • The log shipping is set to leave the target in standby mode, but when it goes wrong it seems to be leaving the database in no recovery mode

  • does your undo(.tuf) file exist and is the creation time consistent with the failed load or the last successful load?

    Are there any errors in the sql error log or the application event log at the time of the failed tlog restore?

    Bob
    -----------------------------------------------------------------------------
    How to post to get the best help[/url]

  • Dear Chris,

    Please advise if this issue was resolved. I am also facing the similar issue.

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply