Secondary replica database issue

  • Last Friday, we had an issue with our production SQL cluster that uses AlwaysOn Availability Groups. I am using SQL 2016 Enterprise.

    The issue caused a few of the databases to go into (Not Synchronizing / Recovery Pending) on our secondary replica node. I was able to Resume Data Movement on all of them except one. It would never complete due to some block or lock on the database. I believe it has some system level lock on it.

    I was able to remove the database from the AG from the primary node. But the replica copy stayed in the same state of (Not Synchronizing / Recovery Pending). I no longer need this database but I can't drop it, detach it, or take it offline on the secondary node. It gives me errors that the database is inaccessible.

    The database_state_desc is RECOVERY_PENDING and suspend_reason_desc is SUSPEND_FROM_REDO.

    There is a process connected to the database: Status=BACKGROUND, Login=sa, Command=DB STARTUP

    I believe this is also causing our TempDB to continue to grow in size and never release space.

    How can I regain access to this database so I can drop it?

    SQL error logs:03/17/2017 23:04:19,spid13s,Unknown,SQL Server has encountered 328 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [D:\SQLSVR\Data\RedGateMonitor.mdf] in database id 24. The OS file handle is 0x0000000000001048. The offset of the latest long I/O is: 0x000002e3600000

    03/17/2017 23:04:00,spid109s,Unknown,During redoing of a logged operation in database 'RedGateMonitor'<c/> an error occurred at log record ID (20670:13759:5). Typically<c/> the specific failure is previously logged as an error in the Windows Event Log service. Restore the database from a full backup<c/> or repair the database.

    03/17/2017 23:04:00,spid109s,Unknown,Always On Availability Groups data movement for database 'RedGateMonitor' has been suspended for the following reason: "system" (Source ID 2; Source string: 'SUSPEND_FROM_REDO'). To resume data movement on the database<c/> you will need to resume the database manually. For information about how to resume an availability database<c/> see SQL Server Books Online.

    03/17/2017 23:04:00,spid107s,Unknown,During redoing of a logged operation in database 'RedGateMonitor'<c/> an error occurred at log record ID (20670:13767:16). Typically<c/> the specific failure is previously logged as an error in the Windows Event Log service. Restore the database from a full backup<c/> or repair the database.

    03/17/2017 23:04:00,spid110s,Unknown,During redoing of a logged operation in database 'RedGateMonitor'<c/> an error occurred at log record ID (20670:13694:3). Typically<c/> the specific failure is previously logged as an error in the Windows Event Log service. Restore the database from a full backup<c/> or repair the database.

    03/17/2017 22:59:16,spid13s,Unknown,SQL Server has encountered 178 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [D:\SQLSVR\Data\RedGateMonitor.mdf] in database id 24. The OS file handle is 0x0000000000001048. The offset of the latest long I/O is: 0x000000fe200000

    Errors at the time of the incident in Windows event log:
    SQL Server has encountered 178 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [D:\SQLSVR\Data\RedGateMonitor.mdf] in database id 24. The OS file handle is 0x0000000000001048. The offset of the latest long I/O is: 0x000000fe200000

    A time-out occurred while waiting for buffer latch -- type 4, bp 000000203018ABC0, page 1:1034642, stat 0xf, database id: 24, allocation unit Id: 72057594059161600, task 0x0000001FD88764E8 : 0, waittime 300 seconds, flags 0x1a, owning task 0x0000001FD9D7A8C8. Not continuing to wait.

    A time-out occurred while waiting for buffer latch -- type 4, bp 000000202332C840, page 1:6398334, stat 0xf, database id: 24, allocation unit Id: 72057594063290368, task 0x0000001FDCDC6CA8 : 0, waittime 300 seconds, flags 0x1a, owning task 0x0000001FD9D7A8C8. Not continuing to wait.

    Time-out occurred while waiting for buffer latch type 4 for page (1:6398334), database ID 24.

    During redoing of a logged operation in database 'RedGateMonitor', an error occurred at log record ID (20670:13694:3). Typically, the specific failure is previously logged as an error in the Windows Event Log service. Restore the database from a full backup, or repair the database.

    During redoing of a logged operation in database 'RedGateMonitor', an error occurred at log record ID (20670:13767:16). Typically, the specific failure is previously logged as an error in the Windows Event Log service. Restore the database from a full backup, or repair the database.

    Always On Availability Groups data movement for database 'RedGateMonitor' has been suspended for the following reason: "system" (Source ID 2; Source string: 'SUSPEND_FROM_REDO'). To resume data movement on the database, you will need to resume the database manually. For information about how to resume an availability database, see SQL Server Books Online.

  • The situation may have resolved itself now, but when I had something similar I think I was able to restore over the database (then drop it)- is that an option for you at all?

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply