How to recover AG when error is SUSPEND_FROM_CAPTURE

Question

Post reply

How to recover AG when error is SUSPEND_FROM_CAPTURE

SQLSalas

Ten Centuries

Points: 1327
More actions
October 17, 2016 at 6:46 pm

#314520

DESCRIPTION:Always On Availability Groups data movement for database '' has been suspended for the
following reason: "system" (Source ID 3; Source string: 'SUSPEND_FROM_CAPTURE'). To resume data movement on the database, you will need to resume the database manually. For information about how to resume an availability database, see SQL Server Books Online.
I have tried to resume but AG keeps on going to SUSPENDED state. Not much information given as to why it's going suspended? I am hoping not to re-establish the AG for the database from scratch.
Where to look???

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

Beatrix Kiddo SSC-Dedicated Points: 32407 More actions · Answer 1

Beatrix Kiddo

SSC-Dedicated

Points: 32407

October 18, 2016 at 3:50 am

#1907073

Anything in the dashboard or the error log?

SQLSalas Ten Centuries Points: 1327 More actions · Answer 2

Nothing really. The dashboard just says Not Synchronizing. The error log is simply that message. No prior message of what it could've been. Space is all good, at least as we see it.

So when we try to manually resume it the database hadr, it always goes to Suspended after awhile.

For now, we just went ahead and remove the database from AG, and then add it back. In some nodes, we were just able to let the logs catch up (restore logs from where it could've possibly stopped synching. It will say whether the log file is too early or too recent). And then just re-join.

There is however one node, that as soon as we join it, AG goes Suspended. So for that node, we just re-established it by restoring from full, then the subsequent logs...

However.... this is also where it bombs out. Everytime we join to this second node... it goes to "Success", and then suspends... so something in this Node 2. Space issue is good as we see it in the OS level. (it's a VM)

Currently, in comms with Microsoft regarding this issue. Will provide updates when there is so to share the info.

Thanks!

Perry Whittle SSC Guru Points: 233914 More actions · Answer 3

find it hard to believe theres no information in the sql server log or event logs

-----------------------------------------------------------------------------------------------------------

"Ya can't make an omelette without breaking just a few eggs" 😉

SQLSalas Ten Centuries Points: 1327 More actions · Answer 4

Well not entirely no information. Not even tagged an Error, just an information. I would pretty much appreciate any additional information.

For now, this IS the information:

"Always On Availability Groups data movement for database '<DatabaseName>' has been suspended for the following reason: "system" (Source ID 3; Source string: 'SUSPEND_FROM_CAPTURE'). To resume data movement on the database, you will need to resume the database manually. For information about how to resume an availability database, see SQL Server Books Online."

And note that this only happens when we try to join a specific node. Even if in that specific node, we just restored from Full backup, etc.

Jared SSC Guru Points: 61793 More actions · Answer 5

Is all of the hardware and configurations between the nodes exactly the same? Down to disk level? Drive letters? Sector size, offset, allocation unit on each disk?

Jared
CE - Microsoft

Perry Whittle SSC Guru Points: 233914 More actions · Answer 6

SQLSalas (10/19/2016)
Well not entirely no information. Not even tagged an Error, just an information. I would pretty much appreciate any additional information.
For now, this IS the information:
"Always On Availability Groups data movement for database '<DatabaseName>' has been suspended for the following reason: "system" (Source ID 3; Source string: 'SUSPEND_FROM_CAPTURE'). To resume data movement on the database, you will need to resume the database manually. For information about how to resume an availability database, see SQL Server Books Online."
And note that this only happens when we try to join a specific node. Even if in that specific node, we just restored from Full backup, etc.

what version and edition of sql server?

is this a readable or unreadable secondary?

did the database ever reach the synchronised state?

synchronous or asynchronous secondary?

-----------------------------------------------------------------------------------------------------------

"Ya can't make an omelette without breaking just a few eggs" 😉

SQLSalas Ten Centuries Points: 1327 More actions · Answer 7

SQLKnowItAll (10/19/2016)
Is all of the hardware and configurations between the nodes exactly the same? Down to disk level? Drive letters? Sector size, offset, allocation unit on each disk?

No, they are not same in hardware. Yes on the configurations (not hardware). Node 1 is a physical server, Node 2/3/4 are VMs.

drive letters all exact. Offset allocation unit and all.

Also, this only started happening when we upgraded to SQL2016.

Perry Whittle (10/20/2016)
what version and edition of sql server?
is this a readable or unreadable secondary?
did the database ever reach the synchronised state?
synchronous or asynchronous secondary?

SQL 2016 with CU1

At one point they are readable secondary. We flip-flop between node 1 and node 2 so at some point one has been the primary and the other as the readable secondary.

Yes it has been in synchronized (when synchronous) or synchronizing (when asynchronous). Now, none is set to be readable secondary.

The issue seems to have something to do with filestream. When the AG is re-established from a full/diff/log backup, everything is AOK. however, when there is an entry to a filestream column, that is when primary goes to suspended mode with the "SUSPEND_FROM_CAPTURE" reason.

If we suspend the database for node 2, and then resume in node 1 (the primary). Everything is good.

The moment when we try to resume node 2... which is like Node 2 is trying to capture the unsent logs from Node 1, that log has a transaction involving a filestream, the AG synch for that database goes suspended again.

At least that's what seems to be going on.

The issue has been escalated to a higher level engineer now. 🙁