Is all of the hardware and configurations between the nodes exactly the same? Down to disk level? Drive letters? Sector size, offset, allocation unit on each disk?
No, they are not same in hardware. Yes on the configurations (not hardware). Node 1 is a physical server, Node 2/3/4 are VMs.
drive letters all exact. Offset allocation unit and all.
Also, this only started happening when we upgraded to SQL2016.
Perry Whittle (10/20/2016)
what version and edition of sql server?
is this a readable or unreadable secondary?
did the database ever reach the synchronised state?
synchronous or asynchronous secondary?
SQL 2016 with CU1
At one point they are readable secondary. We flip-flop between node 1 and node 2 so at some point one has been the primary and the other as the readable secondary.
Yes it has been in synchronized (when synchronous) or synchronizing (when asynchronous). Now, none is set to be readable secondary.
The issue seems to have something to do with filestream. When the AG is re-established from a full/diff/log backup, everything is AOK. however, when there is an entry to a filestream column, that is when primary goes to suspended mode with the "SUSPEND_FROM_CAPTURE" reason.
If we suspend the database for node 2, and then resume in node 1 (the primary). Everything is good.
The moment when we try to resume node 2... which is like Node 2 is trying to capture the unsent logs from Node 1, that log has a transaction involving a filestream, the AG synch for that database goes suspended again.
At least that's what seems to be going on.
The issue has been escalated to a higher level engineer now. 🙁