We are running SQL Server 2012 with an asynchronous READ ONLY AlwaysON Availability Group with two databases participating.
We had an issue where we dropped a large defunct table (34 million rows, 16GB including all indexes) from one of our AG databases.
The operation completed almost instantaneously on our production server but we experienced severe latency for the AlwaysON secondary READ ONLY copy (~45 minutes of latency).
In this statement I would define latency as the Estimated Data Loss in seconds as reported on the dashboard.
What we observed via our monitors and perfmon logs...
* CPU on our primary server spiked to 50% for a little more than a minute (this is abnormal for this server as it is over configured and runs at 10%)
* Estimated Data Loss in seconds spiked to 45 minutes ( typically hovers at sub-second )
* Recovery Queue perfmon counter on AO server for affected DB sky rockets to 40,000 kb's and steadily increases to 1,500,000 kb's over a 40 minute timeframe. It then comes down to normal which is somewhere in the 50KB range.
* On the AO server affected Redone Bytes/sec drops down to 0 for the duration of the latency until it starts to recover
I have done some research and this is what I have found and I was hoping to clarify that I am correct in my assessment.
Also, I would like to know if there is a way to drop a large table without incurring the same latency? (other than a slow trickle purge/delete of the table before dropping).
In Paul Randal's article "The Myth that DROP and TRUNCATE TABLE are Non-Logged"
Paul explains the concept of the 'deferred drop queue' that allows for the DROP and TRUNCATE command to be a minimally logged operation.
This is a way for SQL Server to perform a DROP on a large table immediately by unhooking the page allocations for the table and deferring the deallocations of pages and extents to the 'deferred drop queue'.
If I understand this properly, this allows my drop operation to occur almost immediately while deferring the heavy lifting to a "deferred-drop mechanism" that occurs in the background for my primary production server.
My assessment is that the AO secondary READ ONLY AG does not have the concept of a 'deferred drop queue' causing the deferred logged actions to be performed inline which in-turn backs up the recovery queue while it processes all the operations that were performed in a deferred mode on the primary.
Please tell me if I am at least on the right planet with my assessment.
Which leads me to two follow up questions:
1 - Does Microsoft plan on fixing this for secondary READONLY servers?
2 - Is there a better way of dropping/truncating a table on an AO AG primary server DB where you will not incur such harsh latency?