Drop Large Table causes AlwaysON AG Latency

  • BrerSQL

    SSCommitted

    Points: 1672

    We are running SQL Server 2012 with an asynchronous READ ONLY AlwaysON Availability Group with two databases participating.

    We had an issue where we dropped a large defunct table (34 million rows, 16GB including all indexes) from one of our AG databases.

    The operation completed almost instantaneously on our production server but we experienced severe latency for the AlwaysON secondary READ ONLY copy (~45 minutes of latency).

    In this statement I would define latency as the Estimated Data Loss in seconds as reported on the dashboard.

    What we observed via our monitors and perfmon logs...

    * CPU on our primary server spiked to 50% for a little more than a minute (this is abnormal for this server as it is over configured and runs at 10%)

    * Estimated Data Loss in seconds spiked to 45 minutes ( typically hovers at sub-second )

    * Recovery Queue perfmon counter on AO server for affected DB sky rockets to 40,000 kb's and steadily increases to 1,500,000 kb's over a 40 minute timeframe. It then comes down to normal which is somewhere in the 50KB range.

    * On the AO server affected Redone Bytes/sec drops down to 0 for the duration of the latency until it starts to recover

    I have done some research and this is what I have found and I was hoping to clarify that I am correct in my assessment.

    Also, I would like to know if there is a way to drop a large table without incurring the same latency? (other than a slow trickle purge/delete of the table before dropping).

    In Paul Randal's article "The Myth that DROP and TRUNCATE TABLE are Non-Logged"

    http://sqlperformance.com/2013/05/sql-performance/drop-truncate-log-myth

    Paul explains the concept of the 'deferred drop queue' that allows for the DROP and TRUNCATE command to be a minimally logged operation.

    This is a way for SQL Server to perform a DROP on a large table immediately by unhooking the page allocations for the table and deferring the deallocations of pages and extents to the 'deferred drop queue'.

    If I understand this properly, this allows my drop operation to occur almost immediately while deferring the heavy lifting to a "deferred-drop mechanism" that occurs in the background for my primary production server.

    My assessment is that the AO secondary READ ONLY AG does not have the concept of a 'deferred drop queue' causing the deferred logged actions to be performed inline which in-turn backs up the recovery queue while it processes all the operations that were performed in a deferred mode on the primary.

    Please tell me if I am at least on the right planet with my assessment.

    Which leads me to two follow up questions:

    1 - Does Microsoft plan on fixing this for secondary READONLY servers?

    2 - Is there a better way of dropping/truncating a table on an AO AG primary server DB where you will not incur such harsh latency?

    Thank you

  • TheSQLGuru

    SSC Guru

    Points: 134017

    1) Check Connect to see if there are any similar issues posted there.

    2) I would not expect this to be improved by Microsoft.

    3) This could also be because of your READ_ONLY secondary. That system causes the use of the version store IN YOUR PRIMARY TOO, which among other things can lead to that 14-byte pointer bloating and fragmenting your primary database tables and indexes just by reading from the secondary. Can you make a copy of your database in a test environment and see if you see the same performance hit when you truncate that big table with NO access set for the secondary AG database copy?? I would love to hear the results of that testing - and other's might too.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

Viewing 2 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply