Turning off Lock Partitioning - what's the worst that can happen?

  • Hello, does anyone have experience of turning off lock partitioning?

    During online index rebuilds on our SQL Server 2016 (SP2 CU8) Availability Group, we are experiencing difficulties with an unpredictable blocking chain on the read-only secondary nodes, and I have established that this is due to Lock Partitioning.

    WAIT_AT_LOW_PRIORITY gives us a an excellent chance of doing some online index rebuilds on the Primary node, and we accept some blocking on the readable secondaries while the redo spid is blocked by a long running report, for example, and in turn the redo spid blocks other reports.  However, because of what I believe is a "bug" with Lock Partitioning*, the blocking chain is random and impossible to control.

    Therefore, to do our online index rebuilds, we want to have a period without Lock Partitioning on the readable secondary nodes, by applying trace flag 1229 and performing the requisite SQL Server service restart.  Furthermore, because we are not able to schedule 2 x SQL server restarts on each of the readable secondaries (i.e. we want to limit it to one restart), we will need to keep Lock Partitioning disabled for an extended period, ideally one month.

    I am concerned that I do not fully understand what the implication of this might be during a typical working day.  The Lock Partitioning section of the Transaction Locking and Row Versioning Guide suggests that performance might be impacted due to spinlock-related contention.  It would also seem that, without lock partitioning, memory blocks (holding locking structures) may need to be transferred between CPUs.  So, all in all, it would seem that CPU efficiency may be an issue, so I would hazard a guess that we may see x% CPU increase for the same workload and perhaps each SOS scheduler yield wait may increase y%.

    My concern is that I just have no clue how bad x and y may get.  I have applied trace flag 1229 on a test SQL Server where I have generated a reasonable load, and I can report "no noticeable issues".  However, I am unable to generate a Production-like load on this server.

    So, my questions are: Has anyone tried turning off lock partitioning on a readable AG secondary, or any other SQL Server for that matter?  If so, please could you tell me if you experienced any performance degradation?

    Many thanks.

     

    * In a nutshell, the "bug"/behaviour is that a writer spid’s exclusive lock is only taken against the lock partitions 0-n, where n is the partition at which it is blocked by a SCH-S lock. Partitions N+1 and above remain free of any locks, and therefore additional read queries are free to acquire their SCH-S locks on these higher lock partitions (despite their lock requests starting after the SCH-M started waiting).  Those SCH-S locks can later “jump in” and block the SCH-M once its original blocker has cleared.  In my screenshots 1-4, spid 62 is my initial reader, spid 402 is my writer, spid 222 is a reader that starts after 402 is waiting but then cheekily becomes spid 402’s new lead blocker.  Just to clarify, this can be reproduced on any SQL Server 2016 with lock partitioning enabled; it doesn't relate to Always On AG.

    Attachments:
    You must be logged in to view attached files.
  • Thanks for posting your issue and hopefully someone will answer soon.

    This is an automated bump to increase visibility of your question.

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply