Excessive Blocking despite Read Committed Snapshot Isolation

Question

Post reply

Excessive Blocking despite Read Committed Snapshot Isolation

Leo.Miller

SSChampion

Points: 13070
More actions
March 10, 2021 at 10:20 pm

#3856252

I've got a DynamicsAX database that's seeing a significant amount of blocking despite having RCSI set and the connections using Read Committed.
This Microsoft article ( https://docs.microsoft.com/en-us/sql/t-sql/statements/set-transaction-isolation-level-transact-sql?view=sql-server-ver15&viewFallbackFrom=sql-server-ver12 ) confirms that there should be far less blocking, under the READ COMMITTED section it says: If READ_COMMITTED_SNAPSHOT is set to ON (the default on Azure SQL Database), the Database Engine uses row versioning to present each statement with a transactionally consistent snapshot of the data as it existed at the start of the statement. Locks are not used to protect the data from updates by other transactions.
So why am I seeing a lot of long running blocking? Could it be that somewhere in the Dynamics process flow the app is taking out a specific lock against some objects?
An sp_lock on the blocking SPID shows me this:
As far as I can see the UPDSTAS lock is from auto update stats against a highly volatile but small table. I've net yet captured the specific object the blocked thread is waiting on.
Leo
Leo
Nothing in life is ever so complicated that with a little work it can't be made more complicated.

Viewing 12 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

Jo Pattyn SSC-Dedicated Points: 32641 More actions · Answer 1

I don't have the Azure version, but you could capture blocking events for review with Extended-Events on Azure SQL.

It is briefly mentioned in https://docs.microsoft.com/en-us/azure/azure-sql/database/understand-resolve-blocking

Grant Fritchey SSC Guru Points: 399381 More actions · Answer 2

There's nothing magical about snapshot isolation. It helps with reads and blocking. It doesn't help with writes and blocking. If more than one person is attempting to update a given page, they still have to wait for access to that page until the first person completes their update.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 3

Leo.Miller wrote:

I've got a DynamicsAX database that's seeing a significant amount of blocking despite having RCSI set and the connections using Read Committed.

Aren't most of the Clustered Indexes in DynamicsAX based on Random GUIDs? If so, let me know because I actually have a surprisingly easy fix for such problems.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Leo.Miller SSChampion Points: 13070 More actions · Answer 4

No, at least not WRT the tables where I'm seeing blocking. Most of the clustered indexes are now PARTITIONID, DATAAREAID, TableSpecificID. Typically bigint, nvarchar, bigint.

One issue I have is that for most clients I've worked with, for any one company, the first two columns have a cardinality of 1. So you only see any variance in the 3rd column. RECID (bigint) seems to be unique, but it's no longer being used for clustering.

Leo
Nothing in life is ever so complicated that with a little work it can't be made more complicated.

Leo.Miller SSChampion Points: 13070 More actions · Answer 5

A follow on of what I have found. There are a lot of SELECT....with (UPDLOCK) commands. Since the database is already set for Read Committed Snapshot Isolation (to reduce deadlocks, etc.). I assume this is being used to prevent multiple users updating records under each other. The idea being to stop double selling of stock. The problem is, if two connections (users) use this hint in transactions, you see blocking, even if RCSI is enabled.

At the moment it looks like the heavy use of the UPDLOCK hint with the amount of time users take to enter and process orders is causing the extensive blocking. The problem now is to persuade the DynamicsAX developers to remove the locking hint, or to do a first read, determine the stock level and if it's below some threshold resubmit the select with the locking hint. Locking an item record just in case it may get double sold when there's more than enough stock is incredibly inefficient, and they have users sitting for 20 minutes plus waiting for their queries to complete. Rather read the data twice on those few occasions when it's a risk not to lock the record.

Are there any other good options? Anyone know DynamicsAX who could help here?

Leo
Nothing in life is ever so complicated that with a little work it can't be made more complicated.

Grant Fritchey SSC Guru Points: 399381 More actions · Answer 6

Yikes.

Nope. I'm not going to be much help here with DynamicsAX.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

ScottPletcher SSC Guru Points: 101238 More actions · Answer 7

Also, be aware RCSI has a 14 byte per row added overhead. If there isn't space available in the existing data pages, you can likely expect a lot of page splits from adding that 14 bytes, which will also slow down processing.

Hold as few a rows as possible with UPDLOCK. For example, if you have to scan thru several rows to identity a single row to hold for someone, do the scan with a simple SELECT and only do the UPDLOCK on the single row they really need to hold, even if you have to re-read just that one row. Again, if that's possible to do.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Ken McKelvey SSCoach Points: 18986 More actions · Answer 8

Leo.Miller wrote:

A follow on of what I have found. There are a lot of SELECT....with (UPDLOCK) commands. Since the database is already set for Read Committed Snapshot Isolation (to reduce deadlocks, etc.). I assume this is being used to prevent multiple users updating records under each other. The idea being to stop double selling of stock. The problem is, if two connections (users) use this hint in transactions, you see blocking, even if RCSI is enabled.
At the moment it looks like the heavy use of the UPDLOCK hint with the amount of time users take to enter and process orders is causing the extensive blocking. The problem now is to persuade the DynamicsAX developers to remove the locking hint, or to do a first read, determine the stock level and if it's below some threshold resubmit the select with the locking hint. Locking an item record just in case it may get double sold when there's more than enough stock is incredibly inefficient, and they have users sitting for 20 minutes plus waiting for their queries to complete. Rather read the data twice on those few occasions when it's a risk not to lock the record.
Are there any other good options? Anyone know DynamicsAX who could help here?

I have never used Dynamics AX but this looks like pessimistic concurrency control. It seems this is configurable in Dynamics AX so it may be worth experimenting with optimistic concurrency control in a test environment.

https://docs.microsoft.com/en-us/dynamicsax-2012/developer/optimistic-concurrency-control

Jeffrey Williams SSC Guru Points: 90351 More actions · Answer 9

If I am reading the documentation correctly - switching to OCC in DynamicsAX changes the UPDLOCK to NOLOCK. If that is the case, then I wouldn't make that switch - I would much rather have blocking than invalid/incorrect data.

This assumption could be wrong - I found this: https://cloudblogs.microsoft.com/dynamics365/no-audience/2009/07/08/about-locking-and-blocking-in-dynamics-ax-and-how-to-prevent-it/

Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

― Charles R. Swindoll

How to post questions to get better answers faster
Managing Transaction Logs

Ken McKelvey SSCoach Points: 18986 More actions · Answer 10

That is strange.

I expect OCC not to have any lock hints for the initial SELECT especially with RCSI. You could try OCC in a test environment to see if any NOLOCKS are produced. If they are I would raise a support call to ask why. It may be that the ROWVERSION column is enough to stop problems with phantom reads and the table is structured in a way to stop 601 (data movement) errors etc

Like you, I think NOLOCK here is questionable but would be interested in knowing if Microsoft think differently.

Leo.Miller SSChampion Points: 13070 More actions · Answer 11

Thanks Ken for pointing me to this article https://docs.microsoft.com/en-us/dynamicsax-2012/developer/optimistic-concurrency-control

Close reading indicates that while you can have Global Optimistic Concurrency Control, it is possible to set Pessimistic Control at a table level either by modifying the table property OCC definition in Dynamics or by adding options to the pseudo code related to the relevant tables. With this information I've been able to show that only a specific range of tables have PCC occurring, and these show up in the blocked query list. What was muddying the waters was the fact that the "head of the blocking chain" is a SPID, not a query, so the query currently returned in the blocking chain command may have nothing to do with the query that actually caused the blocking.

To use a sequence of commands:

SPID 99 > Select top 1 * from ABC with (UPDLOCK) -- runs in under a second so doesn't show up in long running queries and isn't blocked

SPID 99 > Select * from XYZ where col1 = "some condition derived from the first query" -- runs long or waits for user input once completed

SPID 110 > delete from ABC where "some condition" -- This query will now be blocked but the head of the blocking chain actually shows as the second of 99's queries, not the first! I'm not aware that there is any way to change this to show the blocking command, not the last command executed by the blocking SPID.

I've sent the Dynamics developers back to look at all the tables I've identified to see how the table level OCC has been configured.

Cheers, I hope this helps someone else one day.

Leo
Nothing in life is ever so complicated that with a little work it can't be made more complicated.