How to reduce huge CXPACKET & LATCH_EX (ACCESS_METHODS_DATASET_PARENT) wait times?

  • Problem

    We have been experiencing high levels of user disruption due to SQL timeouts accross our systems since the beginning of the year.

    The SQL-Server instance in question has very high CPU usage (higher than 90% on all 16 cores all the time) during business hours.

    We have also noticed very high wait times: The combination of CXPACKET & LATCH_EX accounts for about 97% of all waits. This is split about 50/50 between CXPACKET & LATCH_EX.

    The non-buffered latch wait accounting for the vast majority (>95%) of LATCH_EX is ACCESS_METHODS_DATASET_PARENT.

    This suggests the problem is to do with parallelism.

    An example of the scale of wait times is:

    CXPACKET : 332,301,799 ms

    LATCH_EX : 267,955,752 ms

    PAGEIOLATCH_SH : 2,955,160 ms

    This was for the period between 08:00-16:24 on Jan 11th.

    Options under consideration

    1) Change MAXDOP from 0 to something between 4 and 8

    2) Modify the cost threshold of parallelism from 50 to a higher number

    Suggestions most welcome on how to ease the very high CPU load we are seeing, and reduce timeouts, in particular whether the proposed course of action is wise, and which numbers to change MAXDOP and cost threshold of parallellism to.

    Background information

    - SQL-Server 2008 R2 running on AMD Opteron 6180 SE, of which 16 cores are given to this instance of SQL-Server.

    - Type of workload: something of the order of 800 connections at the same time during business hours; majority OLTP type workload with some OLAP mixed in.

  • (2) probably won't help. 50 is already high.

    (1) might help, but probably not all that much. With 16 cores, probably 8 is a decent starting point, but don't expect it to fix the problem.

    Access_Method_Dataset_Parent is a latch used when SQL's doing a parallel table scan. OLTP-type workload should not be doing extensive table scans. Certainly not to the point that the latch wait is noticeable.

    What you're going to need to do to fix this is:

    - Identify what queries are getting the latch waits (query sys.dm_os_waiting_tasks)

    - Tune those queries so that they don't need to do table scans. This will be a combination of changing indexes and fixing queries.

    Repeat until performance is acceptable.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Gail's observations are correct. Some more input:

    * If cost threshold for parallelism is at 50 and you still see lots of CXPACKET (meaning: lots of parallel stuff), then you are doing lots of "heavy" queries. Combined with the high CPU usage, I expect that a lot of your queries do large table or index scans combined with either hash match join, hash match aggregate, and/or sort operators in the plan. The observation of a lot of parallel scans that Gail makes confirms this.

    Either your OLTP work is very badly tuned, or your server is running more OLAP than you think.

    Can you temporarily shut off the OLAP work and allow only the OLTP work? Then at least you know if all these parallel workloads and all the CPU usage is caused by OLAP or by OLTP.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Thank you both for your prompt responses 🙂

    The extensive parallel table scans make sense unfortunately. Hugo is right that lots of our supposedly OLTP work is very badly tuned.

    We will likely experiment with changing MAXDOP to 8, and stick with the current Cost Threshold for Parallelism value of 50. And not expect things to get much better!

    The focus has to be on tuning queries and getting load off the server.

    One extra piece of information is that I can see a very definite time pattern to the CXPACKET and LATCH_EX waits at certain times (using the SQL-Sentry Performance Advisor Wait Stats Analysis report). This gives me a starting point. But how do I trace which query is actually causing these waits?

    I've had a look at the instantaneous wait stats (not the regular one mentioned above) as Gail suggested using:

    SELECT TOP 1000 [waiting_task_address]

    ,[session_id]

    ,[exec_context_id]

    ,[wait_duration_ms]

    ,[wait_type]

    ,[resource_address]

    ,[blocking_task_address]

    ,[blocking_session_id]

    ,[blocking_exec_context_id]

    ,[resource_description]

    FROM [master].[sys].[dm_os_waiting_tasks]

    where wait_type = 'LATCH_EX'

    order by wait_duration_ms desc

    This gave me a session_id which I could trace in dm_exec_sessions. But is there a way of tracking what the actual query is that's experiencing the wait?

    Thanks again for your prompt responses.

    Mat Martin

  • Join to sys.dm_exec_requests on sessionID and then CROSS APPLY to sys.dm_exec_sql_text using the sql_handle column in exec_requests. I think the column's there in SQL 2008.

    Alternately if you know there's a definite period, then you can set up a trace over that time, capture everything that runs and start tuning on the high CPU queries. https://www.simple-talk.com/sql/performance/finding-the-causes-of-poor-performance-in-sql-server,-part-1/

    Yes, tuning is going to have to be your focus.

    Shameless self-promotion: I do tuning work extensively. If you want to get someone to look at it for you and you're either in South Africa or don't mind working with a company in a different part of the world, drop me a PM.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks Gail 🙂

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply