The Hidden Cost of MAXDOP: CPU Spikes, Query Slowdowns, and What We Learned

  • Comments posted to this topic are about the item The Hidden Cost of MAXDOP: CPU Spikes, Query Slowdowns, and What We Learned

  • Nice article, has a few follow up questions/wonderings

    We see some of the same at some customer sites, but in general we follow the MS Guidelines of setting MAXDOP to the number of logical cores - max 8.

    I feel you argue for keeping MAXDOP at 1, but I don't agree. So for the sake of discussion, can you elaborate on that? Do you think 1 is best - or just not setting to all cores?

    In my mind CPU spikes are good - they get the job done as fast as possible. And as long as it is spikes ,and not continuously at 95%+ CPU usage - all is fine with me.

    In overall, your server uses less time processing the 4/6 queries when you have MAXDOP=4 and MAXDOP=1. In the 6 queries it almost only uses half the time in total with MAXDOP=4 in relation to MAXDOP=1.

    In real world, we experience a lot of fast queries (~0-1 ms) and some heavy reporting queries. And with MAXDOP=number of logical cores - we sometimes see heavy reporting queries affect fast queries.

    In general we've found over time, that setting MAXDOP=number of cores gives the best overall performance - and adjusting Cost Threshold for Parallelism to 50 (or more).

    But I am considering if it could be a good recommendation for our customers to always set MAXDOP to 1 or 2 below number of cores. So a single heavy and parallelized query leave a 1 or 2 cores available.

    But I would expect us to experience the exact same if then two reporting queries run. And if CPU resources are fine, then not utilizing the last core would just mean slower response times for heavy queries.

    And I don't want to recommend a larger server with 8 cores, for them to just idle at 10-15%.

    Any comments anyone?

  • Hi Anders,

    Thank you

    My article was about covering multiple scenarios on maxdop.. I did not said to set maxdop to 1, But I told you should try to optimise your query first before increasing maxdop..

    I need to know more about your workload..

    You can kindly DM me and I can help.. Here the conversation may go back and forth..

     

     

  • Hi again Anders,

    Since you raised some great points, let me expand a bit technically here for everyone’s benefit — and ask a few questions that might help guide tuning.

    You mentioned that with MAXDOP = 4, the total time to process 6 queries was half of what it took with MAXDOP = 1. That’s expected — parallelism can speed up individual queries. But sometimes this comes at the cost of concurrency, especially if multiple parallel queries compete at once.

    A few questions to better understand your scenario:

    1. What is the typical data size (in GB or rows) involved in those heavy reporting queries? You RAM and CPU Size?

    2. What’s the highest duration or CPU-consuming query, and does it run in parallel or single-threaded?

    3. Are these reports running alongside OLTP queries, or do they run in off-peak hours?

    4. Do you observe CXPACKET, CXCONSUMER, or SOS_SCHEDULER_YIELD waits during spikes?

    5. Have you tuned Cost Threshold for Parallelism beyond 50 in environments where query bursts happen?

    One Suggestion:

    You might try using Query Store or sys.dm_exec_query_stats to identify which queries go parallel and whether they’re blocking others. Sometimes a few aggressive plans dominate all scheduler queues.

    Looking forward to your thoughts — especially how your workloads mix and whether those queries can be tuned further. Happy to help analyze further if you share execution plans or wait stats!

    Best,

    Chandan

  • How many core did the system you tested this on have?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • My big takeaway from your article was this bit:

    But perhaps the most valuable insight was this: before touching MAXDOP, fix the query. In our case, adding a missing index and updating statistics reduced execution time from 20 seconds to just 1. No MAXDOP hint was needed.

    My advice to SQL developers is to avoid query hints. They MAY help your query on this specific version of SQL Server, but an update can change behavior. I would estimate that 90%-95% of the time query hints (like MAXDOP) are not required. And in the other 5%-10% of the time, the query hints you should use are a lot less system impacting (such as "OPTIMIZE FOR" query hint or if you are on SQL Standard, and you are SELECTing from an indexed view, you will need to include the index as a query hint). BUT most of the time, let the query optimizer handle the query and let the DBA handle instance level settings like MAXDOP. MOST of the time, if you are adding a query hint for performance, there is a better way to do it. Query hints generally result in technical debt and more complications during upgrades. IF the developer believes that a query hint is required for their query, I like to work with them to prove it isn't. It MAY fix the problem they are seeing (like adding MAXDOP=4 in your scenario), adding an index is a MUCH better solution.

    BUT like all things SQL, the above is not a "rule", just most common approach. Cursors should be avoided, but there are use cases for them. NOLOCK should be avoided, but there are use cases for it.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Without comparing the different execution plans, further comparison is futile!

    😎

    My guess is that in this case, the cost of parallel execution went through the roof, but then again, that's only a guess!

    As has been mentioned, there are many questions unanswered here, i.e. where is the baseline of MAXDOP=0, the different execution plans, CPU and I/O loads, dry-run vs. populated page cache etc.

  • Thank you so much, @Brian Gale — I truly appreciate your thoughtful insights.

    You’ve summarized the key takeaway better than I could. I completely agree: most of the time, query hints like MAXDOP are a shortcut that hides the root issue, and they often create long-term complexity.

    In our case, once we identified the missing index and updated the stats, the improvement was dramatic — and it reminded me how often we overlook fundamentals while chasing parallelism or configuration tweaks.

    I’ll keep your points in mind for future work, especially around query plan stability and hint usage across SQL Server versions. Thank you again for engaging so constructively.

  • Thank you for your response, @Eirikur Eirikisson — truly honored to have you engage.

    You are absolutely right: without comparing the execution plans side-by-side, especially for MAXDOP = 0 vs. 4, the analysis is incomplete. We did look into it briefly, and indeed the parallel plan had a very high cost due to CPU overhead, but I take your point-  a deeper dive into I/O load, cached vs. cold runs, and actual plan shape would’ve strengthened the case.

    I am really learning from your comment and will make sure to include more technical detail in future posts. Grateful for your time and guidance.

    Thank you too much

  • Hi Jeff,

    Thanks for your comment..

    This was 4 core I believe when I performed this testing.. As I recently upgraded it to 8 core for some further investigation

  • Hi again Chandran.

    I am not looking for help solving a specific scenario, we have databases running at several 100 different customers, ranging from sub-GB to multiple TB. With a standard software, so same queries and schema should cover the entire range of customers.

    I just felt the need to emphasize that I disagreed with lowering maxdop as a solution. And I was just wondering if there are any case where following MS guidelines of setting it equal to number of cores doesn't apply.

    Totally agree with "optimize query first".

    Totally agree to avoid query hints.

    So if that was your main point, I missed that.

Viewing 11 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply