Upgrade to Compatibility Level 130 (from 120) Causing Parallel Plans to Become Slower Serial Plans

  • Hi, we're testing an upgrade of our data warehouse to SQL Server 2016 Enterprise, V13.0.5026.0.  As part of this testing, we:

    • Turned on the QueryStore but kept compatibility level at 120
    • Ran a series of report queries to get baseline metrics
    • Changed compatibility level to 130
    • Ran the same report queries
    • Reviewed the querystore for regressed queries
    The results showed about 60 of our report queries that had significant regression (query duration increased by 100% or more).  As I started to look into the plan differences, I noticed a trend - all of the CL120 plans were parallel, while all of the CL130 plans are serial.  The CL130 plans show an estimated DOP that is parallel, but the actual DOP ends up being 1.  If I use the new OPTION(USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE')) hint, the plan goes back to being parallel and I get good performance again.  

    Has anyone else experienced this behavior? If so, what steps did you take to resolve?

    I've attached anonymized versions of the CL130 (serial) and CL120 (parallel) plans.

  • If you are going to post execution plans, please upload them as *.sqlplan files, not images/  The images don't really help in and of them selves.

  • Thanks for the feedback.  I've updated the original post with anonymized versions each plan.
  • We're probably going to need the SQL behind "Function1" here, considering it's used to work out the values of columns 11 through to 93.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Thom A - Friday, July 20, 2018 10:00 AM

    We're probably going to need the SQL behind "Function1" here, considering it's used to work out the values of columns 11 through to 93.

    Yes I imagine that would be helpful 🙂 . "Function1" is SUM()

  • Here's some additional info that might be helpful.  I was following the blog post here on serial vs. parallel optimization thresholds. 

    Here is info from the final statement in the serial query plan under CL130 (the SELECT):

    You can see that the estimated subtree cost is 8.0343, which is above the cost threshold for parallelism (which is set at the default of 5). You can also see that the estimated DOP is 8, but the actual DOP is 1 at execution time.

    Then I used the “USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE')” hint to nudge the CL130 query to use a parallel plan. Here is info from the final statement in the parallel query plan (the SELECT):

    Here you can see that the estimated subtree cost is 5.31058, and that the estimated and actual DOP is 8.

    So, to summarize:
    •    The serial plan cost is above our threshold for parallelism (8.0343 > 5)
    •    The parallel plan cost is below the serial plan cost (5.31058 < 8.0343)
    •    We know that the query is able to be run in parallel b/c it goes parallel with the hint and runs in parallel under CL120
    •    But CL130 still chooses the serial plan.

  • I'm going to say you may want to look at the code.  SQL Server is timing out when trying to determine the execution plan in both cases.  Breaking up the processing may help improve the performance.

  • Lynn Pettis - Friday, July 20, 2018 11:50 AM

    I'm going to say you may want to look at the code.  SQL Server is timing out when trying to determine the execution plan in both cases.  Breaking up the processing may help improve the performance.

    Thanks Lynn, I have been investigating that as well and will continue to do so (note that the timeouts occur in CL120 and CL130).  But, even with the timeout, it's my understanding that the estimated costs reflect whatever the optimizer got through before the timeout, which all points to the fact that it should choose the parallel plan.

  • chris.o.smith - Friday, July 20, 2018 12:35 PM

    Lynn Pettis - Friday, July 20, 2018 11:50 AM

    I'm going to say you may want to look at the code.  SQL Server is timing out when trying to determine the execution plan in both cases.  Breaking up the processing may help improve the performance.

    Thanks Lynn, I have been investigating that as well and will continue to do so (note that the timeouts occur in CL120 and CL130).  But, even with the timeout, it's my understanding that the estimated costs reflect whatever the optimizer got through before the timeout, which all points to the fact that it should choose the parallel plan.

    You are seeing "the depends" in everything.  Just because the cost threshold for parallelism was exceeded doesn't mean it has use parallelism.  With the new cardinality estimator if could be the internal processes still indicated that a serial plan, though actually slower, was the better plan.  Not seeing the code, it is difficult to even make a guess on what is going on.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply