Parallel Query Plan Choices On (near) identical Environments

  • Hi All
    We have one particular query on a particular view that in our production environment insists on selecting a plan that contains parallelism. Normally this wouldn't be a problem, except in this case the query ended up monopolising all 32 cores on our Primary. 

    Now, the plan has been cleared from the cache, the indexes have been rebuilt and any statistics have been updated, none of this has resulted in a non parallel plan.

    The trouble is, when we try the same process, involving pretty much the same everything in our volume testing environment SQL Server selects a non parallel, and in this case far more optimal plan. 

    Everything between the two environments is the same, with the exception of the data being a month or so older. 

    Unfortunately the code in question isn't inside SQL Server, and any change to the statement itself would require a code change and we're in the middle of a code freeze. We can change the MAXDOP server setting to 1 which forces SQL Server to behave a little bit, albeit still with the same poor plan, or potentially rewrite the view, although the view isn't a vastly complicated one and I can't see this helping.

    If we'd had Query Store enabled in production we may have been able to see at what point the plan regressed, assuming it hasn't always been using the parallel plan, unfortunately we didn't.

    It feels like I'm missing something. Other than differences in data, what would cause a query to generate 2 completely different plans across 2 environments, and what can we do to tell SQL Server not to use a parallel plan?

    Thanks in advance

  • Probably your best bet would be to look at the SQL Instance settings of MAXDOP and Cost Threshold for Parallelism but these would be effective for all the queries. 
    Look at this article from Grant Fritchey 
    https://www.scarydba.com/2016/07/11/same-query-different-servers-different-behavior-now-what/

  • This problem would be far easier to investigate if both plans were available to examine. Is this something you can do?

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work - Wednesday, November 28, 2018 3:59 AM

    This problem would be far easier to investigate if both plans were available to examine. Is this something you can do?

    I don't think so, but we're on different compatibility versions in the two environments, the preferred plan is on 2012 and the sub optimal plan on 2016. I suspect that might have something to do with it so we're exploring that at the moment.

    If after that we still draw a blank I will see if I can post the plans, but I have a feeling that won't be possible.

    Thanks for your responses!

  • WoundedParrot - Wednesday, November 28, 2018 4:09 AM

    ChrisM@Work - Wednesday, November 28, 2018 3:59 AM

    This problem would be far easier to investigate if both plans were available to examine. Is this something you can do?

    I don't think so, but we're on different compatibility versions in the two environments, the preferred plan is on 2012 and the sub optimal plan on 2016. I suspect that might have something to do with it so we're exploring that at the moment.

    If after that we still draw a blank I will see if I can post the plans, but I have a feeling that won't be possible.

    Thanks for your responses!

    And there it is. So, the two environments are not "near identical". They are in fact very different. One is using the cardinality estimation engine from SQL Server 7 and the other is using the CE from SQL Server 2014. That's just one difference that the compatibility settings are going to cause. There are more. However, I suspect that's the big one. The cost estimates are driven by the row estimates and those are driven by the CE. However, it would do to check the cost threshold for parallelism too. That's one of the bigger offenders for things being different between machines. Also check the ANSI connection settings and the defaults between both. Very small differences add up quickly.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey - Wednesday, November 28, 2018 4:21 AM

    WoundedParrot - Wednesday, November 28, 2018 4:09 AM

    ChrisM@Work - Wednesday, November 28, 2018 3:59 AM

    This problem would be far easier to investigate if both plans were available to examine. Is this something you can do?

    I don't think so, but we're on different compatibility versions in the two environments, the preferred plan is on 2012 and the sub optimal plan on 2016. I suspect that might have something to do with it so we're exploring that at the moment.

    If after that we still draw a blank I will see if I can post the plans, but I have a feeling that won't be possible.

    Thanks for your responses!

    And there it is. So, the two environments are not "near identical". They are in fact very different. One is using the cardinality estimation engine from SQL Server 7 and the other is using the CE from SQL Server 2014. That's just one difference that the compatibility settings are going to cause. There are more. However, I suspect that's the big one. The cost estimates are driven by the row estimates and those are driven by the CE. However, it would do to check the cost threshold for parallelism too. That's one of the bigger offenders for things being different between machines. Also check the ANSI connection settings and the defaults between both. Very small differences add up quickly.

    Indeed, that seems to have been the culprit as this query presents vastly different behaviour across the two versions.
    Thanks everyone for your advice.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply