Query Execution Plan differs between environments

  • I'll start by apologizing for any lack of information; I'm at a loss as to the specific cause of this problem, so I wasn't certain what information would be helpful.

    The problem is this: the exact same query, run against a Development database and a Production database, is being executed differently in each environment. The result is that in Production this query executes in less than 7 seconds, whereas in our Development environment it can take in excess of 3 minutes.

    Bear in mind that when this was originally created, it was based upon the Development database and was running with the same execution times as in Production. Sometime last week, something changed in Development that is causing a different execution plan to be selected... I haven't been able to identify what that change is, though.

    Now, my first thought was to immediately check the Actual Execution Plans, thinking that my Head DBA had made some index changes in Development that weren't made in Production. Upon researching this, I found that one of the tables I was joining against for comparison purposes was actually being stored as a heap (no indexes whatsoever) in both environments - but I wasn't able to identify any missing or new indexes between the two environments that I could link to as the potential cause of my problem.
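
    For reference, one way to compare the indexes between the two environments is to run the same catalog query on each side and diff the output - a rough sketch:

    SELECT t.name AS TableName,
           i.name AS IndexName,
           i.type_desc,                -- 'HEAP' rows flag tables with no clustered index
           i.is_unique
    FROM sys.tables AS t
    JOIN sys.indexes AS i
        ON i.object_id = t.object_id
    ORDER BY t.name, i.index_id;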

    Looking further at the execution plans, I located where the difference appears to be. Basically, in Production it's doing 24 executions to join against the Person tables, whereas in Development it's doing 24k executions... I'm at a loss as to why.

    I tried creating the proper indexes on my heap table; with those I can bring the execution time down to 41 seconds in Development - but I cannot get the number of executions when joining against the Person information to go back to only 24 instead of 24k.
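
    For context, the indexes I added were along these lines - the table and column names here are placeholders rather than the real ones:

    -- Give the heap a clustered index on the join column, plus a supporting
    -- nonclustered index on the columns used for the comparison.
    CREATE CLUSTERED INDEX CIX_StageTable_PersonID
        ON dbo.StageTable (PersonID);

    CREATE NONCLUSTERED INDEX IX_StageTable_FirstName_DOB
        ON dbo.StageTable (strFirstName, dteDOB);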

    What am I missing? The Tuning Advisor suggests some other indexes that could be created to help performance, but nothing that would impact the query as much as what I'm seeing.

  • I'd start by updating the statistics in your development server. Those seem very stale.

    The same query run in different environments can generate different execution plans, because the optimizer adapts to each environment to generate what it thinks is an optimal plan.
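
    If it helps, a quick way to see how stale the statistics actually are is to check their last update date and modification counters - a rough sketch, with dbo.TableName standing in for one of the joined tables:

    SELECT OBJECT_NAME(s.object_id) AS TableName,
           s.name AS StatName,
           sp.last_updated,
           sp.rows,
           sp.rows_sampled,
           sp.modification_counter     -- rows changed since the last stats update
    FROM sys.stats AS s
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    WHERE s.object_id = OBJECT_ID(N'dbo.TableName')
    ORDER BY sp.last_updated;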

    Luis C.
  • I was thinking that as well, so I went ahead and ran a:

    UPDATE STATISTICS dbo.TableName WITH FULLSCAN

    On all of the tables being joined in the query. Unfortunately, this had no appreciable impact on the execution plan being selected.

    I also went ahead and rebuilt the indexes that I am joining on; that did not change the execution plan either.
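
    For completeness, the rebuild was just the straightforward form below (table name again a placeholder):

    -- Rebuild every index on the table (defragments and rewrites the index pages).
    ALTER INDEX ALL ON dbo.TableName REBUILD;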

    I can include the actual T-SQL query if that would add any additional insight into my problem.

  • We have similar issues from time to time in our dev environment. If rebuilding statistics/indexes does not help, it may be a major difference in the data that is your issue. Our dev environments sometimes get loaded up with some wacky data that we wouldn't get in our production environment. This is usually a result of developers loading up a bunch of data to test with. Large amounts of data changes can change cardinality and query plans. I would recommend restoring your production DB back over your dev DB and coming up with a regular plan to do so. Find a timing that works for your development/release cycle and make restoring from prod a regular part of your cycle.
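
    The restore itself can be as simple as the following - the database name and backup path are placeholders, and you may need MOVE clauses if the file paths differ between the servers:

    -- Overwrite the dev copy with the latest full backup from production.
    RESTORE DATABASE DevDatabase
    FROM DISK = N'\\BackupShare\ProdDatabase_Full.bak'
    WITH REPLACE, RECOVERY, STATS = 10;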

    John Rowan


  • Compare the execution plans in DEV and PROD, and use an index hint to force the index that is being used in PROD.

    The same index will then be used in DEV if you add the hint as shown below; after a few runs it should pick up the same execution plan automatically. There are many reasons why DEV and PROD can end up with different plans.

    SELECT PER.FirstName,
           PER.LastName,
           EMP.LoginID
    FROM HumanResources.Employee AS EMP WITH (INDEX (PK_Employee_BusinessEntityID))
    INNER JOIN Person.Person AS PER
        ON PER.BusinessEntityID = EMP.BusinessEntityID;

  • John Rowan (4/11/2016)


    We have similar issues from time to time in our dev environment. If rebuilding statistics/indexes does not help, it may be a major difference in the data that is your issue. Our dev environments sometimes get loaded up with some wacky data that we wouldn't get in our production environment. This is usually a result of developers loading up a bunch of data to test with. Large amounts of data changes can change cardinality and query plans. I would recommend restoring your production DB back over your dev DB and coming up with a regular plan to do so. Find a timing that works for your development/release cycle and make restoring from prod a regular part of your cycle.

    I actually agree - in an ideal environment my Development database would be a regularly refreshed copy of our Production database... and I think how we propagate information from Production into Dev is a possible cause of this issue.

    Currently, my Head DBA elected to go with a kill-and-fill ETL process to populate Dev - basically there is an SSIS package that we run regularly to update the information within Dev. That process had been running while I was testing, and didn't have any impact until, all of a sudden, the query started running with a wonky execution plan in Development.

    The query runs fine in Production, so the easy fix is to just restore a Production copy over Development and, bada bing, we've got a good database that this ETL process runs fine against.

    That being said... I don't understand why the Query Optimizer has elected to use a non-performant execution plan, and my concern is that this may crop up in my Production environment - mostly because I haven't been able to identify the specific cause.

    I'm reluctant to take the easy fix, when this is something that I SHOULD be able to identify...I just haven't been able to do so yet =/

  • Ron007 (4/11/2016)


    Compare the execution plans in DEV and PROD, and use an index hint to force the index that is being used in PROD.

    The same index will then be used in DEV if you add the hint as shown below; after a few runs it should pick up the same execution plan automatically. There are many reasons why DEV and PROD can end up with different plans.

    SELECT PER.FirstName,
           PER.LastName,
           EMP.LoginID
    FROM HumanResources.Employee AS EMP WITH (INDEX (PK_Employee_BusinessEntityID))
    INNER JOIN Person.Person AS PER
        ON PER.BusinessEntityID = EMP.BusinessEntityID;

    I had thought of using an INDEX hint for my query's JOIN predicates, but I wasn't able to identify a specific index that I thought was the culprit - it just looked like the Query Optimizer was choosing to do 24k executions instead of the 24 it was originally doing.

    In regard to the different execution plans between environments, I can see how they could differ based upon maintenance plans, ETL processes updating statistics during UPDATEs/INSERTs, etc. - but I can't see a reason why you would want them to differ... I might be missing something, though.

  • Quick question, are the environments identical and with the same settings (maxdop, maxmem etc.)?

    Have you tried to run the query on dev with OPTION (RECOMPILE)?
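
    For reference, the instance-level settings can be compared with a quick query against sys.configurations, and the recompile hint just goes on the end of the statement - the SELECT below is only a stand-in for the real query:

    -- Compare these values on both instances.
    SELECT name, value_in_use
    FROM sys.configurations
    WHERE name IN (N'max degree of parallelism',
                   N'max server memory (MB)',
                   N'cost threshold for parallelism');

    -- Force a fresh plan for a single execution.
    SELECT COUNT(*)
    FROM dbo.TableName
    OPTION (RECOMPILE);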

    😎

  • Eirikur Eiriksson (4/11/2016)


    Quick question, are the environments identical and with the same settings (maxdop, maxmem etc.)?

    Have you tried to run the query on dev with OPTION (RECOMPILE)?

    😎

    MAXDOP, MAXMEM, etc. - all the same between the two SQL instances. They are on separate physical servers as well, though the physical servers are exact duplicates of one another. We actually plan on converting them into a SQL Server failover cluster, once I convince my Head DBA to let me.

    And I honestly did not try OPTION (RECOMPILE) after my most recent index changes and statistics updates (I did give it a try when the problem first cropped up, to see if the Query Optimizer would come up with something less terrible, and had no luck then) - I just gave it another try now, and it had no impact =/

  • First things first: could you post the CREATE statement for the view referenced by the query (Stage_PLEMP_UniqueRecords_View)?

    Having said that, it seems to me that it's mostly a matter of chance that the production environment performs more acceptably than development.

    Both of them drastically underestimate the number of rows coming out of the same filter operator. For both queries about 25000 rows end up coming through the filter operator. The difference is that the plan from production estimates 187, while the plan from development estimates 1.

    That difference of 1 row versus 187 in the outermost branch leads the development environment's version to choose a nested loop join where the production environment's version chooses a hash match, because executing the inner branch 1 time is estimated as acceptable (and might actually be, if it only had to execute it once), but executing it 187 times would not be.

    Because of that, there are branches that the production version only executes once (the reported 24 executions for some operators is because 24 is the degree of parallelism for the query, so it's really one execution by 24 threads), but the development version has to execute about 25000 times (once for each row in the outer branch).

    So, even though the estimates for both are quite wrong, one of the wrong estimates is at least high enough to avoid the terrible choice of a nested loop join at an especially sensitive point.

    I suspect that the filter at the heart of all this is actually from the view, which is why I'd like to see that definition.

    Cheers!

  • Doug.F (4/11/2016)


    I was thinking that as well, so I went ahead and ran a:

    UPDATE STATISTICS dbo.TableName WITH FULLSCAN

    On all of the tables being joined in the query. Unfortunately, this had no appreciable impact on the execution plan being selected.

    I also went ahead and rebuilt the indexes that I am joining on; that did not change the execution plan either.

    I can include the actual T-SQL query if that would add any additional insight into my problem.

    Rebuilding an index is done to defragment the index. That won't affect the execution plan. However, rebuilding the index will result in updated statistics. If you think your statistics are out of date, either update the statistics, or rebuild the index. Doing both results in two updates to your statistics.

    The first question I have is, is the data the same in both environments? 100 rows in each table, etc. If not, that alone could be the problem.
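
    A quick way to answer that is to run the same row-count query in both databases and compare the output - a sketch using the partition metadata:

    SELECT s.name + N'.' + t.name AS TableName,
           SUM(ps.row_count) AS TotalRows
    FROM sys.tables AS t
    JOIN sys.schemas AS s
        ON s.schema_id = t.schema_id
    JOIN sys.dm_db_partition_stats AS ps
        ON ps.object_id = t.object_id
    WHERE ps.index_id IN (0, 1)   -- heap or clustered index only, so rows aren't double counted
    GROUP BY s.name, t.name
    ORDER BY TableName;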

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Both plans are timing out. That means that, depending on the CPU cycles and estimates from the statistics, you could see different plans on different days on either environment. It won't necessarily be consistent. As everyone else has pointed out, the estimates are off. That alone could easily explain what's going on. Statistics are the principal driver for the optimizer to make choices on the plans. Combine these differences with the timeout and you've largely explained what's going on.

    You have lots of implicit conversion warnings in there. You might want to double check how this query is put together. You have an estimated cost of 9 & 13 (dev & prod), which is awfully low for parallel execution. I suspect you're still operating with the default cost threshold for parallelism of 5. I'd bump that number up some. These functions, LEFT(SPUV.strFirstName,1) and CAST(SPUV.dteDOB AS DATE), are causing table scans and preventing good index use.

    Considering the size of this query overall (it's not that big), the combination of the functions and the parallel execution is what's leading to the timeout in the optimizer. Cleaning those up will help arrive at a more consistent plan. Then, getting better row estimates by ensuring the statistics are up to date will also get you a better plan. For example, production is doing a LOOP join against 25,000 rows because it thinks it's only getting 187.
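
    To make those two points concrete, the threshold change and the sargable rewrites would look something like this - the threshold value and the parameter values are just examples, and the column names are taken from the predicates above:

    -- Raise the cost threshold for parallelism from the default of 5.
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'cost threshold for parallelism', 25;
    RECONFIGURE;

    DECLARE @Initial varchar(1) = 'A',      -- placeholder values
            @DOB     date       = '19800101';

    -- Instead of wrapping the columns in functions, e.g.
    --     WHERE LEFT(SPUV.strFirstName, 1) = @Initial
    --       AND CAST(SPUV.dteDOB AS DATE) = @DOB
    -- leave the columns bare so indexes on them can be used:
    SELECT SPUV.strFirstName,
           SPUV.dteDOB
    FROM dbo.Stage_PLEMP_UniqueRecords_View AS SPUV
    WHERE SPUV.strFirstName LIKE @Initial + '%'
      AND SPUV.dteDOB >= @DOB
      AND SPUV.dteDOB <  DATEADD(DAY, 1, @DOB);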

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Ron007 (4/11/2016)


    Compare the execution plans in DEV and PROD, and use an index hint to force the index that is being used in PROD.

    The same index will then be used in DEV if you add the hint as shown below; after a few runs it should pick up the same execution plan automatically. There are many reasons why DEV and PROD can end up with different plans.

    SELECT PER.FirstName,
           PER.LastName,
           EMP.LoginID
    FROM HumanResources.Employee AS EMP WITH (INDEX (PK_Employee_BusinessEntityID))
    INNER JOIN Person.Person AS PER
        ON PER.BusinessEntityID = EMP.BusinessEntityID;

    I'm absolutely against this advice.

    We clearly have an issue with statistics. Forcing the optimizer to make choices that aren't consistent with the statistics is not the way to solve the problem. In fact, forcing these choices is likely to lead to performance degradation. Query hints must be used very sparingly and after exhausting other avenues of solving the problem.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Jacob Wilkins (4/11/2016)


    First things first: could you post the CREATE statement for the view referenced by the query (Stage_PLEMP_UniqueRecords_View)?

    Having said that, it seems to me that it's mostly a matter of chance that the production environment performs more acceptably than development.

    Both of them drastically underestimate the number of rows coming out of the same filter operator. For both queries about 25000 rows end up coming through the filter operator. The difference is that the plan from production estimates 187, while the plan from development estimates 1.

    That difference of 1 row versus 187 in the outermost branch leads the development environment's version to choose a nested loop join where the production environment's version chooses a hash match, because executing the inner branch 1 time is estimated as acceptable (and might actually be, if it only had to execute it once), but executing it 187 times would not be.

    Because of that, there are branches that the production version only executes once (the reported 24 executions for some operators is because 24 is the degree of parallelism for the query, so it's really one execution by 24 threads), but the development version has to execute about 25000 times (once for each row in the outer branch).

    So, even though the estimates for both are quite wrong, one of the wrong estimates is at least high enough to avoid the terrible choice of a nested loop join at an especially sensitive point.

    I suspect that the filter at the heart of all this is actually from the view, which is why I'd like to see that definition.

    Cheers!

    That makes a great deal of sense. I'd seen where it was doing a hash match vs. a nested loop (which is how I arrived at the number of executions, and you've given me some new material to research to expand my understanding), but I didn't think beyond that, and your explanation makes it far clearer what the potential cause of my problems is. The definition for the view is:

    That is the filter causing the Estimated Number of Rows to diverge from the Actual, and I think it can fairly be called a large contributor to my problems. Extrapolating from that, I'm guessing that SQL Server is giving me this bad estimate because the WHERE clause is predicated upon a GROUP BY's COUNT() function - but I'd need to research more in order to understand why this is occurring.

    This drills me a lot closer to the actual cause of my problems. Any additional suggestions, or especially any good recommendations for a book that would help expand my understanding of this type of situation, would be greatly appreciated.
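
    For anyone curious, here is a purely hypothetical sketch of the pattern I'm describing - a view that filters on the result of a GROUP BY with a COUNT() - since the optimizer has no statistics on an aggregated value and has to fall back on a guess for that filter (the table and column names below are invented; this is not the real view definition):

    CREATE VIEW dbo.Stage_PLEMP_UniqueRecords_View_Example
    AS
    SELECT src.strFirstName,
           src.strLastName,
           src.dteDOB
    FROM (SELECT strFirstName,
                 strLastName,
                 dteDOB,
                 COUNT(*) AS RecordCount
          FROM dbo.Stage_PLEMP
          GROUP BY strFirstName, strLastName, dteDOB) AS src
    WHERE src.RecordCount = 1;    -- keep only the records that appear exactly once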

  • Grant Fritchey (4/12/2016)


    Ron007 (4/11/2016)


    Compare the execution plans in DEV and PROD, and use an index hint to force the index that is being used in PROD.

    The same index will then be used in DEV if you add the hint as shown below; after a few runs it should pick up the same execution plan automatically. There are many reasons why DEV and PROD can end up with different plans.

    SELECT PER.FirstName,
           PER.LastName,
           EMP.LoginID
    FROM HumanResources.Employee AS EMP WITH (INDEX (PK_Employee_BusinessEntityID))
    INNER JOIN Person.Person AS PER
        ON PER.BusinessEntityID = EMP.BusinessEntityID;

    I'm absolutely against this advice.

    We clearly have an issue with statistics. Forcing the optimizer to make choices that aren't consistent with the statistics is not the way to solve the problem. In fact, forcing these choices is likely to lead to performance degradation. Query hints must be used very sparingly and after exhausting other avenues of solving the problem.

    Thank you for this clarification; that was my understanding of how query hints are best used.
