Query performs poorly on prod server, works well when run against a restored copy of DB

  • Hi all, 

    I have an query that has been set up and running as part of a sql job in prod for quite some time.  It normally completes in about 4 minutes.   On Jan 24th the process was kicked off at night and ran all night long.  The next morning we found that it was still running.   We killed it and the same thing happened the next night.   While I was digging in to this our business partner took a backup of the DB and restored it to the dev server.   He ran the query on that server and it completed in about 4 minutes like it used to do in production.   Ultimately I was able to fix the issue in prod by running index maintenance and optimize stats, but we'd like to understand why this started happening in the first place (I am monitoring this server with Solarwinds and I can see that the query plan did not change) and why did the performance improve when we restored the same DB to another server with the same version of SQL on it (no additional maintenance was done after the restore).  Hopefully I've provided enough detail.  Does anyone have any ideas for what may have been happening?

  • Based on the fact that you said the query ran fine again in production after you did index maintenance (I'm presuming reorganized) and updated statistics, most likely your statistics in prod were out of date.

    Do you have a regular task to update statistics?  If not, it would be a good idea to create one, then depending on how "busy" the database is have it run several times a week (or more.)  As for re-organizing / rebuilding indexes, I've gone the road suggested by Brent Ozar and haven't done either to my indexes in quite some time, just keep the statistics up-to-date.

  • Thanks, we do not currently have a job set up to do index maint and update stats, but I will be putting one in place.  I'll try and check out that Brent O article that you mention.

    The thing that I still don't understand is why the query would perform better against the same  the same database restored to a different server.   The statistics should have been the same at the point that the database was restored, right?

  • tommiwan - Wednesday, January 31, 2018 9:31 AM

    Thanks, we do not currently have a job set up to do index maint and update stats, but I will be putting one in place.  I'll try and check out that Brent O article that you mention.

    The thing that I still don't understand is why the query would perform better against the same  the same database restored to a different server.   The statistics should have been the same at the point that the database was restored, right?

    Definitely check index stats on tables being used. Just had the same issue at work this week.

  • This might be obvious, but have you checked that it's not blocked by an open transaction?
    It happened to me once.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Thanks, I did check to make sure that there was no blocking going on.   After reading some of Brent Ozar's articles regarding performance issues I'm starting to lean toward parameter sniffing.   

    Would a new query plan be generated when the database is restored to a new server?  If so, that may explain why it worked when we restored to dev.

  • tommiwan - Wednesday, January 31, 2018 12:23 PM

    Thanks, I did check to make sure that there was no blocking going on.   After reading some of Brent Ozar's articles regarding performance issues I'm starting to lean toward parameter sniffing.   

    Would a new query plan be generated when the database is restored to a new server?  If so, that may explain why it worked when we restored to dev.

    Query plans "live" on the server, not in the database, so yes, the restored copy would have a query plan generated from scratch.

    Potentially, you could confirm if it was parameter sniffing, by running the report with the "usual" parameters, then again with the parameters that caused the all night run, on your restored instance.

  • jasona.work - Wednesday, January 31, 2018 1:21 PM

    tommiwan - Wednesday, January 31, 2018 12:23 PM

    Thanks, I did check to make sure that there was no blocking going on.   After reading some of Brent Ozar's articles regarding performance issues I'm starting to lean toward parameter sniffing.   

    Would a new query plan be generated when the database is restored to a new server?  If so, that may explain why it worked when we restored to dev.

    Query plans "live" on the server, not in the database, so yes, the restored copy would have a query plan generated from scratch.

    Potentially, you could confirm if it was parameter sniffing, by running the report with the "usual" parameters, then again with the parameters that caused the all night run, on your restored instance.

    When indexes are rebuilt or reorganized and when statistics are updated on any table involved in the procedure, the plan is removed from cache. Other things could also do that.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • tommiwan - Wednesday, January 31, 2018 9:31 AM

    Thanks, we do not currently have a job set up to do index maint and update stats, but I will be putting one in place.  I'll try and check out that Brent O article that you mention.

    The thing that I still don't understand is why the query would perform better against the same  the same database restored to a different server.   The statistics should have been the same at the point that the database was restored, right?

    Do the DEV and PROD servers have the same number of CPU cores in them?  If the PROD server has more cores than DEV and statistics were out of date as seems to be the case then it is possible that the execution plan went parallel in PROD environment but not in the "lesser" DEV environment even though the data is the same.  What is the results of this query on each server?
    SELECT * FROM sys.configurations
    WHERE name in ('cost threshold for parallelism','max degree of parallelism')

  • Chris Harshman - Wednesday, January 31, 2018 2:08 PM

    tommiwan - Wednesday, January 31, 2018 9:31 AM

    Thanks, we do not currently have a job set up to do index maint and update stats, but I will be putting one in place.  I'll try and check out that Brent O article that you mention.

    The thing that I still don't understand is why the query would perform better against the same  the same database restored to a different server.   The statistics should have been the same at the point that the database was restored, right?

    Do the DEV and PROD servers have the same number of CPU cores in them?  If the PROD server has more cores than DEV and statistics were out of date as seems to be the case then it is possible that the execution plan went parallel in PROD environment but not in the "lesser" DEV environment even though the data is the same.  What is the results of this query on each server?
    SELECT * FROM sys.configurations
    WHERE name in ('cost threshold for parallelism','max degree of parallelism')

    The two servers have the same number of CPUs but the prod server has 20gb ram as opposed to 16gb on the dev server.

    I'm thinking that when we restored the db to the dev server a new query plan would have been generated which explains the performance difference.  After I ran index maintenance and statistics update I can confirm that the query on the prod server is using a new query plan.

  • Hi,
    The exection plan for each stored procedure or querys, resides on the master DB. That´s why when you restored your DB in another server, this execution plans have to be rebuilded.
    The most common problem with a slow query is the cardinality. This is, the execution plan have an estimated number of rows. But in the most cases, there will be a difference between this number and the actual number of rows.
    When you execute a query, the DB selects the minor cost execution plan.But if this plan have a poor cardinality, then your query will take more time than expected.
    The solution for the poor cardinality is to update statistics and reorganize/defragment indexes, as you have been done. I recommend you to check the following sp:
    - sp_createstats
    -sp_updatestats

    Hope this helps.
    Miguel

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply