Query is slow in production but fast in QA

  • Hello,
    I have Select query that runs slow in production but fast in QA. I know there are so many blogs regarding this issue. I have found the issue and it because of a view doing a join on a table and pulling 25 million rows and we recommended users to apply a predicate to reduce the row count.

    Issue here is in QA it runs in 4 minutes and tempdb usage is minimal but in production this particular query is using 123GB tempdb internal space when running.

    QA configuration : 6Gb memory and 4vCPU's and its a VM
    Prod configuration : 64GB memory, 12 physical cores and its physical.

    I know because of more memory, optimizer might pick a different plan but why would it make a difference in tempdb usage and also execution plan in QA and prod is doing a scan of 25 million rows and doing Hatch Match join as well but for some reason query is fast in QA but slow in Prod.

    any help or thoughts would be appreciated.

    Thanks

  • It's hard to say for certain without seeing the execution plans. It could be contention on resources that's there in prod, but not QA. In addition to capturing and comparing the execution plans, I'd also capture the wait statistics specific to the query execution so you can understand what is causing it to run slow. Check all the other system settings between the two environments as well, cost threshold for parallelism, ANSI connection settings, etc.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey - Friday, January 13, 2017 11:02 AM

    It's hard to say for certain without seeing the execution plans. It could be contention on resources that's there in prod, but not QA. In addition to capturing and comparing the execution plans, I'd also capture the wait statistics specific to the query execution so you can understand what is causing it to run slow. Check all the other system settings between the two environments as well, cost threshold for parallelism, ANSI connection settings, etc.

    Thanks Grant.

    I checked the CTP and ANSI settings and they are same. i'm looking for any other difference between the two instances. Also, i'm trying to update stats on that table that's scanning all the records to see if it helps in anyway. In the mean time any other suggestions are welcome. Also, i dont see any wait stats when run on QA and Prod, all the time query is in Runnable mode and it runs forever in Prod but in QA it stays in runnable mode and suddenly we get the results. I'm also thinking to clear the single use plan to make room for new plans.

    Thanks

  • Robin35 - Friday, January 13, 2017 10:32 AM

    Hello,
    I have Select query that runs slow in production but fast in QA. I know there are so many blogs regarding this issue. I have found the issue and it because of a view doing a join on a table and pulling 25 million rows and we recommended users to apply a predicate to reduce the row count.

    Issue here is in QA it runs in 4 minutes and tempdb usage is minimal but in production this particular query is using 123GB tempdb internal space when running.

    QA configuration : 6Gb memory and 4vCPU's and its a VM
    Prod configuration : 64GB memory, 12 physical cores and its physical.

    I know because of more memory, optimizer might pick a different plan but why would it make a difference in tempdb usage and also execution plan in QA and prod is doing a scan of 25 million rows and doing Hatch Match join as well but for some reason query is fast in QA but slow in Prod.

    any help or thoughts would be appreciated.

    Thanks

    Are you saying that both systems have the same number of rows in the tables?  Just making sure.

    Also, have you checked to ensure that identical indexes are in place and have you checked to ensure that the stats on the underlying tables have been updated recently?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Friday, January 13, 2017 12:58 PM

    Robin35 - Friday, January 13, 2017 10:32 AM

    Hello,
    I have Select query that runs slow in production but fast in QA. I know there are so many blogs regarding this issue. I have found the issue and it because of a view doing a join on a table and pulling 25 million rows and we recommended users to apply a predicate to reduce the row count.

    Issue here is in QA it runs in 4 minutes and tempdb usage is minimal but in production this particular query is using 123GB tempdb internal space when running.

    QA configuration : 6Gb memory and 4vCPU's and its a VM
    Prod configuration : 64GB memory, 12 physical cores and its physical.

    I know because of more memory, optimizer might pick a different plan but why would it make a difference in tempdb usage and also execution plan in QA and prod is doing a scan of 25 million rows and doing Hatch Match join as well but for some reason query is fast in QA but slow in Prod.

    any help or thoughts would be appreciated.

    Thanks

    Are you saying that both systems have the same number of rows in the tables?  Just making sure.

    Also, have you checked to ensure that identical indexes are in place and have you checked to ensure that the stats on the underlying tables have been updated recently?

    yes, they have same number of rows and stats were updated with 100 percent sample rate, not sure if the data is skewed. but in production i'm seeing the session in sleep_task wait ...and also i captured the wait stats information and i see LATCH_EX waits and SOS_Scheduler yeild waits . this explains there is a tempdb contention but not sure how to prove it. I reached out to storage team to check for any performance issues from their end.

  • Robin35 - Friday, January 13, 2017 1:40 PM

    Jeff Moden - Friday, January 13, 2017 12:58 PM

    Robin35 - Friday, January 13, 2017 10:32 AM

    Hello,
    I have Select query that runs slow in production but fast in QA. I know there are so many blogs regarding this issue. I have found the issue and it because of a view doing a join on a table and pulling 25 million rows and we recommended users to apply a predicate to reduce the row count.

    Issue here is in QA it runs in 4 minutes and tempdb usage is minimal but in production this particular query is using 123GB tempdb internal space when running.

    QA configuration : 6Gb memory and 4vCPU's and its a VM
    Prod configuration : 64GB memory, 12 physical cores and its physical.

    I know because of more memory, optimizer might pick a different plan but why would it make a difference in tempdb usage and also execution plan in QA and prod is doing a scan of 25 million rows and doing Hatch Match join as well but for some reason query is fast in QA but slow in Prod.

    any help or thoughts would be appreciated.

    Thanks

    Are you saying that both systems have the same number of rows in the tables?  Just making sure.

    Also, have you checked to ensure that identical indexes are in place and have you checked to ensure that the stats on the underlying tables have been updated recently?

    yes, they have same number of rows and stats were updated with 100 percent sample rate, not sure if the data is skewed. but in production i'm seeing the session in sleep_task wait ...and also i captured the wait stats information and i see LATCH_EX waits and SOS_Scheduler yeild waits . this explains there is a tempdb contention but not sure how to prove it. I reached out to storage team to check for any performance issues from their end.

  • On your end, is TempDB configured the same on both machines?  Also, do you have Trace Flag 1118 turned on at startup?  If not, you need to do that.  I've seen it solve a wealth of problems.

    Also and if it hasn't already been done, consider creating more than 1 data file for TempDB.  My recommendation is 1 file per physical core up to a max of 8 files.  All of the files must be configured identically and need to be the exact same size.  Don't turn off autogrowth.

    Just an FYI, on my 16 core system (32 logical), I have 8 files and 1 log file for TempDB.  Each has been configured with an initial size of 2GB and none of them have grown in the last 5 years.  I have multiple databases that operate as batch and OLTP that are each ~700 GB each and 256 GB of RAM.  Even when I only had 128 GB of RAM, TempDB still didn't grow.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Friday, January 13, 2017 2:08 PM

    On your end, is TempDB configured the same on both machines?  Also, do you have Trace Flag 1118 turned on at startup?  If not, you need to do that.  I've seen it solve a wealth of problems.

    Also and if it hasn't already been done, consider creating more than 1 data file for TempDB.  My recommendation is 1 file per physical core up to a max of 8 files.  All of the files must be configured identically and need to be the exact same size.  Don't turn off autogrowth.

    Just an FYI, on my 16 core system (32 logical), I have 8 files and 1 log file for TempDB.  Each has been configured with an initial size of 2GB and none of them have grown in the last 5 years.  I have multiple databases that operate as batch and OLTP that are each ~700 GB each and 256 GB of RAM.  Even when I only had 128 GB of RAM, TempDB still didn't grow.

    Thanks Jeff. Yes we have 4 tempdb data file with 40GB each on Prod and QA also has 4 tempdb files. we pre allocated the size and auto growth is also enabled.

    Regarding the Tempdb, i will see if we can enable it.

  • Apparently the spaminator ate my last post.  If you want to read more about Trace Flag 1118, please see the following link (trying again)...http://www.sqlskills.com/blogs/paul/misconceptions-around-tf-1118/

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply