2 identical plans - different performance when query run on different servers

  • Hi - This is very odd behaviour and the opposite of what you might expect.
    I have a stored procedure that returns a small amount of data. On the production box (60GB memory, 8 CPUs, SQL Server max memory set to 50GB, 6 tempdb files) the query runs in excess of a minute.
    Take a copy of the production database, place it on the staging server (2 CPUs, 6GB memory, 1 tempdb file) and the query runs in around 20 secs.
    At the time of running the queries, at 10pm, there is no load on the server.
    The execution plans are identical (I have attached them both).
    I have tried updating stats with full scan and adding OPTION (RECOMPILE) to the stored proc.
    The server settings are the same too, also attached along with both plans.
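    A quick way to gather like-for-like numbers on the two servers is to run the procedure with timing and IO statistics switched on (the procedure name below is a placeholder for the actual proc):

```sql
-- Run on both servers and compare elapsed time, CPU time and logical/physical reads.
SET STATISTICS TIME, IO ON;

EXEC dbo.usp_MyProc;   -- placeholder: substitute the real stored procedure name

SET STATISTICS TIME, IO OFF;
```

    Comparing the STATISTICS IO output table by table will show whether the extra minute is spent on reads, on the tempdb work, or elsewhere.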

  • PearlJammer1 - Tuesday, August 14, 2018 5:31 PM


    So how do the two plans have different versions for the Showplan XML namespace?

    Sue

  • PearlJammer1 - Tuesday, August 14, 2018 5:31 PM


    Any differences in trace flags and are both at exactly the same CU?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Since the most expensive statement is the SELECT INTO a temp table, my first question would be: what are the configurations of both tempdbs, and is there any difference between the IO subsystems?
    😎
    This is the only non-trivial part of the execution plans and the only one where the optimizer times out.

  • Jeff Moden - Tuesday, August 14, 2018 7:45 PM


    Any differences in trace flags and are both at exactly the same CU?

    Good question and one I had overlooked! On examination of the two servers I find they are at different SP and CU levels. The slower production server is at SQL Server 2012 SP4 (KB4018073) - 11.0.7001.0 (X64), while the quicker staging server is on SQL Server 2012 SP3-CU10 (KB4025925) - 11.0.6607.3 (X64). So the more 'up to date' server has the worse performance. Hmmmmmm.
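    For anyone following along, the exact build of each instance can be confirmed with SERVERPROPERTY rather than eyeballing @@VERSION:

```sql
-- Returns the version number, service pack level and edition of the instance
SELECT SERVERPROPERTY('ProductVersion') AS ProductVersion,  -- e.g. 11.0.7001.0
       SERVERPROPERTY('ProductLevel')   AS ProductLevel,    -- e.g. SP4
       SERVERPROPERTY('Edition')        AS Edition;
```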

  • Eirikur Eiriksson - Wednesday, August 15, 2018 1:00 AM


    Thanks for your reply. The servers are hosted in AWS, so questions regarding the IO subsystem I am unable to answer. As regards the config of tempdb: both servers have tempdb on the same drive as all the data files - D:\DATA for the database files, D:\Logs for the transaction logs, D:\Tempdb for the tempdb files. I am under the impression that, because this is shared SAN storage hosted by AWS (EC2 instances), the old advice of keeping your log files, tempdb and data files on different physical discs is yesterday's news and no longer applies with a modern SAN? The faster staging server has only one tempdb file (2 CPU cores), and the slower production server had 4 tempdb files (4 CPU cores); now that we have 'beefed up' the server it has 8 cores and 6 tempdb files.

  • PearlJammer1 - Wednesday, August 15, 2018 2:30 AM


    Hi Jeff - further update. I have brought the staging server up to the same SP level as the production server, and it still runs the query quickly, whereas production takes an age!!

  • PearlJammer1 - Wednesday, August 15, 2018 4:03 AM


    ...and as regards trace flags enabled on production, I have 1222 for deadlocks and 3226 to stop successful backup messages going into the error log. On staging just trace flag 1222 is enabled (all at the global level).
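    The globally enabled trace flags can be double-checked on each instance with:

```sql
-- Lists every trace flag enabled on the instance; the Global column shows scope
DBCC TRACESTATUS(-1);
```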

  • PearlJammer1 - Wednesday, August 15, 2018 2:40 AM


    First of all, I strongly suggest you enable trace flag 1118, resize all tempdb files to exactly the same size and, if applicable, turn on 1117 so that all tempdb files grow equally.
    😎

    Next step is to check the actual IO-related waits and the AWS EC2 IOPS provisioning. As I said earlier, your problem is in a single SELECT INTO tempdb statement and that is what you have to focus on! The most likely causes are underprovisioned IOPS and allocation bitmap contention.
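    A sketch of those two suggestions, assuming the default tempdb logical file names (tempdev etc. - adjust the names and sizes to match your own files):

```sql
-- Uniform-extent allocation (1118) and even growth across files (1117).
-- These last until the next restart; add -T1117 and -T1118 as startup
-- parameters to make them permanent.
DBCC TRACEON (1117, 1118, -1);

-- Resize each tempdb data file to the same size (8 GB here is illustrative).
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, SIZE = 8192MB);
```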

  • Eirikur Eiriksson - Wednesday, August 15, 2018 4:45 AM


    Ok, thanks for that. I have noticed a major difference in the configuration of the discs. When I run 'fsutil fsinfo ntfsinfo d:\' I get similar output on both servers EXCEPT for the value of Bytes Per Cluster. Here is the key output:
    Bytes Per Sector: 512
    Bytes Per Physical Sector: 512
    Bytes Per Cluster: 65536 on the fast server and 4096 on the slow server

    Is this worth chasing up with our hosting people? To me it seems significant.

  • PearlJammer1 - Wednesday, August 15, 2018 5:04 AM


    This is definitely worth looking into, but less likely to have as much impact as allocation bitmap contention or differently sized tempdb data files.
    😎

    Can you post the full tempdb file settings and configs for both servers?
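    Something along these lines, run on both servers, pulls out the settings being asked for:

```sql
-- tempdb file layout: logical name, path, current size and growth increment
SELECT name,
       physical_name,
       size * 8 / 1024 AS size_mb,
       CASE WHEN is_percent_growth = 1
            THEN CONCAT(growth, '%')
            ELSE CONCAT(growth * 8 / 1024, ' MB')
       END AS growth_setting
FROM tempdb.sys.database_files;
```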

  • Eirikur Eiriksson - Wednesday, August 15, 2018 5:34 AM


    Hi - here is the tempdb info. In the spreadsheet there is also a screenshot of the waits from the production box. I haven't seen any PAGELATCH_xx wait types that look like a significant bottleneck (7th on the list at 3.30% of the waits), which are usually an indicator of issues in tempdb.
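    For reference, a cumulative wait-stats query along these lines will produce the same kind of breakdown (the exclusion list of benign waits is illustrative, not exhaustive):

```sql
-- Top waits since the last restart, as a percentage of total wait time
SELECT TOP (10)
       wait_type,
       wait_time_ms / 1000.0 AS wait_time_s,
       CAST(100.0 * wait_time_ms / SUM(wait_time_ms) OVER () AS decimal(5,2)) AS pct
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN (N'SLEEP_TASK', N'LAZYWRITER_SLEEP', N'CHECKPOINT_QUEUE',
                        N'REQUEST_FOR_DEADLOCK_SEARCH', N'XE_TIMER_EVENT',
                        N'BROKER_TO_FLUSH', N'SQLTRACE_BUFFER_FLUSH', N'WAITFOR')
ORDER BY wait_time_ms DESC;
```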

  • PearlJammer1 - Wednesday, August 15, 2018 2:30 AM


    IIRC, there was a patch all the way back to SQL Server 2012 to cover the security problem caused by flaws that Intel and other manufacturers had baked into their chips' firmware, etc.  It was pretty well advertised that the fixes would slow some code down.  IIRC, it came out a bit after SP4 on 2012.  You may have to rewrite the affected queries for performance.  It could be something else, but the time frame seems about right.

    --Jeff Moden



  • Jeff Moden - Wednesday, August 15, 2018 7:03 AM


    Hi Jeff, both servers are now at SP4 and the staging server still runs the query faster. I am wondering whether it has anything to do with the Bytes Per Cluster value, which is 64K on the faster server and only 4K on the slow one. I am in the process of testing this theory but so far without success. I got our provider to spin up a test server formatted with a 64K block size and the query ran fast, which was encouraging, but when he reformatted with a 4K block size it still ran fast, whereas I was hoping to see the slow performance. However, he had installed SQL 2016 rather than 2012, so I am now installing 2012 to test whether the slowdown is a combination of SQL 2012 and the 4K block size.

  • Have you checked whether there are patch differences between the Windows installs? The initial Spectre/Meltdown fixes had a fairly significant impact on certain types of workload, and it may well be that something outside of SQL Server is causing the issue.
