Big difference of SP execution on two servers

Question

Big difference of SP execution on two servers

Igor Micev

SSC-Dedicated

Points: 33109
More actions
April 6, 2016 at 3:19 am

#315286

Hi All,
Server 1 (cluster)
Microsoft SQL Server 2014 (SP1-CU3) (KB3094221) - 12.0.4427.24 (X64)
Oct 10 2015 17:18:26
Copyright (c) Microsoft Corporation
Standard Edition (64-bit) on Windows NT 6.3 <X64> (Build 9600: ) (Hypervisor)
Server 2 (stand alone)
Microsoft SQL Server 2014 - 12.0.2546.0 (X64)
Jun 6 2015 00:57:37
Copyright (c) Microsoft Corporation
Standard Edition (64-bit) on Windows NT 6.3 <X64> (Build 9600: )
The same SP executed on Server 1 is slow, and executed on Server 2 is fast.
Both servers have identical hardware resources. Same server configuration. Tempdb is not overloaded.
The @@version number is different.
I wander what could be the reason for this.
Any suggestion/advise is welcome...
Igor Micev,My blog: www.igormicev.com

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

Grant Fritchey SSC Guru Points: 398690 More actions · Answer 1

It is VM vs. Hardware, which can't be discounted. Just because the resource settings in the VM are the same as the hardware doesn't mean they're actually the same. The VM setup can have issues.

However, what other differences are there? Cost threshold for parallelism is the same? Maxdop? ANSI settings? Statistics? Identical databases, including data? Differences in index maintenance?

Assuming all this is identical, I'd use two sources to further investigate, the execution plans and the wait statistics. If the plans are different, there you go. If the plans are the same, then what is the slower system waiting on during the processing? That can tell you what the issue is.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

jeff.mason SSCrazy Eights Points: 9585 More actions · Answer 2

Agreed -- if the hardware and versions are the same, the data layout has to be different. I'd check fragmentation and stats on all affected indexes as that will cause the plan to completely change even though the code is the same. Also, check table sizes. Crazy at it may seem, one DB may have 100 rows and the other 100,000 in the same table. But Grant is right, the execution plan is the best place to start troubleshooting. If they are different, look for bottlenecks on the slow one in the plan which can direct you to the specific indexes to investigate.

Igor Micev SSC-Dedicated Points: 33109 More actions · Answer 3

Grant Fritchey (4/6/2016)
It is VM vs. Hardware, which can't be discounted. Just because the resource settings in the VM are the same as the hardware doesn't mean they're actually the same. The VM setup can have issues.
However, what other differences are there? Cost threshold for parallelism is the same? Maxdop? ANSI settings? Statistics? Identical databases, including data? Differences in index maintenance?
Assuming all this is identical, I'd use two sources to further investigate, the execution plans and the wait statistics. If the plans are different, there you go. If the plans are the same, then what is the slower system waiting on during the processing? That can tell you what the issue is.

Thanks for your reply.

It's all the same.

The wait stats are different, but:

Even better on Server 1. For e.g. avg percentage is 28% for the SP_SERVER_DIAGNOSTIC_SLEEP (because of the cluster) wait type, 14% for QDS_SHUTDOWN_QUEUE, 11% for TRANSACTION_MUTEX, and all the rest is 0-2%, CXPACKET is just 1.1%.

On the Server 2, the CXPACKET is 40% then is TRANSACTION_MUTEX with 33%, THREADPOOL with 10% and so on, where the SP is executing much faster.

It remains that I compare the execution plans. Thanks again and I'll try to update here back.

Igor Micev,My blog: www.igormicev.com

Igor Micev SSC-Dedicated Points: 33109 More actions · Answer 4

Hi friends,

I'm coming back to this thread after investigating the issue with sys admin guys. I checked the settings Grant asked, and they are all identical.

Thanks for the initial replies.

The investigation came to the following:

- Disks on Server 1 (cluster with VMs) are 5-6 times slower than the dedicated physical disks on Server 2. Even though the vm disks use fibre channel - it's not comparable. Both disks are 10K rpm, and both are RAID 10 configured. Strange for the such big difference in response.

The execution plans of the same stored procedure on the two servers could never be the same as the data is not the same. The old version of the SP had bigger differences in execution plans rather than the new version that I code-refactored.

The code-refactoring was replacing some variable @temp tables without indexes with #temp tables with indexes on them. The execution plans changed, still not identical, but very similar. Very close on the cost weights on the key-elements in the plans, i.e. the weights usually fall on the indexes elements. See attachment.

In one of the plans I have Merge and in the other one Nested loops element. That is one difference, but this time the executing time of the two stored procedures is same (some 10-20ms difference, which is ok).

I also came to the following:

-SQL Re-compilations/sec 55K on Server 1 and just 100 on Server 2

-SQL Compilations/sec 125K on Server 1 and just 250 on Server 2

measured on periods of 10 minutes, 12h ,... is nearly the same.

This is a big difference, and I still don't have an idea what could it be.

Igor Micev,My blog: www.igormicev.com

frederico_fonseca SSCoach Points: 16014 More actions · Answer 5

I see to main issues here

1 - VM setup seems to indicate an issue if performance of HD's is that bad compared to the physical.

What is the VM type? Hiper-V or VMWare? getting all hardware/config for this would eventually allow some here to help on it. And are there other VM's apart from the SQL cluster on that physical?

2 - number of recompiles - this would seem to indicate a few possibilities

- something different on both sql server instances (compare sys.configurations and startup options)

- memory issues (can be derived from VM setup)

- some issue with CU3 - this will be harder to tell

but... you say you replaced table variables with temp tables - has this change been done in both the VM and the physical server? or is the VM running with temp tables and the physical with table variables? using temp table will cause recompiles.

Igor Micev SSC-Dedicated Points: 33109 More actions · Answer 6

frederico_fonseca (4/27/2016)
I see to main issues here
1 - VM setup seems to indicate an issue if performance of HD's is that bad compared to the physical.
What is the VM type? Hiper-V or VMWare? getting all hardware/config for this would eventually allow some here to help on it. And are there other VM's apart from the SQL cluster on that physical?
2 - number of recompiles - this would seem to indicate a few possibilities
- something different on both sql server instances (compare sys.configurations and startup options)
- memory issues (can be derived from VM setup)
- some issue with CU3 - this will be harder to tell
but... you say you replaced table variables with temp tables - has this change been done in both the VM and the physical server? or is the VM running with temp tables and the physical with table variables? using temp table will cause recompiles.

It's VMWare. The VMs were ordered from a hosting company, so there are probably other VMs. I asked if the CPU and Memory resources are dedicated for the cluster and the answer was Yes.

I've replaced the table variables with temp tables on the Cluster environment and improved the execution on Server 1. It's done only for one stored procedure. The same change on the Server 2 (stand alone) didn't bring any improvements. The code in all objects is identical.

Igor Micev,My blog: www.igormicev.com