August 12, 2013 at 5:07 pm
I've got the following issue with a simple select * from large_table
running on SQLServer 2012. The physical machine (Production) returns the result set in approx 3.5 mins, the virtual (UAT) returns in 1.8 mins. The table size is equivalent in both environments. The physical machine that hosts the virtual machine is exactly the same spec as the physical machine so use the same CPU's.
I've made sure there's no physical I/O going on, all the reads are logical.
A breakdown of the CPU/elapsed times are:
Production UAT
CPU Times (ms) 18220 11279
Elps Time (ms) 232879 110659
So it looks like both environments are waiting on something as there's a big difference between CPU and Elapsed time. I've then traced the waits for the sessions involved and got the following results
Production UAT
NETWORK_IO 19807 96869
PREEMPTIVE_OS_WAIT_FORSINGLEOBJECT 19727 95873
So the waits in UAT are much larger than Production where its slow????
I'm fairly new to SQL Server, coming from an Oracle background so maybe my wait collection technique is wrong. I've used event sessions to capture the waits i.e.
create event session session_waits on server
add event sqlos.wait_info
(WHERE sqlserver.session_id=102 and duration>0)
, add event sqlos.wait_info_external
(WHERE sqlserver.session_id=102 and duration>0)
add target package0.asynchronous_file_target
(SET filename=N'c:\temp\wait_stats.xel', metadatafile=N'c:\temp\wait_stats.xem');
Any help/ideas on what could be going would be much appreciated.
Thanks,
Fraze
August 12, 2013 at 5:39 pm
Umpteen things could be at play here. Most likely culprits include:
1) are you ABSOLUTELY sure all the IO is logical? If not, physical IO issues would be my first place.
2) are you CERTAIN that there is no blocking at all that is keeping one query from running for some period of time?
3) get a copy of sp_whoisactive (latest version is v11.11 I think, from sqlblog.com). run that iteratively while each select is running and see what percolates up. You can also do differential analysis with it, which could be key to finding out real issue(s)
4) Network config differences. TCPChimney is a common issue here. Check network stack and transfer/latency rates.
5) Create variables and to the select of each field into a variable. That will let you test pure read speed without network/client coming into play.
Those are just the tip of the iceberg.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
August 12, 2013 at 5:47 pm
Great, thanks for the reply.
1) Yep, I'm sure its all logical - I've re-run the query 20+ times and get the same performance difference consistently - I've used
SET STATISTICS IO ON
SET STATISTICS TIME ON
to get the I/O stats
2) I'm reasonably certain there are no locks - at least non of the tools I'm using show any locking.
I'll investigate the other suggestions.
Thanks,
Fraze
August 12, 2013 at 7:09 pm
So I've gone with the suggestion of selecting into variables (nice idea - thanks) and performance is still twice as slow on the physical machine vs the virtual one
Physical
CPU time - 24,039 ms
Elapsed - 29,864 ms
Virtual
CPU time - 12,090 ms
Elapsed - 12,769 ms
I'm going to copy the table to another physical machine to test there
Thanks,
Fraze
August 12, 2013 at 7:38 pm
OK, so I've tested on an instance on another physical machine that is exactly the same hardware spec as the other physical and the physical running the VM. In all three tests all were logical I/O and 99% of time was on the CPU.
The new physical was slightly quicker than the VM which I'd expect, then the "problem" physical was twice as slow as in previous tests.
Any pointers as to what could be the issue? O/S or CPU issue?
Thanks,
Fraze
August 13, 2013 at 8:42 am
fraze2001 (8/12/2013)
OK, so I've tested on an instance on another physical machine that is exactly the same hardware spec as the other physical and the physical running the VM. In all three tests all were logical I/O and 99% of time was on the CPU.The new physical was slightly quicker than the VM which I'd expect, then the "problem" physical was twice as slow as in previous tests.
Any pointers as to what could be the issue? O/S or CPU issue?
Thanks,
Fraze
Can't believe I forgot this one: check the BIOS and Windows Power Settings to ensure everything is "High Power" and not set to throttle CPUs in any way. That is a HUGE issue I see over and over on client servers when I give them a performance review.
Have you made sure that no one did something silly like mess with affinity settings either in windows or SQL Server?
Cant recall: is maxdop 0 on all machines? Wondering if NUMA issues could be coming into play here. Also, there was a very nasty bug related to NUMA and CPU/Memory utilization that wasn't fixed until SQL 2012 SP1 CU4. Could be hitting that? I was pretty stunned to see what that one could do to a server, although I don't think it applies here in any case now that I think about it.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
August 13, 2013 at 6:45 pm
Thanks very much Kevin,
Well the power settings came up on some further digging I did and sure enough the physical machine that was performing "well" High and the "poor" performing one had it set to Balanced, so I made the change and performance has improved significantly for some queries. I'm seeing 100% improvement in some of the long running jobs!
I haven't checked the BIOS yet as it is a Production machine so can't reboot them yet.
The "problem" machine is base SQL Server 2012, no SP's and does have hardware NUMA - 2 nodes, where as the virtual machine has no NUMA. The other physical I tested on was a SQL Server 2008 SP1 machine with 2 NUMA nodes.
With my test query I've seen the CPU come down by 50% after applying the power settings, but the elapsed time is still around the same which is odd. I've tried to trap any wait events through Extended Events, but its coming back with nothing so I can't explain the difference between CPU time and elapsed time. The elapsed time is wildly inconsistent too, ranging from 19s to 33s while the CPU time is always around the same, approx. 16s. There server is quiet so I cant see this being time spent on the CPU queue. I've added an attachment showing the client stats for a few runs of the same query.
Thanks,
Fraze
August 15, 2013 at 2:13 pm
Glad we found some improvements. There are still quite a few things to check, but I think though that anything more will be beyond the capabilities of a forum to really dig into.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
Viewing 8 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply