January 10, 2014 at 12:51 am
We have an SQL server 2012 SP1 CU6 running on a virtual machine that was P2V'ed from a physical blade server.
configured from the physical server was maintained hence it is still configured with 4 CPU's and 6 Cores hence having access to 24 cores in total.
The physical server had 128 gb.
The host (DL580G7, 4 cpu's with 10 cores, 1 TB memory) is running ESXI 5.1
All job are running slightly better on the VM due the better hardware.
Now the funny part starts. After doing the initial testing with the machine configuration which was all good we wanted to increase the VM memory to 512 gb. as we have a total of 1 TB of memory in our hosts and no memory pressure.
We configured the maximum server memory to 80% of the 512 gb. memory and then some of our queries starting taking ages.
We tweaked and adjusted the maximum server memory and found out if we configured it to 330 gb. then server runs just fine but if we go above say 335 gb. then the following
job.:
USE Datamart
exec sp_executesql N'set dateformat dmy exec ssis.tblSumOfStock @P1',N'@P1 datetime2(0)','2013-01-31 00:00:00'
GO
Normally the job takes between 76 and 77 sec. but with the increased memory setting it has taken between 19960 and 35254 seconds (roughly 5h 30m - 9h 45m). and if max memory is lowered execution time return to 76 sec.
We have tried the following.
1.: Reserving all memory on the host
2.: Running sql server with flag 0815 and 8048 to see if numa was the cause but with 330 gb. is still goes across several numa nodes.
3.: Tried isolation the VM on two different hosts - Same issue.
4.: Tried SQL flag to aquirie all memory on startup.
5.: MAXDOP -1 job still very slow
6.: Increasing parallel cost to different values - again slow run
7.: Comparing execution plans with the different memory settings does not give any clues to why.
I simply cant figure when this is happening. Have any of you noticed anything like this or know what could cause this ?
When running the job with max memory setting below and above the 330 gb mark the server acquires a total of 60 gb memory hence there is no memory pressure.
January 10, 2014 at 7:20 am
1) Have you done a wait stats and file IO stall analyses during both sets of runs?
2) Do you have vCPU Hot Plug enabled in VMware?
3) Reference this to check some NUMA/Memory items: http://blogs.msdn.com/b/psssql/archive/2012/12/13/how-it-works-sql-server-numa-local-foreign-and-away-memory-blocks.aspx
4) What does dbcc memorystatus reveal during runs with each memory setting?
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
January 13, 2014 at 3:03 am
1) Have you done a wait stats and file IO stall analyses during both sets of runs?
Please see attached output,
2) Do you have vCPU Hot Plug enabled in VMware?
Hotplug is disabled for cpu as weel as memory
3) Reference this to check some NUMA/Memory items: http://blogs.msdn.com/b/psssql/archive/2012/12/13/how-it-works-sql-server-numa-local-foreign-and-away-memory-blocks.aspx
I have tried with flags 8015 and 8048 to ingore numa completely.
Running ESXTOP shows all memory on local numa in both scenarios.
4) What does dbcc memorystatus reveal during runs with each memory setting?
Please see attached output.
January 13, 2014 at 9:28 am
A) not sure what the xml files are but they aren't query plan(s) of the actual process. Looks like you may have trapped msdb job activity?
B) the code you listed as running is just a name. To investigate you need the code that is called and executed, not a sproc name.
C) I used winmerge to do a quick comparison of the memorystatus files:
Pages Free is 3 orders of magnitude lower for each NUMA node. That could be reason enough for the perf difference. I could see no other indication in memorystatus output as to why that happened. Bug or mis-config or poor interplay between VM/Win/SQL are my guesses.
D) this looks to be something too complex and time-consuming for free forum help, and I don't think we would be able to come up with something anyway. I would seek out Microsoft and VMware both in conference on this one.
Best of luck with it! And please to reply back to this thread when you find a resolution so that others may learn from (and be able to find via search) your problem!
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
January 15, 2014 at 4:35 am
Hi All,
The terible performance was caused by the query optimzer it decided to perform some merge joins in the slow plan and hash joins in the fast plan.
The problematic storage procedure was altered to include at hint option (hash joins) to pursuad the optimizer to use this instead. After this was done the job completed with the high max memory setting in a time equal to the fast execution plan.
Why it did is still unclear and and it still unclear why it has such a drastic effect on the execution time.
It you want to follow up it can be done here.
Viewing 5 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply