Doubled the size of my SQL Server and now getting bad performance

  • We recently doubled the size of our SQL Server from two 4-way processors and 32GB of RAM to two 8-way processors and 128GB of RAM. We are running Windows Server SP2 Enterprise 64-Bit and SQL 2005 SP3 Enterprise 64-Bit. SQL has a max memory setting of 118GB, and SQL is the only thing running on the box.

    Over a 2-3 day period, our wait times slowly grow from around 3,000 seconds per hour to around 4,500. On the third day, our wait times skyrocket to 30,000 over a two-hour period, and SQL is very slow to respond. Our CPU remains steady at around 30%, there is no memory paging, and the disk I/O is fine. Ignite is saying all of our wait times are Memory/CPU related. We are trying to find the problem, but none of our monitoring tools point to an issue, except that our Memory/CPU wait times are through the roof.

    Dropping the Max Server Memory setting by just 1GB remedies the problem, and then the cycle starts again (see the sp_configure sketch at the end of this post). This led us to believe there is a memory problem, but giving the OS 10GB of memory should be more than sufficient.

    Any ideas on what to monitor or thoughts on what the problem might be would be greatly appreciated.
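
    For reference, a minimal sketch of checking and cycling the setting via sp_configure; the 120832 below is just 118GB expressed in MB and is purely illustrative:

    -- make the advanced options visible
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;

    -- check the current value
    EXEC sp_configure 'max server memory (MB)';

    -- set max server memory to 118GB (118 * 1024 = 120832 MB); adjust as needed
    EXEC sp_configure 'max server memory (MB)', 120832;
    RECONFIGURE;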

  • What does the Available Memory perfmon counter say? How much memory is SQL actually using?

    A max memory setting of 118GB is the buffer pool only; other SQL caches will add to that. Try decreasing max memory by a reasonable amount (say, down to 110GB).
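
    One way to see where memory beyond the buffer pool is going is to total the memory clerks; a rough sketch using the SQL 2005-era DMV columns:

    -- top memory consumers by clerk type, in MB
    SELECT TOP 10
        [type],
        SUM(single_pages_kb + multi_pages_kb
            + virtual_memory_committed_kb + awe_allocated_kb) / 1024 AS total_mb
    FROM sys.dm_os_memory_clerks
    GROUP BY [type]
    ORDER BY total_mb DESC;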


  • Are you seeing any CXPacket waits? I'm wondering whether the upgrade has caused more parallel processing and whether that is causing the issues.
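
    A quick sketch (not a definitive check) for spotting requests that are currently running with more than one worker task, i.e. have gone parallel:

    -- requests currently using multiple worker tasks (a sign of a parallel plan)
    SELECT r.session_id,
           r.wait_type,
           COUNT(*) AS task_count
    FROM sys.dm_exec_requests AS r
    INNER JOIN sys.dm_os_tasks AS t
        ON  t.session_id = r.session_id
        AND t.request_id = r.request_id
    WHERE r.session_id > 50            -- skip system sessions
    GROUP BY r.session_id, r.wait_type
    HAVING COUNT(*) > 1
    ORDER BY task_count DESC;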

  • I am not seeing an abnormally high number of CXPacket waits.

  • I am showing the available memory as 7.5GB right now. We reset the Max Memory about 36 hours ago.

  • My knowledge of wait types is limited. Are CXPacket waits the only way to see problems with parallelism? Is Ignite possibly bundling up different wait types under "Memory/CPU" that may be concealing the issue?

  • What's the exact version?

    What wait types are you seeing?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • post the results of this query:

    SELECT TOP 10
        [Wait type] = wait_type,
        [Wait time (s)] = wait_time_ms / 1000,
        [% waiting] = CONVERT(DECIMAL(12,2), wait_time_ms * 100.0
                      / SUM(wait_time_ms) OVER())
    FROM sys.dm_os_wait_stats
    WHERE wait_type NOT LIKE '%SLEEP%'
    ORDER BY wait_time_ms DESC;


  • The exact version is 9.00.4340.00.

    During the peak of the problem, we are seeing that 95% of the waits are of type "Memory/CPU", followed by Writelog, PageLatch_SH, CXPacket, Latch_EX, ...

    This is what I am getting from Ignite. Again, I am not sure if Ignite is bundling up several wait types into this category of Memory/CPU.

    Since the "solution" is to drop the Max Memory setting, it seems to be a memory issue, but why aren't we paging?
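
    One more data point that may help rule buffer-pool pressure in or out: Page Life Expectancy, queried from inside SQL rather than perfmon (a minimal sketch; the object_name prefix differs for named instances):

    -- a steadily falling PLE suggests buffer pool memory pressure
    SELECT [object_name], cntr_value AS page_life_expectancy_sec
    FROM sys.dm_os_performance_counters
    WHERE counter_name = 'Page life expectancy'
      AND [object_name] LIKE '%Buffer Manager%';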

  • Wait type                 Wait time (s)  % waiting
    CXPACKET                  491252         22.81
    SQLTRACE_BUFFER_FLUSH     306315         14.22
    BROKER_TASK_STOP          194705          9.04
    SOS_SCHEDULER_YIELD       178382          8.28
    LATCH_EX                  166651          7.74
    PAGEIOLATCH_SH            164422          7.63
    WRITELOG                  141201          6.56
    BROKER_RECEIVE_WAITFOR    118139          5.49
    BACKUPIO                   99874          4.64
    BACKUPBUFFER               98525          4.57

    SQLTrace and BrokerTask are system waits; they should be ignored.

    That high latch wait worries me. Can you query sys.dm_os_latch_stats and see what the top latch wait types are? (A query sketch follows at the end of this post.) Other than that, SOS_Scheduler_yield implies your queries are inefficient or your server does not have enough CPU. Then PageIOLatch, WriteLog, BackupIO and BackupBuffer are mostly disk-related; the first two are an indication that the IO subsystem may be underpowered.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
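
    A minimal sketch of the latch query being asked for, mirroring the wait-stats query above:

    SELECT TOP 10
        latch_class,
        waiting_requests_count,
        wait_time_ms,
        max_wait_time_ms
    FROM sys.dm_os_latch_stats
    ORDER BY wait_time_ms DESC;
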
  • I'll defer to Gail on this, but CXPacket is the top wait type, so I'd definitely look at what is getting parallelized and perhaps modify the 'cost threshold for parallelism' setting. You can query sys.dm_exec_cached_plans and sys.dm_exec_query_plan() and search the XML plans to find the queries that are being parallelized.
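
    A rough sketch of that cached-plan search; casting the showplan XML to nvarchar and looking for Parallel="1" is crude, and scanning the whole plan cache can be slow, but it works on 2005:

    -- find cached plans that contain a parallel operator
    SELECT TOP 20
        cp.usecounts,
        cp.objtype,
        st.[text] AS query_text,
        qp.query_plan
    FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
    WHERE CAST(qp.query_plan AS nvarchar(max)) LIKE '%Parallel="1"%'
    ORDER BY cp.usecounts DESC;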

  • We just doubled the number of CPUs, going from two Quad-Cores to two 8-Cores, and the queries haven't changed.

    If parallelism is the problem, is it best to set the MAXDOP on the server or tweak individual queries? Right now it is set to 0 at the instance level. (A configuration sketch for both options follows at the end of this post.)

    Everything I have read about low CPU % but high CPU wait time says parallelism is the problem; however, the number of CXPacket waits is extremely low compared to the Memory/CPU waits. Maybe I am just relying on Ignite too much.

    Here are the results of sys.dm_os_latch_stats ordered by wait_time_ms.

    latch_class  waiting_requests_count  wait_time_ms  max_wait_time_ms
    BUFFER  1982344202184392033406
    ACCESS_METHODS_SCAN_RANGE_GENERATOR  245483861620638432688
    NESTING_TRANSACTION_READONLY  44454274179456218
    ACCESS_METHODS_DATASET_PARENT  46034711492844360
    ACCESS_METHODS_HOBT_VIRTUAL_ROOT  11995498567297
    NESTING_TRANSACTION_FULL  845905183824125
    ACCESS_METHODS_ACCESSOR_CACHE  9712997324203
    LOG_MANAGER  1607230469
    ALLOC_IAM_PAGE_RANGE_CACHE  31219782
    ALLOC_FREESPACE_CACHE  10458107931
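
    On the instance-versus-query question above, a minimal sketch of both options; the numbers are illustrative only, and dbo.SomeLargeTable is a hypothetical name:

    -- instance-wide: cap parallelism and/or raise the cost bar for parallel plans
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'max degree of parallelism', 8;
    EXEC sp_configure 'cost threshold for parallelism', 25;
    RECONFIGURE;

    -- per query: hint a single known offender instead
    SELECT COUNT(*)
    FROM dbo.SomeLargeTable
    OPTION (MAXDOP 1);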

    Rather than knee-jerk reducing MAXDOP, I'd look at the queries running in parallel, see what else they're waiting for, and see if they can be optimised. (A live-waits query sketch follows at the end of this post.)

    I don't know what Ignite does, but Memory/CPU is not a wait type. It's probably a collection, and without knowing what's in that collection it's hard to say anything. Most people classify CXPacket as a CPU wait (it's not memory and it's not IO).

    The last time I saw performance degrade after a CPU upgrade, it was because the queries were written terribly, got awful execution plans, and were parallelised both inappropriately and with heavy skew.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
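
    A sketch of one way to watch, live, what the parallel branches are actually waiting on while a problem query runs:

    -- current waits per task for user sessions
    SELECT r.session_id,
           wt.exec_context_id,
           wt.wait_type,
           wt.wait_duration_ms,
           wt.resource_description
    FROM sys.dm_os_waiting_tasks AS wt
    INNER JOIN sys.dm_exec_requests AS r
        ON r.session_id = wt.session_id
    WHERE r.session_id > 50            -- skip system sessions
    ORDER BY r.session_id, wt.wait_duration_ms DESC;
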
  • I just got word from the ultimate authority on all things storage engine: the combination of CXPacket and SOS_Scheduler_Yield, along with the Access Methods latch types, indicates you're doing a ton of concurrent parallel table scans. Check indexing and check that the queries are SARGable (can use index seeks); a brief example follows at the end of this post.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
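
    To illustrate the SARGability point with a made-up example (table and column names are hypothetical): wrapping the column in a function hides it from an index seek, while comparing the bare column to a range lets an index on OrderDate be used.

    -- non-SARGable: the function on the column forces a scan
    SELECT OrderID
    FROM dbo.Orders
    WHERE YEAR(OrderDate) = 2011;

    -- SARGable rewrite: same logic, can seek on an OrderDate index
    SELECT OrderID
    FROM dbo.Orders
    WHERE OrderDate >= '20110101'
      AND OrderDate <  '20120101';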
