Memory_Parallelism Performance Problem

Question

Post reply

Memory_Parallelism Performance Problem

'nix

Old Hand

Points: 302
More actions
March 21, 2008 at 11:54 am

#120643

Running Sql server 2005 on a 4XQuadcore Intel processor w128 gig of ram. Max server memory set to 120 Gig. We did a server migration from a 8Xdual core machine with less ram last week
Since this migration we have seen a sudden jump on the 3rd day in producton of locks, especially those involving Parallelism where we had no issues before. All other configuration settings on the box and in sql are set the same as the old server and have been verified. Here are my sp_config values:
Here is a copy of the memory standard report in .txt format
nameminimummaximumconfig_valuerun_value
Ad Hoc Distributed Queries0100
affinity I/O mask-2147483648214748364700
affinity mask-2147483648214748364700
Agent XPs0111
allow updates0100
awe enabled0100
blocked process threshold08640000
c2 audit mode0100
clr enabled0111
common criteria compliance enabled0100
cost threshold for parallelism03276755
cross db ownership chaining0100
cursor threshold-12147483647-1-1
Database Mail XPs0111
default full-text language0214748364710331033
default language0999900
default trace enabled0111
disallow results from triggers0100
fill factor (%)010000
ft crawl bandwidth (max)032767100100
ft crawl bandwidth (min)03276700
ft notify bandwidth (max)032767100100
ft notify bandwidth (min)03276700
index create memory (KB)704214748364700
in-doubt xact resolution0200
lightweight pooling0100
locks5000214748364700
max degree of parallelism06400
max full-text crawl range025644
max server memory (MB)16214748364721474836472147483647
max text repl size (B)021474836476553665536
max worker threads1283276700
media retention036500
min memory per query (KB)512214748364710241024
min server memory (MB)0214748364708
nested triggers0111
network packet size (B)5123276740964096
Ole Automation Procedures0100
open objects0214748364700
PH timeout (s)136006060
precompute rank0100
priority boost0100
query governor cost limit0214748364700
query wait (s)-12147483647-1-1
recovery interval (min)03276700
remote access0111
remote admin connections0100
remote login timeout (s)021474836472020
remote proc trans0100
remote query timeout (s)02147483647600600
Replication XPs0100
scan for startup procs0100
server trigger recursion0111
set working set size0100
show advanced options0111
SMO and DMO XPs0111
SQL Mail XPs0100
transform noise words0100
two digit year cutoff1753999920492049
user connections03276700
user options03276700
Web Assistant Procedures0100
xp_cmdshell0100
Memory Consumption
on svr-sql-prod03\prod at 3/21/2008 12:20:03 PM
This report provides detailed data on the memory consumption of components within the Instance as well as historical data
f ri rddb h Df lT
Memory Grants Outstanding 0
Memory Grants Pending 0
Page life expectancy
Memory Usage By Components
Component Type
Allocated
memory(KB)
Virtual Memory
Reserved(KB)
Virtual Memory
Committed(KB)
CACHESTORE_SQLCP 9,510,904 0 0
CACHESTORE_OBJCP 572,504 0 0
USERSTORE_TOKENPERM 286,320 0 0
MEMORYCLERK_SOSNODE 209,272 0 0
MEMORYCLERK_SQLGENERAL 147,568 0 0
OBJECTSTORE_LOCK_MANAGER 102,760 524,288 524,288
MEMORYCLERK_SQLUTILITIES 43,192 840 840
CACHESTORE_PHDR 42,976 0 0
USERSTORE_SCHEMAMGR 41,848 0 0
MEMORYCLERK_SQLCLR 41,824 6,312,960 56,204
OBJECTSTORE_SNI_PACKET 39,752 0 0
MEMORYCLERK_SQLSTORENG 34,592 12,096 12,096
OBJECTSTORE_LBSS 33,952 0 0
USERSTORE_DBMETADATA 27,640 0 0
MEMORYCLERK_SQLCONNECTIONPOOL 20,752 0 0
MEMORYCLERK_BHF 14,176 0 0
USERSTORE_SXC 5,192 0 0
CACHESTORE_SYSTEMROWSET 5,096 0 0
USERSTORE_OBJPERM 3,072 0 0
MEMORYCLERK_SNI 2,552 0 0
MEMORYCLERK_SQLOPTIMIZER 2,328 0 0
MEMORYCLERK_SQLBUFFERPOOL 2,056 134,365,184 131,072
CACHESTORE_BROKERTBLACS 640 0 0
OBJECTSTORE_SERVICE_BROKER 464 0 0
MEMORYCLERK_SQLSERVICEBROKER 440 0 0
CACHESTORE_TEMPTABLES 440 0 0
MEMORYCLERK_HOST 232 0 0
CACHESTORE_XPROC 232 0 0
CACHESTORE_BROKERREADONLY 128 0 0
MEMORYCLERK_SQLSERVICEBROKERTRANSPO 80 0 0
CACHESTORE_CLRPROC 72 0 0
CACHESTORE_XMLDBTYPE 32 0 0
MEMORYCLERK_SQLQUERYEXEC 32 0 0
CACHESTORE_VIEWDEFINITIONS 16 0 0
MEMORYCLERK_SQLXP 16 0 0
CACHESTORE_EVENTS 16 0 0
CACHESTORE_BROKERRSB 16 0 0
CACHESTORE_NOTIF 16 0 0
CACHESTORE_STACKFRAMES 16 0 0
MEMORYCLERK_FULLTEXT 16 0 0
CACHESTORE_XMLDBELEMENT 8 0 0
CACHESTORE_XMLDBATTRIBUTE 8 0 0
MEMORYCLERK_SQLHTTP 8 0 0
CACHESTORE_BROKERUSERCERTLOOKUP 8 0 0
CACHESTORE_BROKERDSH 8 0 0
CACHESTORE_BROKERTO 8 0 0
CACHESTORE_BROKERKEK 8 0 0
MEMORYCLERK_SQLCLRASSEMBLY 0 4,244 4,244
MEMORYCLERK_SQLXML 0 0 0
MEMORYCLERK_FULLTEXT_SHMEM 0 0 0
MEMORYCLERK_SQLSOAPSESSIONSTORE 0 0 0
MEMORYCLERK_SQLQERESERVATIONS 0 0 0
MEMORYCLERK_SQLQUERYCOMPILE 0 0 0
MEMORYCLERK_SQLQUERYPLAN 0 0 0
MEMORYCLERK_SQLSOAP 0 0 0
MEMORYCLERK_QSRANGEPREFETCH 0 0 0
11,193,288
141,219,612
728,744
46066
AWE Memory
Allocated(KB)
Shared Memory
Reserved(KB)
Shared Memory
Committed(KB)
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
124,830,976 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
Total of all memory columns 277972620
which is just over 265 GIGs. This is way above our max memory setting of 120 Gig. How can this be? Could this be part of the problem?
The system slows way way down with cpu usage NOT going up. The waits are showing up with high numbers in both EC, EX wait types and Parallelism. sorry I dont have exact numbers but they both pegged the graph and are triple or more our normal. We rarely saw either on performance dashboard at all. Our norm is Cachestore and Broker_recieve_wait_for
This is on our production box as there was no effective way to test this level of hardware in a prod environment prior to rollout (It was not done)
The fixes we have implemented so far that are helping was two days ago set max parellelism to 8 then today to 2 and its alleviated the issue but not fixed the underlying causation. The server ran fine for 2 days after setting it to 8 which is how long it ran fine after introduction to production and if this happens again in two days there is nowhere else to go but set it to 1.
What could be causing this is the question1 and 2 is how can we show memory usage higher than that set for max server memory? Are the two related or is it cause and effect. We have an open ticket with Microsoft but I think you guys can beat them to the right answer both in quality and speed. They didnt think we should change the Parallelism setting but that got us this far.

Viewing 15 posts - 1 through 15 (of 22 total)

You must be logged in to reply to this topic. Login to reply

Mohan Kumar SSCertifiable Points: 5365 More actions · Answer 1

Output of sp_configure indicates that you have not limited max memory setting for SQL Server.

max server memory (MB) 16 2147483647 2147483647 2147483647

If you want to limit max memory to 120 GB then you need to configure 122880. You may use following statement to do this:

sp_configure 'max server memory (MB)', 122880

GO

reconfigure with override

GO

[font="Verdana"]--www.sqlvillage.com[/size][/font]

Steve Jones - SSC Editor SSC Guru Points: 734552 More actions · Answer 2

You've got a lot of info there; too much to really go through in details, but can you answer a few questions.

What is the amount of memory in the new machine? You say much less, but didn't specify what.

If you open SSMS, what do you have max memory configured to.

My guess is that your system was using lots of memory. The additional CPUs do not necessarily help here.

Also, I'm not sure that you can total up all the memory there as the memory being used. There will be things paged out to disk, which might be counted as memory in some counters.

'nix Old Hand Points: 302 More actions · Answer 3

interesting. I'm sure I checked it as well as my boss and two other DBAs. How could we all have missed it. Its now set and I didnt change it. I need to check with the others and see who did make the change. I have since found out some more info. The old server had a large L3 Cache and the new one has larger L2 cache but no L3. I've read up that L2 can be shared across multiple cores but not across physical processors but the L3 is shared across the processors as well as the cores. I've looked up the specs for high end data warehousing and Database servers and all have large L3 caches. Being as L3 is twice as fast as L2 and is shared across the processors could this be a contributing factor?

The old server had 64 gig of ram with 124 L2 and 16mg L3

The new server has 400k+ L2 and no L3

Also we are now running with 4 quad cores vs 8 dual cores (two boxes configured by IBM to run one OS) I think this also would give us 2 L3 caches since its two boxed configured to work together.

Processor use drops from norm of 30 percent to 15 percent with locks high (not parallelism this time)

Steve Jones - SSC Editor SSC Guru Points: 734552 More actions · Answer 4

In terms of CPU drop, is that between the two boxes (old and new) or after you set max memory?

Not sure about L3, have to ask someone else about that, but 4 quads should be slightly less efficient than 8 duals, though I'm not sure how much. Depending on how the workload goes, if there are some really separate tasks (not parallelism of one query), I'd think the 8 CPUs would be much better.

Is the old box still around for testing?

So the old server was 64GB of RAM and the new server is 128GB of RAM? It wasn't quite clear from how you're posting if this is the case.

'nix Old Hand Points: 302 More actions · Answer 5

In terms of CPU drop, is that between the two boxes (old and new) or after you set max memory?

Not sure about L3, have to ask someone else about that, but 4 quads should be slightly less efficient than 8 duals, though I'm not sure how much. Depending on how the workload goes, if there are some really separate tasks (not parallelism of one query), I'd think the 8 CPUs would be much better.

Is the old box still around for testing?

So the old server was 64GB of RAM and the new server is 128GB of RAM? It wasn't quite clear from how you're posting if this is the case.

That is the difference in when we are experiencing issues and when things are running smoothly on the NEW machine. When things are smooth it runs at 35-40%. When things are going nuts its throttled DOWN to 15%. Looks like the CPUs are waiting on something (MEMORY READ OR Disk I/O?) I am sceptical on the disk I/O as they are the same ones we used on the old system, we just switched the LUNS to the new machines.

the old box is still around but not currently online (were using the same LUNs on the new machine)

The old server was 64 gig and the new one is 128 gig.

recent perfmon trace shows that 3 of 4 cores on each physical processor are running at 5% and the other on on each physical processor is running at about 40%.

Frank Brennan SSC Veteran Points: 212 More actions · Answer 6

Depending upon your actual ss memory usage you could be negatively impacting the OS's ability to service resource (CPU, MEMORY, I/O) requests. I would reduce the max memory in ss to about 110 (of real memory) assuming you are only using the server for ss.

Also,

1. If you aren't already, you should think about implementing x64 on this hardware to take max advantage (cache above 4GB, etc.). It looks like you may be running x64 since you don't have AWE enabled.

2. You should be running on ss2k5 build 3175 or better.

3. You should also check to see if you have "lock pages in memory" local privileges on the OS granted to the NT account running your engine. Otherwise, you may be getting some VM paging.

A couple of notes:

The problem that you are describing is a bit like the issue that was found regarding thread ss thread management on smp systems with hyper-threading. Performance takes a hit when ever there is a high thread count. The waits spike and very little gets done. However, this was due to contention of background threads while sharing the floating point process and L2 cache on each core. On the quad-core there is no HT and thus no sharing. I also believe that you have dual FSBs (probably about 1066Mhz each) which provide much better real memory access performance. This helps a log in support of request coming from those threads.

Another issue which was found is a problem involving NUMA on various servers. On some IBM system like the 3950 I believe they still employ the old architecture from x445. The sticks are connected to separate expansion cards and sync'ed across a relative slow BUS.

I'm editing this post to add one more thing which I feel may be of some help...

One major contributor to increased thread count (maxdop permitting) is the creation of not-so-optimal execution plans. A simple way to verify this is by simply running one of the sp's which isn't performing well on your new box on both the old box and the new box. capture the execution plans for each via trace and compare the two. I was able to do this while working a like problem and after capturing the optimal plan on one server i forced it on the slow performing server and even though the hardware was different (CPU, memory, etc.) the sp execute optimally.

If you haven't done so already, you should run sp_createstats and update stats with a full sample. Then DBCC FREEPROCCACHE to ensure that you pickup new plans.

Good luck.

'nix Old Hand Points: 302 More actions · Answer 7

1. If you aren't already, you should think about implementing x64 on this hardware to take max advantage (cache above 4GB, etc.). It looks like you may be running x64 since you don't have AWE enabled.

2. You should be running on ss2k5 build 3175 or better.

3. You should also check to see if you have "lock pages in memory" local privileges on the OS granted to the NT account running your engine. Otherwise, you may be getting some VM paging.

1. We are running x64

2. checking into the build level upgrade we are at 3186 now

3. please explain more about this and how to check

Old server:

X3950 CPUS

Intel Zeon PM 7140N

Intel Xeon CPU 3.33 ghz

Famil F

Model 6

Stepping 8

Ext. Family F

Ext. Model 6

Revision B0

instructions MMX, SSE, SSE2, SSE3, EM64T

Core Speed 3336.3 MHz

Multiplier x20

Bus speed 166.8 Mhz

Rated FSB

667.3 MHz

Cache L1 Data 2x16 kbytes

L2 Trace 2x12Kuops

Level2 2 2x1024 kbytes

level 3 16 MBytes

Cores 2 Threads 2

New servers

M2s

Intel Zeon X7350

Socket 604 mpga

Intel Xeon x7350 @ 2.93 GHz

Family 6

Model F

Stepping B

Ext Family 6

Ext Model F

revision G0

Instructions MMX, SSE, SSE2, SSE3, SSSE3 EM64T

Core Speed

1599.3 MHz

Multplier X6.0

bus Speed 266.5

Rated FSB 1066.2 MHz

cache

L1 4 x 32 kb

L1 inst 4x32 kb

L2 2x4096 kb

Cores 4 Threads 4

Frank Brennan SSC Veteran Points: 212 More actions · Answer 8

"lock pages in memory" privileges are set within the "Local Security Policy" mmc snapin. Within the snapin you will find this setting in the "User Rights Assignment" node of the tree. Select this polity from the right window pane and add either the nt account which is running your sql engine or the nt group which contain the account. With these privilges set the account will now have the local privileges necessary to lock real memory pages and prevent the OS from paging you out to vm. Restart the ss engine. However, be careful not to set your ss max server memory too high as you will negatively impact the OS and other apps.

'nix Old Hand Points: 302 More actions · Answer 9

I checked and we have it set correctly with local rights. This is a dedicated SS box and we have max server mem to 120 gig of 128. 8 should be plenty for the OS but I do still see paging occurring. I dont understand that with this much more memory...

Frank Brennan SSC Veteran Points: 212 More actions · Answer 10

You should have at least one processor core freed up for the OS by deselecting it from your CPU and I/O affinity.

Asside from your config you may want to address the possibltity that you might be getting some bad execution plans.

Did you bench mark the box outside of ss? When I install a new ss box I run "SQLIOSimx64" or "SQLIOSimx86". Running these untilities will benchmark exactly what your hardware is capable of before introducing your actual db environment into the equation (cpu, mem, i/o).

'nix Old Hand Points: 302 More actions · Answer 11

we updated the stats over the weekend, but didnt run DBCC FREEPROCCACHE. That has since been fixed. Good idea about dedicating a core to the OS, that will help rule out what is causing the slowdown in each physical processor, we were seeing 1 core in each running well but the other 3 slowing down to about 5% when we were having problems. Things are running ok right now, but we've seen that before over the last week, when we changed the Max degree of parallelism from 0 to 8 it got better for a while then slowed down again, we changed it to 4 and same thing, then 2 then 1....same thing. Ran fine for a while (1-2 day) then the slowdown occurs again.

Thanks for all of our great input it is appreciated! 😀

Unfortunately, we did not benchmark the machine prior to installation and putting it in production.....

Frank Brennan SSC Veteran Points: 212 More actions · Answer 12

No problem.

Note: Waits...

Now that you are using more processors you should also adjust your tempdb file layout for maximum performance. Tempdb pages are allocated and indexed using the first 4gb or less of each physical file assigned to tempdb. this means that as you add to the thread count on an smp system you create more conflict on tempdb requests. Only one thread at a time can update the allocation table for a single file. To avoid this you need to create (n) tempdb files where n=number of cpu's. span the files across your LUNs assigned to database files (not the log LUN) and make them all the same size and relatively large as to avoid unnecessary auto grows. Additially, setup the -T1118 trace flag for sql startup. The result is that your tempdb allocations will be scattered across multiple files all having their own allocation tables. This works even if the files are on the same LUN.

Good luck.

'nix Old Hand Points: 302 More actions · Answer 13

yes, we already do that on both the old and new system. Thats standard across our enterprise. We're discussing changing the affinity to give the OS its own core.....

Mohan Kumar SSCertifiable Points: 5365 More actions · Answer 14

How your page file setting is? Please refer http://support.microsoft.com/kb/889654 and see if you have set up correctly.

[font="Verdana"]--www.sqlvillage.com[/size][/font]