Memory_Parallelism Performance Problem

  • Running Sql server 2005 on a 4XQuadcore Intel processor w128 gig of ram. Max server memory set to 120 Gig. We did a server migration from a 8Xdual core machine with less ram last week

    Since this migration we have seen a sudden jump on the 3rd day in producton of locks, especially those involving Parallelism where we had no issues before. All other configuration settings on the box and in sql are set the same as the old server and have been verified. Here are my sp_config values:

    Here is a copy of the memory standard report in .txt format

    nameminimummaximumconfig_valuerun_value

    Ad Hoc Distributed Queries0100

    affinity I/O mask-2147483648214748364700

    affinity mask-2147483648214748364700

    Agent XPs0111

    allow updates0100

    awe enabled0100

    blocked process threshold08640000

    c2 audit mode0100

    clr enabled0111

    common criteria compliance enabled0100

    cost threshold for parallelism03276755

    cross db ownership chaining0100

    cursor threshold-12147483647-1-1

    Database Mail XPs0111

    default full-text language0214748364710331033

    default language0999900

    default trace enabled0111

    disallow results from triggers0100

    fill factor (%)010000

    ft crawl bandwidth (max)032767100100

    ft crawl bandwidth (min)03276700

    ft notify bandwidth (max)032767100100

    ft notify bandwidth (min)03276700

    index create memory (KB)704214748364700

    in-doubt xact resolution0200

    lightweight pooling0100

    locks5000214748364700

    max degree of parallelism06400

    max full-text crawl range025644

    max server memory (MB)16214748364721474836472147483647

    max text repl size (B)021474836476553665536

    max worker threads1283276700

    media retention036500

    min memory per query (KB)512214748364710241024

    min server memory (MB)0214748364708

    nested triggers0111

    network packet size (B)5123276740964096

    Ole Automation Procedures0100

    open objects0214748364700

    PH timeout (s)136006060

    precompute rank0100

    priority boost0100

    query governor cost limit0214748364700

    query wait (s)-12147483647-1-1

    recovery interval (min)03276700

    remote access0111

    remote admin connections0100

    remote login timeout (s)021474836472020

    remote proc trans0100

    remote query timeout (s)02147483647600600

    Replication XPs0100

    scan for startup procs0100

    server trigger recursion0111

    set working set size0100

    show advanced options0111

    SMO and DMO XPs0111

    SQL Mail XPs0100

    transform noise words0100

    two digit year cutoff1753999920492049

    user connections03276700

    user options03276700

    Web Assistant Procedures0100

    xp_cmdshell0100

    Memory Consumption

    on svr-sql-prod03\prod at 3/21/2008 12:20:03 PM

    This report provides detailed data on the memory consumption of components within the Instance as well as historical data

    f ri rddb h Df lT

    Memory Grants Outstanding 0

    Memory Grants Pending 0

    Page life expectancy

    Memory Usage By Components

    Component Type

    Allocated

    memory(KB)

    Virtual Memory

    Reserved(KB)

    Virtual Memory

    Committed(KB)

    CACHESTORE_SQLCP 9,510,904 0 0

    CACHESTORE_OBJCP 572,504 0 0

    USERSTORE_TOKENPERM 286,320 0 0

    MEMORYCLERK_SOSNODE 209,272 0 0

    MEMORYCLERK_SQLGENERAL 147,568 0 0

    OBJECTSTORE_LOCK_MANAGER 102,760 524,288 524,288

    MEMORYCLERK_SQLUTILITIES 43,192 840 840

    CACHESTORE_PHDR 42,976 0 0

    USERSTORE_SCHEMAMGR 41,848 0 0

    MEMORYCLERK_SQLCLR 41,824 6,312,960 56,204

    OBJECTSTORE_SNI_PACKET 39,752 0 0

    MEMORYCLERK_SQLSTORENG 34,592 12,096 12,096

    OBJECTSTORE_LBSS 33,952 0 0

    USERSTORE_DBMETADATA 27,640 0 0

    MEMORYCLERK_SQLCONNECTIONPOOL 20,752 0 0

    MEMORYCLERK_BHF 14,176 0 0

    USERSTORE_SXC 5,192 0 0

    CACHESTORE_SYSTEMROWSET 5,096 0 0

    USERSTORE_OBJPERM 3,072 0 0

    MEMORYCLERK_SNI 2,552 0 0

    MEMORYCLERK_SQLOPTIMIZER 2,328 0 0

    MEMORYCLERK_SQLBUFFERPOOL 2,056 134,365,184 131,072

    CACHESTORE_BROKERTBLACS 640 0 0

    OBJECTSTORE_SERVICE_BROKER 464 0 0

    MEMORYCLERK_SQLSERVICEBROKER 440 0 0

    CACHESTORE_TEMPTABLES 440 0 0

    MEMORYCLERK_HOST 232 0 0

    CACHESTORE_XPROC 232 0 0

    CACHESTORE_BROKERREADONLY 128 0 0

    MEMORYCLERK_SQLSERVICEBROKERTRANSPO 80 0 0

    CACHESTORE_CLRPROC 72 0 0

    CACHESTORE_XMLDBTYPE 32 0 0

    MEMORYCLERK_SQLQUERYEXEC 32 0 0

    CACHESTORE_VIEWDEFINITIONS 16 0 0

    MEMORYCLERK_SQLXP 16 0 0

    CACHESTORE_EVENTS 16 0 0

    CACHESTORE_BROKERRSB 16 0 0

    CACHESTORE_NOTIF 16 0 0

    CACHESTORE_STACKFRAMES 16 0 0

    MEMORYCLERK_FULLTEXT 16 0 0

    CACHESTORE_XMLDBELEMENT 8 0 0

    CACHESTORE_XMLDBATTRIBUTE 8 0 0

    MEMORYCLERK_SQLHTTP 8 0 0

    CACHESTORE_BROKERUSERCERTLOOKUP 8 0 0

    CACHESTORE_BROKERDSH 8 0 0

    CACHESTORE_BROKERTO 8 0 0

    CACHESTORE_BROKERKEK 8 0 0

    MEMORYCLERK_SQLCLRASSEMBLY 0 4,244 4,244

    MEMORYCLERK_SQLXML 0 0 0

    MEMORYCLERK_FULLTEXT_SHMEM 0 0 0

    MEMORYCLERK_SQLSOAPSESSIONSTORE 0 0 0

    MEMORYCLERK_SQLQERESERVATIONS 0 0 0

    MEMORYCLERK_SQLQUERYCOMPILE 0 0 0

    MEMORYCLERK_SQLQUERYPLAN 0 0 0

    MEMORYCLERK_SQLSOAP 0 0 0

    MEMORYCLERK_QSRANGEPREFETCH 0 0 0

    11,193,288

    141,219,612

    728,744

    46066

    AWE Memory

    Allocated(KB)

    Shared Memory

    Reserved(KB)

    Shared Memory

    Committed(KB)

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    124,830,976 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    0 0 0

    Total of all memory columns 277972620

    which is just over 265 GIGs. This is way above our max memory setting of 120 Gig. How can this be? Could this be part of the problem?

    The system slows way way down with cpu usage NOT going up. The waits are showing up with high numbers in both EC, EX wait types and Parallelism. sorry I dont have exact numbers but they both pegged the graph and are triple or more our normal. We rarely saw either on performance dashboard at all. Our norm is Cachestore and Broker_recieve_wait_for

    This is on our production box as there was no effective way to test this level of hardware in a prod environment prior to rollout (It was not done)

    The fixes we have implemented so far that are helping was two days ago set max parellelism to 8 then today to 2 and its alleviated the issue but not fixed the underlying causation. The server ran fine for 2 days after setting it to 8 which is how long it ran fine after introduction to production and if this happens again in two days there is nowhere else to go but set it to 1.

    What could be causing this is the question1 and 2 is how can we show memory usage higher than that set for max server memory? Are the two related or is it cause and effect. We have an open ticket with Microsoft but I think you guys can beat them to the right answer both in quality and speed. They didnt think we should change the Parallelism setting but that got us this far.

  • Output of sp_configure indicates that you have not limited max memory setting for SQL Server.

    max server memory (MB) 16 2147483647 2147483647 2147483647

    If you want to limit max memory to 120 GB then you need to configure 122880. You may use following statement to do this:

    sp_configure 'max server memory (MB)', 122880

    GO

    reconfigure with override

    GO

    [font="Verdana"]--www.sqlvillage.com[/size][/font]

  • You've got a lot of info there; too much to really go through in details, but can you answer a few questions.

    What is the amount of memory in the new machine? You say much less, but didn't specify what.

    If you open SSMS, what do you have max memory configured to.

    My guess is that your system was using lots of memory. The additional CPUs do not necessarily help here.

    Also, I'm not sure that you can total up all the memory there as the memory being used. There will be things paged out to disk, which might be counted as memory in some counters.

  • interesting. I'm sure I checked it as well as my boss and two other DBAs. How could we all have missed it. Its now set and I didnt change it. I need to check with the others and see who did make the change. I have since found out some more info. The old server had a large L3 Cache and the new one has larger L2 cache but no L3. I've read up that L2 can be shared across multiple cores but not across physical processors but the L3 is shared across the processors as well as the cores. I've looked up the specs for high end data warehousing and Database servers and all have large L3 caches. Being as L3 is twice as fast as L2 and is shared across the processors could this be a contributing factor?

    The old server had 64 gig of ram with 124 L2 and 16mg L3

    The new server has 400k+ L2 and no L3

    Also we are now running with 4 quad cores vs 8 dual cores (two boxes configured by IBM to run one OS) I think this also would give us 2 L3 caches since its two boxed configured to work together.

    Processor use drops from norm of 30 percent to 15 percent with locks high (not parallelism this time)

  • In terms of CPU drop, is that between the two boxes (old and new) or after you set max memory?

    Not sure about L3, have to ask someone else about that, but 4 quads should be slightly less efficient than 8 duals, though I'm not sure how much. Depending on how the workload goes, if there are some really separate tasks (not parallelism of one query), I'd think the 8 CPUs would be much better.

    Is the old box still around for testing?

    So the old server was 64GB of RAM and the new server is 128GB of RAM? It wasn't quite clear from how you're posting if this is the case.

  • In terms of CPU drop, is that between the two boxes (old and new) or after you set max memory?

    Not sure about L3, have to ask someone else about that, but 4 quads should be slightly less efficient than 8 duals, though I'm not sure how much. Depending on how the workload goes, if there are some really separate tasks (not parallelism of one query), I'd think the 8 CPUs would be much better.

    Is the old box still around for testing?

    So the old server was 64GB of RAM and the new server is 128GB of RAM? It wasn't quite clear from how you're posting if this is the case.

    That is the difference in when we are experiencing issues and when things are running smoothly on the NEW machine. When things are smooth it runs at 35-40%. When things are going nuts its throttled DOWN to 15%. Looks like the CPUs are waiting on something (MEMORY READ OR Disk I/O?) I am sceptical on the disk I/O as they are the same ones we used on the old system, we just switched the LUNS to the new machines.

    the old box is still around but not currently online (were using the same LUNs on the new machine)

    The old server was 64 gig and the new one is 128 gig.

    recent perfmon trace shows that 3 of 4 cores on each physical processor are running at 5% and the other on on each physical processor is running at about 40%.

  • Depending upon your actual ss memory usage you could be negatively impacting the OS's ability to service resource (CPU, MEMORY, I/O) requests. I would reduce the max memory in ss to about 110 (of real memory) assuming you are only using the server for ss.

    Also,

    1. If you aren't already, you should think about implementing x64 on this hardware to take max advantage (cache above 4GB, etc.). It looks like you may be running x64 since you don't have AWE enabled.

    2. You should be running on ss2k5 build 3175 or better.

    3. You should also check to see if you have "lock pages in memory" local privileges on the OS granted to the NT account running your engine. Otherwise, you may be getting some VM paging.

    A couple of notes:

    The problem that you are describing is a bit like the issue that was found regarding thread ss thread management on smp systems with hyper-threading. Performance takes a hit when ever there is a high thread count. The waits spike and very little gets done. However, this was due to contention of background threads while sharing the floating point process and L2 cache on each core. On the quad-core there is no HT and thus no sharing. I also believe that you have dual FSBs (probably about 1066Mhz each) which provide much better real memory access performance. This helps a log in support of request coming from those threads.

    Another issue which was found is a problem involving NUMA on various servers. On some IBM system like the 3950 I believe they still employ the old architecture from x445. The sticks are connected to separate expansion cards and sync'ed across a relative slow BUS.

    I'm editing this post to add one more thing which I feel may be of some help...

    One major contributor to increased thread count (maxdop permitting) is the creation of not-so-optimal execution plans. A simple way to verify this is by simply running one of the sp's which isn't performing well on your new box on both the old box and the new box. capture the execution plans for each via trace and compare the two. I was able to do this while working a like problem and after capturing the optimal plan on one server i forced it on the slow performing server and even though the hardware was different (CPU, memory, etc.) the sp execute optimally.

    If you haven't done so already, you should run sp_createstats and update stats with a full sample. Then DBCC FREEPROCCACHE to ensure that you pickup new plans.

    Good luck.

  • 1. If you aren't already, you should think about implementing x64 on this hardware to take max advantage (cache above 4GB, etc.). It looks like you may be running x64 since you don't have AWE enabled.

    2. You should be running on ss2k5 build 3175 or better.

    3. You should also check to see if you have "lock pages in memory" local privileges on the OS granted to the NT account running your engine. Otherwise, you may be getting some VM paging.

    1. We are running x64

    2. checking into the build level upgrade we are at 3186 now

    3. please explain more about this and how to check

    Old server:

    X3950 CPUS

    Intel Zeon PM 7140N

    Intel Xeon CPU 3.33 ghz

    Famil F

    Model 6

    Stepping 8

    Ext. Family F

    Ext. Model 6

    Revision B0

    instructions MMX, SSE, SSE2, SSE3, EM64T

    Core Speed 3336.3 MHz

    Multiplier x20

    Bus speed 166.8 Mhz

    Rated FSB

    667.3 MHz

    Cache L1 Data 2x16 kbytes

    L2 Trace 2x12Kuops

    Level2 2 2x1024 kbytes

    level 3 16 MBytes

    Cores 2 Threads 2

    New servers

    M2s

    Intel Zeon X7350

    Socket 604 mpga

    Intel Xeon x7350 @ 2.93 GHz

    Family 6

    Model F

    Stepping B

    Ext Family 6

    Ext Model F

    revision G0

    Instructions MMX, SSE, SSE2, SSE3, SSSE3 EM64T

    Core Speed

    1599.3 MHz

    Multplier X6.0

    bus Speed 266.5

    Rated FSB 1066.2 MHz

    cache

    L1 4 x 32 kb

    L1 inst 4x32 kb

    L2 2x4096 kb

    Cores 4 Threads 4

  • "lock pages in memory" privileges are set within the "Local Security Policy" mmc snapin. Within the snapin you will find this setting in the "User Rights Assignment" node of the tree. Select this polity from the right window pane and add either the nt account which is running your sql engine or the nt group which contain the account. With these privilges set the account will now have the local privileges necessary to lock real memory pages and prevent the OS from paging you out to vm. Restart the ss engine. However, be careful not to set your ss max server memory too high as you will negatively impact the OS and other apps.

  • I checked and we have it set correctly with local rights. This is a dedicated SS box and we have max server mem to 120 gig of 128. 8 should be plenty for the OS but I do still see paging occurring. I dont understand that with this much more memory...

  • You should have at least one processor core freed up for the OS by deselecting it from your CPU and I/O affinity.

    Asside from your config you may want to address the possibltity that you might be getting some bad execution plans.

    Did you bench mark the box outside of ss? When I install a new ss box I run "SQLIOSimx64" or "SQLIOSimx86". Running these untilities will benchmark exactly what your hardware is capable of before introducing your actual db environment into the equation (cpu, mem, i/o).

  • we updated the stats over the weekend, but didnt run DBCC FREEPROCCACHE. That has since been fixed. Good idea about dedicating a core to the OS, that will help rule out what is causing the slowdown in each physical processor, we were seeing 1 core in each running well but the other 3 slowing down to about 5% when we were having problems. Things are running ok right now, but we've seen that before over the last week, when we changed the Max degree of parallelism from 0 to 8 it got better for a while then slowed down again, we changed it to 4 and same thing, then 2 then 1....same thing. Ran fine for a while (1-2 day) then the slowdown occurs again.

    Thanks for all of our great input it is appreciated! 😀

    Unfortunately, we did not benchmark the machine prior to installation and putting it in production.....

  • No problem.

    Note: Waits...

    Now that you are using more processors you should also adjust your tempdb file layout for maximum performance. Tempdb pages are allocated and indexed using the first 4gb or less of each physical file assigned to tempdb. this means that as you add to the thread count on an smp system you create more conflict on tempdb requests. Only one thread at a time can update the allocation table for a single file. To avoid this you need to create (n) tempdb files where n=number of cpu's. span the files across your LUNs assigned to database files (not the log LUN) and make them all the same size and relatively large as to avoid unnecessary auto grows. Additially, setup the -T1118 trace flag for sql startup. The result is that your tempdb allocations will be scattered across multiple files all having their own allocation tables. This works even if the files are on the same LUN.

    Good luck.

  • yes, we already do that on both the old and new system. Thats standard across our enterprise. We're discussing changing the affinity to give the OS its own core.....

  • How your page file setting is? Please refer http://support.microsoft.com/kb/889654 and see if you have set up correctly.

    [font="Verdana"]--www.sqlvillage.com[/size][/font]

Viewing 15 posts - 1 through 15 (of 22 total)

You must be logged in to reply to this topic. Login to reply