Increasing CPUs hurts Perf

  • The problem is not parallelism itself; it's the minimum cost at which the server starts using it. Parallelism also comes with the cost of remerging the results, which costs CPU on top of the waits for the other CPUs to finish their work.

    Grant has much more experience in this area than I do, but IIRC he recommends raising the default value in any OLTP environment to somewhere between 25 and 35 (yes, he's that vague). The problem is that you need to thoroughly check and verify this for your own system. No two systems are alike, so a static figure is not really useful...

    The default setting is 5, which is way too low in his experience.

    http://msdn.microsoft.com/en-us/library/ms190949(v=SQL.100).aspx

  • Ninja's_RGR'us (7/4/2011)


    The problem is not parallelism itself; it's the minimum cost at which the server starts using it. Parallelism also comes with the cost of remerging the results, which costs CPU on top of the waits for the other CPUs to finish their work.

    Grant has much more experience in this area than I do, but IIRC he recommends raising the default value in any OLTP environment to somewhere between 25 and 35 (yes, he's that vague). The problem is that you need to thoroughly check and verify this for your own system. No two systems are alike, so a static figure is not really useful...

    The default setting is 5, which is way too low in his experience.

    http://msdn.microsoft.com/en-us/library/ms190949(v=SQL.100).aspx

    I'm still vague, but my numbers have gone up. I recommend 35-50 now. I've just seen too many estimated costs hovering in the 30 range for queries that run in a second or less without parallelism.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thanks for coming in. I feel queasy giving advice about something I never had to deal with! 😉

  • Ninja's_RGR'us (7/4/2011)


    The default setting is 5, which is way too low in his experience.

    I heartily agree. I had to set it to 20 on my main OLTP instance, but I'm far from recommending the same to you. Every instance has its own optimal setting, and you will have to test a lot before choosing a value that fits your workload.

    Since you're using SQL2008, you could limit the DOP with Resource Governor, which is a much better choice than setting it at the instance level.

    IMHO, 2 is more than enough for an OLTP database. If you want to benefit from a higher DOP for some particular operations, set up a specific workload group with a higher DOP.
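    Something along these lines, for example (a rough sketch; every name below is made up, and Resource Governor requires Enterprise Edition):

    -- Hypothetical names throughout. Run in master; classifier must be schema-bound.
    CREATE RESOURCE POOL ReportingPool;
    GO
    CREATE WORKLOAD GROUP ReportingGroup
    WITH (MAX_DOP = 8)              -- allow a higher DOP for reporting queries only
    USING ReportingPool;
    GO
    CREATE FUNCTION dbo.rg_classifier()
    RETURNS sysname
    WITH SCHEMABINDING
    AS
    BEGIN
        DECLARE @grp sysname = N'default';      -- default group keeps the low instance DOP
        IF SUSER_SNAME() = N'reporting_login'   -- hypothetical login used for reports
            SET @grp = N'ReportingGroup';
        RETURN @grp;
    END;
    GO
    ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.rg_classifier);
    ALTER RESOURCE GOVERNOR RECONFIGURE;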

    Paul Randal has a nice blog post on MAXDOP with some rough guidelines here: http://www.sqlskills.com/BLOGS/PAUL/post/MAXDOP-configuration-survey-results.aspx

    Hope this helps

    Gianluca

    -- Gianluca Sartori

  • Almost forgot: here you will find a great blog post by Jonathan Kehayias that deals with cost threshold for parallelism:

    http://sqlblog.com/blogs/jonathan_kehayias/archive/2010/01/19/tuning-cost-threshold-of-parallelism-from-the-plan-cache.aspx
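    In the same spirit, a rough sketch (not Jonathan's exact query) that pulls the estimated costs of the parallel plans currently in cache:

    -- Estimated subtree costs of cached parallel plans, highest first.
    WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
    SELECT TOP (50)
           qp.query_plan.value('(//StmtSimple/@StatementSubTreeCost)[1]', 'float') AS est_cost,
           cp.usecounts,
           qp.query_plan
    FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
    WHERE qp.query_plan.exist('//RelOp[@Parallel = "1"]') = 1
    ORDER BY est_cost DESC;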

    Also, consider that sometimes a parallel plan is not the best choice and the optimizer is tricked into that by insufficient indexing or outdated statistics.

    -- Gianluca Sartori

  • Thanks everyone, I will take a look at that blog. Just to give some more detail on our system:

    It is a mixed system: the clients do a TON of transaction processing, but many of the display pages function almost like a reporting engine, so it is hard to pick a cookie-cutter setting for our systems.

    In terms of indexing and stats, we should be OK; we monitor index usage and fragmentation pretty closely, we were able to rebuild all indexes from scratch just the other day, and we have updated stats a bunch of times in the last week.

    The good news is that I have a little bit of play with the settings; the increased performance is something we have noticed, but the increase isn't quite at the level where the end users will feel it yet ;).

    Grant, in terms of the parallelism setting, I have tried up to 20 so far. Do you think I should try something like 48 (double the logical cores)? Or is there a more scientific way of determining the best value?

  • ...forgive the dumb newbie...I was confusing the 2 settings.

    Sorry, I have not touched the Cost threshold setting yet, only the max setting. The cost is at 5 and the Max is (currently) at 0.

    If I increase the cost threshold to, say, 20 or 30, where should I set the max setting? Match it or leave it at 0?

  • CXPACKET waits are going to happen as long as MAXDOP != 1; the trick is to reduce them. If this is primarily an OLTP system, there may not be much point in setting it to anything other than 1. If you have a mixed workload, then, since you don't have Resource Governor (Enterprise Edition only; you said this is Web Edition), you may be in for some fun: let the instance-level MAXDOP be > 1 and add a MAXDOP query option to the specific problem queries that you want to run without parallelism, e.g.:
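    A minimal sketch (the tables and predicate below are hypothetical):

    -- Hypothetical query; the hint overrides the instance MAXDOP for this statement only.
    SELECT o.order_id, o.order_date, c.customer_name
    FROM dbo.orders AS o
    JOIN dbo.customers AS c
        ON c.customer_id = o.customer_id
    WHERE o.order_date >= '20110101'
    OPTION (MAXDOP 1);    -- force a serial plan for this query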

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • as2higpark (7/4/2011)


    ...forgive the dumb newbie...I was confusing the 2 settings.

    Sorry, I have not touched the Cost threshold setting yet, only the max setting. The cost is at 5 and the Max is (currently) at 0.

    If I increase the cost threshold to, say, 20 or 30, where should I set the max setting? Match it or leave it at 0?

    Yes, leave MAXDOP at 0 for now. SQL is pretty smart with this; it mainly needs help with the cost threshold value.

    You may want to cap it later so that one single report doesn't use all 24 processors, but that's another problem for another day :w00t:.

  • Awesome, thanks guys.

    I am going to leave the Max at 0 and set the Cost Threshold to 18; the query against the plan cache actually showed a TON of plans with estimated costs around 19.
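    For the record, the change itself (a sketch via sp_configure; both are advanced options and take effect on RECONFIGURE, no restart needed):

    -- Set cost threshold to 18 and leave MAXDOP at 0 instance-wide.
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'cost threshold for parallelism', 18;
    EXEC sp_configure 'max degree of parallelism', 0;
    RECONFIGURE;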

    I will check in with how it performs under load tomorrow AM.

    Thanks again and have a good rest of the holiday weekend to those in the US.

  • as2higpark (7/4/2011)


    Awesome, thanks guys.

    I am going to leave the Max at 0 and set the Cost Threshold to 18; the query against the plan cache actually showed a TON of plans with estimated costs around 19.

    Were those queries for OLTP or OLAP?

    If they were OLTP and pretty "simple stuff", you might consider using 25-30 instead of 18-19.

  • Make sure to take a snapshot of the waits before changing the setting.

    And clear the wait stats so you get a fresh report over the coming hours / days.
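    Both steps in one sketch (the snapshot table name is made up):

    -- Snapshot the current cumulative waits into a work table (hypothetical name)...
    SELECT GETDATE() AS capture_time, *
    INTO dbo.wait_stats_before
    FROM sys.dm_os_wait_stats;

    -- ...then reset the counters so the next report starts fresh.
    DBCC SQLPERF ('sys.dm_os_wait_stats', CLEAR);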

  • Ninja,

    They are OLTP queries and the bane of my existence....

    They are dynamically generated, so in many cases they are single-use plans... we are currently fighting with development over that fun little nugget.
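    A rough way to size that problem from the plan cache (a sketch):

    -- How many ad hoc plans were compiled, used once, and now just sit in cache?
    SELECT COUNT(*) AS single_use_plans,
           SUM(CAST(size_in_bytes AS bigint)) / 1024 / 1024 AS cache_mb
    FROM sys.dm_exec_cached_plans
    WHERE usecounts = 1
      AND objtype = 'Adhoc';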

    I will start with 18 and tweak up if no change happens.

    Thanks again....

  • as2higpark (7/1/2011)


    Hello everyone,

    I have an issue that I have been pounding my head against a wall for days on.

    We have a large client DB (400 GB) that we recently moved from a good box with dual quad-core 3.00GHz Xeons (8 total CPUs) to a box with dual 12-core 2.1GHz Opterons (24 total CPUs). Performance has decreased by close to 50%; average request time (web application) has doubled.

    The RAM and the SQL install are identical (32GB of RAM per box, with SQL's allocation changed from 8GB to 20GB during troubleshooting). Both are SQL 2008 10.0.4000 Web Edition. The only differences that I can see are a new version of Windows (2003 to 2008 R2) and the CPU change.

    We have been trying everything we can think of to bring performance back in line, to no avail. We have tried rebuilding all existing indexes, adding suggested ones, updating stats, and changing the parallelism settings, and nothing has seemed to work. (Max Parallelism is currently set to 0; we have tried 1, 4, 8, 10, and 20.)

    I am hoping that someone may have stumbled upon some quirk with this version of Windows and/or the increased CPU count.

    Any thoughts, I am at a loss and looking for things to try.

    Thanks for any help.

    I would check to see if HYPER-THREADING (not "Hyper-V") is enabled in the BIOS for these servers.

    If so - disable it.

    Reason: hyper-threading can conflict with SQL Server performance because it effectively doubles the number of physical CPUs by presenting extra logical CPUs. This, in turn, makes SQL Server "think" there are twice as many CPUs as there really are, so it creates DOUBLE the workers/schedulers, which actually causes SQL Server to work harder and results in a sort of thrashing that is most evident during times of high activity.
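    One quick way to check what SQL Server sees (a sketch; note that hyperthread_ratio counts cores and HT together per socket, so it is not a pure HT flag):

    -- Logical CPUs visible to SQL Server, and logical CPUs per physical socket.
    SELECT cpu_count, hyperthread_ratio
    FROM sys.dm_os_sys_info;

    -- One scheduler is created per visible logical CPU.
    SELECT COUNT(*) AS visible_schedulers
    FROM sys.dm_os_schedulers
    WHERE status = 'VISIBLE ONLINE';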

    It happened to me after upgrading to SQL 2008 R2 on a new DELL (510R) - it came configured with HT enabled.

    Disabling it fixed the performance issues.

    Side note: HT was intended for use in FLOP-intensive applications/services, NOT integer-based ops (like database apps/services).

    The other excellent recommendations made herein, especially regarding MAXDOP and MAX MEMORY settings should be considered, as well.

  • A CXPACKET wait is experienced by a thread in a parallel query after it has finished its portion of the work, and is waiting for the other threads to complete theirs.

    With CXPACKET waits as high as you're seeing, I would be concerned that the statistics may be out of date and the optimizer is splitting the work between the threads improperly. The result is a single thread performing a disproportionate amount of the work while the other threads, already done with theirs, sit and log CXPACKET waits.
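    To see how big a share of total waits CXPACKET actually has, a quick sketch against the waits DMV:

    -- Top waits with their share of total wait time since the stats were last cleared.
    SELECT wait_type,
           wait_time_ms,
           waiting_tasks_count,
           100.0 * wait_time_ms / SUM(wait_time_ms) OVER () AS pct_of_total
    FROM sys.dm_os_wait_stats
    WHERE wait_time_ms > 0
    ORDER BY wait_time_ms DESC;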

    You can check the age of statistics on indexes with the STATS_DATE() function:

    SELECT STATS_DATE(object_id, index_id) AS stats_last_updated,
           OBJECT_NAME(object_id) AS table_name,
           name AS index_name
    FROM sys.indexes;

    Apply filters to limit the results to the big tables that are hit most in those queries (e.g. WHERE object_id = OBJECT_ID('dbo.YourTableName')).

    -Eddie

    Eddie Wuerch
    MCM: SQL

