Slow-running SQL Server.

  • Hi, I have a SQL Server instance running a vendor DB, so we can't touch the DB. The server is Windows 2008 R2 with SQL 2008 R2, running in a cluster on shared storage that is also dedicated to it.

    The application has started performing poorly recently, and the vendor isn't proving very helpful at this point, so I'm just seeking advice here if possible. I'm not a SQL pro, but I can get by.

    I have noticed that the server has periods of extremely high CPU usage, and according to Windows this usage is all against the SQL Server service. All CPUs will spike to 70-95% and stay there for, say, 30 seconds (the server has 24 real cores, not hyper-threaded). On further investigation I noticed that CXPACKET and SOS_SCHEDULER_YIELD are the top two wait types, with LATCH_EX at number 4.
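
    For reference, I pulled the waits with a standard query against sys.dm_os_wait_stats, something along these lines (a simplified sketch; the exclusion list here is just illustrative, real ones filter out many more benign wait types):

        -- Top waits by total wait time since the last restart/clear
        SELECT TOP (5)
               wait_type,
               wait_time_ms,
               signal_wait_time_ms,
               waiting_tasks_count
        FROM sys.dm_os_wait_stats
        WHERE wait_type NOT IN ('SLEEP_TASK', 'LAZYWRITER_SLEEP', 'BROKER_TASK_STOP')
        ORDER BY wait_time_ms DESC;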

    I'm not sure how to tackle this from a server hardware point of view. MAXDOP is set to 8 (though the server has 4 x 6-core CPUs, so maybe it should be lowered to 6?) and cost threshold for parallelism is still at the default of 5. CXPACKET waits suggest a parallelism problem, but SOS_SCHEDULER_YIELD suggests that single-threaded queries aren't performing well either?
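
    For what it's worth, this is how I checked the current values (a trivial query against sys.configurations):

        -- value is the configured setting, value_in_use is what's currently active
        SELECT name, value, value_in_use
        FROM sys.configurations
        WHERE name IN ('max degree of parallelism', 'cost threshold for parallelism');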

    I have no doubt that the DB isn't well designed, and I know the vendor uses a lot of heaps, for example. But are there any further tweaks I can make to get the most out of this system before more long calls with the vendor?

  • Quick thought: the cost threshold for parallelism is way too low; I would suggest setting it to 50 for a start. Can you provide more information on the setup?
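
    Something like this will do it (50 is just a starting point, not a magic number):

        -- cost threshold for parallelism is an advanced option
        EXEC sp_configure 'show advanced options', 1;
        RECONFIGURE;
        EXEC sp_configure 'cost threshold for parallelism', 50;
        RECONFIGURE;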

    😎

  • So we have two HP servers running Windows 2008 R2 in a cluster: one node runs the SQL engine, the other the application engine, but if a server fails the SQL DB fails over to the app server, or vice versa.

    Both servers are HP DL580 G5s, so about 4 years old now and soon to be replaced: 4 x 6-core CPUs at around 2.7 GHz, 64 GB of RAM, and both connect to an HP EVA4400 SAN. The vendor DB is the only DB running on this instance of SQL, and there is only the one instance of SQL running on these servers.

    I have researched the cost threshold for parallelism setting and have read there can be value in raising it to around 30-50 on modern hardware. MAXDOP is set to 8 as per the vendor's recommendation, although my own research suggests it should be set to the number of real cores on a single CPU? These servers are pre-NUMA architecture, though, so that's probably less critical? This is the last of the Core 2 era stuff.
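
    To check, I ran a quick query against sys.dm_os_schedulers to see how the schedulers map to memory nodes (my own rough check, so take it with a grain of salt):

        -- One row per node; a single node would confirm no NUMA as far as SQL is concerned
        SELECT parent_node_id AS node_id,
               COUNT(*) AS online_schedulers
        FROM sys.dm_os_schedulers
        WHERE status = 'VISIBLE ONLINE'
        GROUP BY parent_node_id;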

    I actually expected storage to be the main issue, but all storage-related waits seem low.

  • BTW, I read that I shouldn't make any changes to MAXDOP or cost threshold for parallelism during business hours, as SQL will dump the plan cache?

  • davidwarner (10/16/2014)


    BTW, I read that I shouldn't make any changes to MAXDOP or cost threshold for parallelism during business hours, as SQL will dump the plan cache?

    That is correct, so be careful!

    It is likely that a server-wide setting of MAXDOP = 1 would be helpful in your case, as CXPACKET waits are high. But before you start to tinker with the settings, make certain that the statistics are up to date; outdated statistics are one of the main causes of the optimizer making "wrong" parallel choices.
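
    A quick way to see how stale the statistics are (a generic example against a user database; adjust the filter to your schema):

        -- Oldest statistics first; very old dates on busy tables are a red flag
        SELECT OBJECT_NAME(s.object_id) AS table_name,
               s.name AS stats_name,
               STATS_DATE(s.object_id, s.stats_id) AS last_updated
        FROM sys.stats AS s
        JOIN sys.objects AS o ON o.object_id = s.object_id
        WHERE o.is_ms_shipped = 0
        ORDER BY last_updated;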

    😎

  • Eirikur Eiriksson (10/16/2014)


    It is likely that a server-wide setting of MAXDOP = 1 would be helpful in your case, as CXPACKET waits are high

    Let me politely disagree. Setting MAXDOP to 1 will make the CXPACKET waits go away, but it won't do performance any good; it will probably make things worse.

    You have to ask yourself why you have such high waits for parallel operations.

    First of all, is this an OLTP database or a DWH or what?

    In OLTP workloads, SQL Server decides to go parallel when it has to work on huge amounts of data, which often means there is no efficient path to the data (e.g. a convenient index) or the syntax of the request cannot make use of the existing indexes (non-sargable predicates? Poorly written queries?).
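
    To illustrate what I mean by non-sargable, here is a made-up example (dbo.Orders is hypothetical): wrapping the column in a function hides it from the optimizer, while an equivalent range predicate can use an index on order_date.

        -- Non-sargable: the function on the column forces a scan
        SELECT order_id FROM dbo.Orders WHERE YEAR(order_date) = 2014;

        -- Sargable rewrite: a range on the bare column can seek on an index
        SELECT order_id FROM dbo.Orders
        WHERE order_date >= '20140101' AND order_date < '20150101';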

    In this case, I would start looking at the plan cache and extract the top 20 statements in terms of CPU consumption. They will probably be parallel plans. Tune those first and see what happens. Rinse and repeat until performance gets back to acceptable.
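
    A generic plan cache query along these lines will pull them out (a sketch; trim the columns and the statement-offset arithmetic to taste):

        -- Top statements by total CPU since they were cached
        SELECT TOP (20)
               qs.total_worker_time / 1000 AS total_cpu_ms,
               qs.execution_count,
               SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
                   ((CASE qs.statement_end_offset
                         WHEN -1 THEN DATALENGTH(st.text)
                         ELSE qs.statement_end_offset
                     END - qs.statement_start_offset) / 2) + 1) AS statement_text
        FROM sys.dm_exec_query_stats AS qs
        CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
        ORDER BY qs.total_worker_time DESC;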

    I concur that the cost threshold for parallelism is waaaay too low. Set it to at least 30. Whether you should worry about the plan cache getting flushed depends on how busy the server is. In general, if you have long-running queries and the server is not too busy, the cost of recompilation is negligible compared to the cost of query execution, so flushing the cache is not really an issue.

    In DWH workloads, parallel queries are normal and there is nothing you can or should do to avoid it.

    In general, tinkering with server options or OS/hardware settings is the least effective way of tuning a SQL Server. The most effective levels to work on are (in decreasing order of effectiveness):

    * Database Design

    * Query optimization

    * Index optimization

    * Concurrency tuning

    * Server tuning

    If you can't go through this process with the vendor, I suggest that you hire a consultant: you won't regret it.

    -- Gianluca Sartori

  • The DB workload is OLTP; sorry, I should have mentioned that right off the bat.

    The DB has nightly maintenance jobs as per the vendor's spec, which reindex, update stats, etc.

    You are absolutely right, Gianluca; the source of the problem has to be determined before applying any fixes. What I left out of the previous post was, of course, using the MAXDOP hint in the queries that would obviously benefit from parallel execution. Not certain what the options are; it looks like it's rather tied down on the vendor side, which significantly reduces the number of tuning opportunities if no changes can be made to the code, schema, etc.
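
    Just to show what I mean, a hypothetical example of the hint (table and column names made up):

        -- Caps this statement at 4 parallel threads regardless of the server setting
        SELECT c.customer_id, SUM(o.amount) AS total_amount
        FROM dbo.Orders AS o
        JOIN dbo.Customers AS c ON c.customer_id = o.customer_id
        GROUP BY c.customer_id
        OPTION (MAXDOP 4);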

    😎

  • davidwarner (10/15/2014)


    Hi, I have a SQL Server instance running a vendor DB, so we can't touch the DB.

    Even if it is a vendor DB, you can still make some modifications without altering the schema. Usually you can add indexes and also create plan guides.

    So, look at the missing indexes first; maybe there is some room for improvement. And also look at the most CPU-intensive queries and their plans. Maybe there is a way to improve them without changing the code.
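
    For example, the missing index DMVs give a rough starting point (treat the suggestions with healthy scepticism; they are hints, not a design):

        -- Rank suggestions by estimated impact times how often they would have been used
        SELECT TOP (10)
               mid.statement AS table_name,
               mid.equality_columns,
               mid.inequality_columns,
               mid.included_columns,
               migs.avg_user_impact,
               migs.user_seeks
        FROM sys.dm_db_missing_index_details AS mid
        JOIN sys.dm_db_missing_index_groups AS mig
            ON mig.index_handle = mid.index_handle
        JOIN sys.dm_db_missing_index_group_stats AS migs
            ON migs.group_handle = mig.index_group_handle
        ORDER BY migs.avg_user_impact * migs.user_seeks DESC;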


    Alex Suprun

  • I think I might have worked out the issue, or at least what's been grinding the server to a halt for the past few days.

    This morning I came in early, before there were many users on, logged in, and saw Windows Task Manager still reporting a constant 30-40% CPU. I was like, what the.... Then I saw that when the server was rebooted over the weekend just past, something had gone wrong with the data collection jobs, and one had been stuck in a cancelling state basically since SQL was restarted. When I force-cancelled it, all CPU cores dropped to 0% utilisation. I've been monitoring for the past 3 hours and haven't seen any spikes above 30% lasting longer than a few seconds, which is more normal for our environment. Certainly not the 70-90% mark I had yesterday.
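
    For anyone who hits the same thing, roughly this query against sys.dm_exec_requests is what showed me the stuck job (filtering out system sessions):

        -- Active requests by CPU; a long-running request stuck in a rollback stands out
        SELECT r.session_id, r.status, r.command, r.cpu_time, r.total_elapsed_time, t.text
        FROM sys.dm_exec_requests AS r
        CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
        WHERE r.session_id > 50
        ORDER BY r.cpu_time DESC;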

    I still want to make those few changes to cost threshold for parallelism and MAXDOP. What is a safe starting point for the cost threshold? Most seem to say 50?
