SQL Server Performance Problems

Question

SQL Server Performance Problems

kiffab

Right there with Babe

Points: 730
More actions
December 18, 2012 at 10:34 am

#267631

Hi Guys
Occasionally at peak times of the day I get I/O errors on my SQL server (v 2005, 16 core 2.4 ghz, 58 GB ram).
These errors read "SQL server has encountered x occurences of i/o requests taking longer than 15 seconds to complete on ....". The drive is always the data drive but multiple database are affected.
Various indexes and tweaks were put in place over the last year to stabilise things but it's starting to show signs of performance problems again. I've made all the usual changes like splitting data/logs, temp db on own drive, etc. The biggest DB is around 1 TB.
I have installed SQL server 2005 performance dashboard.
What's the best way to use it?
For expensive queries, what am I most interested in from CPU, duration, logical reads, physical reads, logical writes, CLR time? I can feed this back to suppliers and have them tweak their SQL if necessary. I also see various indexes that it suggests I should add.
What else can I check?
Thanks.

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply

Steve Jones - SSC Editor SSC Guru Points: 734418 More actions · Answer 1

This is really an I/O subsystem issue, and less a query tuning issue. The load from your server exceeds what the I/O system can handle.

http://mssqlwiki.com/2012/08/27/io-requests-taking-longer-than-15-seconds-to-complete-on-file/

Tuning queries to use less reads can help, and indexing better might help, but this is a bit of an art. Really you need more/faster disks in the short term.

In looking at queries, I'd start with a workload trace, then feed that to the DTA and get an idea of what indexes it might recommend.

You can also take your trace and look for those queries with lots of reads in them (or writes) and see if there is something you can do to reduce the reads. Indexing helps here, but potentially rewriting SQL to work on smaller sets of data can help.

kiffab Right there with Babe Points: 730 More actions · Answer 2

Hi Steve

Thanks for the prompt response. I will read through the article attached.

Given that I see nothing but expansion in the future, it sounds like I can tweak queries and index as much as I like but eventually this problem will surface again. In fact I'd say it is now as this problem had disappeared for a period of time as I changed the block write size to 64k from 8 or whatever the default was.

The cluster nodes appear to have plenty of free CPU and mem but I guess that doesn't rule out the SAN.

I need more evidence in order to request better kit so I will follow the steps in the link and may come back with a question or 2!

Thanks.

Steve Jones - SSC Editor SSC Guru Points: 734418 More actions · Answer 3

One thing you might examine is your waits for the various files. If you see waits stacking up, then you have IO issues.

A few more references for you:

http://www.simple-talk.com/sql/performance/sql-server-wait-events-taking-the-guesswork-out-of-performance-profiling/

http://www.simple-talk.com/sql/performance/a-performance-troubleshooting-methodology-for-sql-server/

http://www.sqlskills.com/blogs/paul/post/capturing-wait-stats-for-a-single-operation.aspx

kiffab Right there with Babe Points: 730 More actions · Answer 4

Regarding the waits, SQL server performance dashboard shows below (taken at 11:20 today)

Wait Category..................Number of Waits..........Wait Time (sec)..........% Wait Time

Parallelism...........................724065872.................6319246.421...............28.06%

Other................................1232197629................5075559.493...............22.54%

Sleep.................................425224734..................4091093.732...............18.17%

Latch................................1642991523.................3378139.014...............15.00%

Buffer IO............................148173427...................2212876.593..............9.83%

I am not sure whether that seems good bad or normal! Apologies for the dots....

kiffab Right there with Babe Points: 730 More actions · Answer 5

The top wait on live always seems to be this:

wait_typewait_time_ms

MSQL_XP26126343

I believe this is due to SQL waiting on extended stored procs to complete yet I can't find a process relating to this. I use RedGate for SQL backup and the general feedback online is that an update from v5 is required.

Jon.Morisi SSChampion Points: 12846 More actions · Answer 6

It looks like you have got a lot of things going on performance wise. CXPACKET waits (parallelism) should be addressed if > 5%. The IO waiting more that 15 sec indicates an IO subsystem bottleneck and the IO related waits seem to support this. However, that doesn't necessarily mean you need a faster IO subsystem, it could be a number of things including bad indexing, bad maintenance, or a big ad hoc query that needs tuning. Have you run any Performance Monitor Counter logs?

If I were in your shoes I'd start with addressing the CXPACKET waits by modifying Cost threshold for parallelism and Max Degree of Parallelism because that's a relatively easy one. Next I'd check index fragmentation and the last time statistics were updated. If there's no routine maintenance occurring, put it in place.

Next I'd run a PAL analysis, and go from there:

http://pal.codeplex.com/

HTH,

Jon

kiffab Right there with Babe Points: 730 More actions · Answer 7

Thanks Jon. I'll read up on this.

FYI -

max degree of parallelism = 0

Locks = 0

Cost = 5

Query waits = -1

I'm guessing this is the out the box setup as I've never changed it. Does this ring alarm bells or seem perfectly reasonable?

I have 16 processors @ 2.4 GHZ. 58 GB memory with 48 GB allocated to SQL.

Cheers

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 8

Jon.Morisi (12/19/2012)
If I were in your shoes I'd start with addressing the CXPACKET waits by modifying Cost threshold for parallelism and Max Degree of Parallelism because that's a relatively easy one.

Fixing excessive CXPacket waits isn't as simple as that (sure, maxdop 1 will make them go away entirely, but...). Requires investigating whether the CX packet waits are a problem (any time a query runs in parallel, you will get CXPacket waits), what, if anything, are the threads that aren't waiting on CXPacket waiting for, whether there's data skew, etc

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Adam Machanic SSCoach Points: 15370 More actions · Answer 9

Jon.Morisi (12/19/2012)
CXPACKET waits (parallelism) should be addressed if > 5%.

That's a rather specific pronouncement, especially when you don't know anything about the workload in question. What's your logic there?

--
Adam Machanic
whoisactive

kiffab Right there with Babe Points: 730 More actions · Answer 10

Hi Guys

Are there any facts or figures I can provide to support or contradict the CXPackets theory?

I'm unsure where to begin though I have sent a couple of rogue queries to 3rd party suppliers.

Cheers.

Adam Machanic SSCoach Points: 15370 More actions · Answer 11

kiffab (12/19/2012)
Hi Guys
Are there any facts or figures I can provide to support or contradict the CXPackets theory?
I'm unsure where to begin though I have sent a couple of rogue queries to 3rd party suppliers.
Cheers.

My recommendation would be to ignore the CXPACKET waits entirely, at least for now. They're not "real" waits that are actually harming your workload. They're merely a side-effect of parallel processing.

That said, you should definitely reconfigure both the server level Maximum Degree of Parallelism and Cost Threshold for Parallelism settings. Recommended settings depend on workload style and NUMA configuration. I have some information on my blog, here:

http://sqlblog.com/blogs/adam_machanic/archive/2010/05/28/sql-university-parallelism-week-part-3-settings-and-options.aspx

--
Adam Machanic
whoisactive

Jon.Morisi SSChampion Points: 12846 More actions · Answer 12

Adam Machanic (12/19/2012)
Jon.Morisi (12/19/2012)
CXPACKET waits (parallelism) should be addressed if > 5%.
That's a rather specific pronouncement, especially when you don't know anything about the workload in question. What's your logic there?

It should be looked at, here's my logic:

If they're > 5% it's worth looking into. If CXPACKET waits are "the prevalent wait type", there's likely some tuning that can be done.

Jimmy May - http://blogs.msdn.com/b/jimmymay/archive/2008/11/28/case-study-part-1-cxpacket-wait-stats-max-degree-of-parallelism-option-introduction-to-using-wait-stats-to-identify-remediate-query-parallelism-bottlenecks.aspx

sqlskills.com - http://www.sqlskills.com/blogs/paul/post/maxdop-configuration-survey-results.aspx

If waits are

Adam Machanic SSCoach Points: 15370 More actions · Answer 13

Jon.Morisi (12/19/2012)
If they're > 5% it's worth looking into. If CXPACKET waits are "the prevalent wait type", there's likely some tuning that can be done.

That's entirely dependent upon workload style (OLTP, OLAP, mixed, or whatever in-between). I'm looking at a production server right now that's been quite well tuned and CXPACKET waits account for 25% the wait time since the last reboot. Just under that is HADR_WORK_QUEUE at 18%, and LAZYWRITER_SLEEP, clocking in at 15%.

There is nothing to do with CXPACKET here. It's a data warehouse server and the vast majority of the queries use parallelism. Parallelism means CXPACKET, and parallel queries are exactly what we want in this case. MAXDOP has been properly tuned not in hopes of eliminating CXPACKET, but rather so that we see MORE of it (more parallelism == more CXPACKET). This means faster queries, not slower.

The other two waits I've listed have nothing at all to do with workload performance. They're indicative of background processes sleeping.

The points I'm trying to drive home with this example are twofold:

A) Waits analysis (and tuning in general) is highly dependent upon workload, and pronouncements must be made with actual workload characteristics in mind.

B) We should not strive to find the "prevalent" wait type, but rather to determine which waits are actually having an impact on our workload (in other words, which waits are causing actual end users to see actual wall clock time differences in query performance).

--
Adam Machanic
whoisactive

Ray Mond SSCertifiable Points: 7159 More actions · Answer 14

kiffab (12/19/2012)
The top wait on live always seems to be this:
wait_typewait_time_ms
MSQL_XP26126343
I believe this is due to SQL waiting on extended stored procs to complete yet I can't find a process relating to this. I use RedGate for SQL backup and the general feedback online is that an update from v5 is required.

Backups ran via SQL Backup is ran using extended stored procedures. While SQL Backup is running the backup, MSQL_XP waits are incurred for the duration of the backup.

Were backups running when you encounter the I/O errors?

SQL BAK Explorer - read SQL Server backup file details without SQL Server.
Supports backup files created with SQL Server 2005 up to SQL Server 2017.