April 26, 2011 at 9:33 am
Guys,
I have a question regarding SQL/IIS deadlocks in our SharePoint environment. I believe it is a SQL Server issue, hence posting on this board.
Our environment:
SharePoint 2007 SP1 August 2008 CU
1 x SQL 2005 x86 (Server 2003) - this was physical and has very recently been virtualised; at the same time the storage was upgraded and the databases were moved over to the new storage. Since then the performance of the SharePoint application has massively improved, but it is now unreliable!
1 x WFE (Server 2008 x64)
1 x Indexer (Server 2003 x86)
I virtualised the SQL server onto a VMware HA Cluster VM 2 weekends ago, and ever since then between roughly 8AM and 9.30AM each morning one of the SharePoint web applications grinds almost completely to a halt, and the following event ID is logged:
-------------------------------------------------------------------------------------------------------------
Log Name: Application
Source: Microsoft-Windows-IIS-W3SVC-WP
Date: 26/04/2011 09:25:54
Event ID: 2262
Task Category: None
Level: Warning
Keywords: Classic
User: N/A
Computer: hc-cen-wx-ap-07.ukr.local
Description:
ISAPI 'C:\Windows\Microsoft.NET\Framework64\v2.0.50727\aspnet_isapi.dll' reported itself as unhealthy for the following reason: 'Deadlock detected'.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="Microsoft-Windows-IIS-W3SVC-WP" Guid="{670080D9-742A-4187-8D16-41143D1290BD}" EventSourceName="W3SVC-WP" />
<EventID Qualifiers="32768">2262</EventID>
<Version>0</Version>
<Level>3</Level>
<Task>0</Task>
<Opcode>0</Opcode>
<Keywords>0x80000000000000</Keywords>
<TimeCreated SystemTime="2011-04-26T08:25:54.000Z" />
<EventRecordID>1305090</EventRecordID>
<Correlation />
<Execution ProcessID="0" ThreadID="0" />
<Channel>Application</Channel>
<Computer>hc-cen-wx-ap-07.ukr.local</Computer>
<Security />
</System>
<EventData>
<Data Name="IsapiExtension">C:\Windows\Microsoft.NET\Framework64\v2.0.50727\aspnet_isapi.dll</Data>
<Data Name="UnhealthyReason">Deadlock detected</Data>
<Binary>
</Binary>
</EventData>
</Event>
-----------------------------------------------------------------
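For anyone reading the event above, a minimal Python sketch of pulling out the fields that matter may help. It uses a trimmed copy of the logged XML; the function name `summarize_event` is made up for illustration. The point it shows is that the component reporting 'Deadlock detected' is the ASP.NET ISAPI extension inside the IIS worker process, so on its own this event does not say where the underlying stall is.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <Event> element in the logged XML.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Trimmed copy of the event from the post (raw string keeps the backslashes).
EVENT_XML = r"""<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <EventID Qualifiers="32768">2262</EventID>
    <TimeCreated SystemTime="2011-04-26T08:25:54.000Z" />
    <Computer>hc-cen-wx-ap-07.ukr.local</Computer>
  </System>
  <EventData>
    <Data Name="IsapiExtension">C:\Windows\Microsoft.NET\Framework64\v2.0.50727\aspnet_isapi.dll</Data>
    <Data Name="UnhealthyReason">Deadlock detected</Data>
  </EventData>
</Event>"""


def summarize_event(xml_text):
    """Return the event ID, time, machine, reporting DLL and stated reason."""
    root = ET.fromstring(xml_text)
    system = root.find("e:System", NS)
    # Collect the named <Data> values into a plain dict.
    data = {
        d.get("Name"): (d.text or "")
        for d in root.iterfind("e:EventData/e:Data", NS)
    }
    return {
        "event_id": int(system.findtext("e:EventID", namespaces=NS)),
        "time_utc": system.find("e:TimeCreated", NS).get("SystemTime"),
        "computer": system.findtext("e:Computer", namespaces=NS),
        "reporter": data.get("IsapiExtension", ""),
        "reason": data.get("UnhealthyReason", ""),
    }


summary = summarize_event(EVENT_XML)
for key, value in summary.items():
    print(f"{key}: {value}")
```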
I have tried pointing my hosts file at the index server, but I see the same performance problems when loading the web application from that server. So the issue must lie with the database server rather than the front-end/application servers?
I really can't see why upgrading the DB server to a VM with far more memory, disk throughput and CPU capacity would cause this to happen - but my SQL knowledge is somewhat limited compared to my SharePoint knowledge.
There are several other SharePoint web applications on this server and they continue to respond normally to requests, so it would seem the deadlock is in one particular content database.
Any help would be much appreciated in troubleshooting the root cause and finding a resolution, as currently I have to restart the SQL service every morning during busy periods which is extremely undesirable.
I have read this blog post on SQL deadlocks/SharePoint by Graham K, installed SQL Nexus, and taken a data collection today with the server stable. I will attempt to take another in the morning when the issue will no doubt arise, but I need some guidance in analysing the data, as it means very little to me at the moment.
Cheers,
Conrad
April 27, 2011 at 1:57 am
Guys,
I've had another deadlock this morning at 6AM reported in IIS.
SharePoint performance on one particular web application has almost completely ground to a halt.
I'm taking a 15 min trace with SQLDiag/PerfStats Script.
Any ideas on how I should proceed?
Cheers,
Conrad
April 27, 2011 at 2:11 am
Is this even a SQL Server issue?
Reading http://support.microsoft.com/kb/821268, it doesn't sound like it.
April 27, 2011 at 2:12 am
Hi
It is a SQL Server issue because I have multiple web servers and none of them can function whilst there is a deadlock.
April 27, 2011 at 2:53 am
The term 'deadlock' is not unique to SQL Server. The KB article I referenced states that the internal processing of ASP.NET can deadlock.
If we assume that this is a SQL deadlock, which I'm sceptical of right now, Profiler can be used to provide more information on it.
Further advice will depend on what the deadlock graph says.
April 27, 2011 at 4:36 am
Hi Dave
I think I am getting somewhere.
I have compared the SQLNexus reports for both traces (one stable, one unstable/locked).
I think the graphs will speak for themselves, although I don't know nearly enough about SQL Server to understand what they are telling me.
Page I/O latch waits are sky high and nothing else gets a chance when the system is unresponsive.
[Three pairs of SQL Nexus report screenshots, each pair labelled Stable and Unstable]
April 27, 2011 at 5:19 am
You've also got high BACKUPIO and BACKUPBUFFER waits.
So do you have a general performance issue, not just a deadlock issue?
The two BACKUP wait types imply that a backup is running, which is obviously an I/O-intensive operation. Do you see the performance/deadlock issue only while a backup is running?
April 27, 2011 at 5:32 am
Hi Dave
I'm not sure if you've compared the two separate graphs (the first two).
The first one was taken when there were no performance issues (with BACKUPIO and BACKUPBUFFER present).
The second graph was taken when I had the issue. It shows massively more PAGE I/O Latch waits than ANYTHING else, and a lot of waits on LOCKS. If you look at the locking chains, they are much longer and there are many more of them. The locking chains are all waiting on PAGE IO LATCH as well.
The BACKUPIO and BACKUPBUFFER waits are consistent between the two graphs, so I don't think that is the issue?
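The comparison being described here (which wait types grew between the stable and the unstable capture) can be sketched in a few lines of Python. This is only an illustration: the function names are made up, and the numbers are invented placeholders, not values read from the SQL Nexus reports in this thread. In practice the inputs could be two snapshots of `wait_type` and `wait_time_ms` from `sys.dm_os_wait_stats`.

```python
def wait_shares(wait_ms):
    """Convert wait times per wait type into a fraction of total wait time."""
    total = sum(wait_ms.values())
    if total == 0:
        return {name: 0.0 for name in wait_ms}
    return {name: ms / total for name, ms in wait_ms.items()}


def share_shift(stable_ms, unstable_ms):
    """Rank wait types by how much their share of total waits increased."""
    stable = wait_shares(stable_ms)
    unstable = wait_shares(unstable_ms)
    names = set(stable) | set(unstable)
    shifts = {n: unstable.get(n, 0.0) - stable.get(n, 0.0) for n in names}
    return sorted(shifts.items(), key=lambda kv: kv[1], reverse=True)


# HYPOTHETICAL placeholder figures (milliseconds), chosen only to mirror the
# pattern described above: backup waits unchanged, page I/O latch waits up.
stable_capture = {
    "PAGEIOLATCH_SH": 100, "LCK_M_S": 50, "BACKUPIO": 300, "BACKUPBUFFER": 300,
}
unstable_capture = {
    "PAGEIOLATCH_SH": 5000, "LCK_M_S": 900, "BACKUPIO": 300, "BACKUPBUFFER": 300,
}

ranking = share_shift(stable_capture, unstable_capture)
for wait_type, shift in ranking:
    print(f"{wait_type:16s} {shift:+.1%}")
```

Ranking by change in share rather than by absolute value is a deliberate choice: a wait type that is equally present in both captures drops down the list, and the one that newly dominates rises to the top.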
April 27, 2011 at 5:44 am
OK, let's step back a bit here...
If you have a general performance issue, a good starting point is the articles below.
See if you can identify the queries / procedures which are running slow.
April 27, 2011 at 7:00 am
Dave,
Would you classify this as a general performance issue? Once a day I get a load of locking chains waiting on PAGE IO LATCH, so something isn't right there. It's not that the server suffers poor performance all day, nor is it running out of CPU or RAM, as I monitor these counters.
April 27, 2011 at 7:02 am
Also, are you using database snapshots?
http://msdn.microsoft.com/en-us/library/ms175158%28v=SQL.90%29.aspx
April 27, 2011 at 7:05 am
No, I don't have Enterprise Edition running on this instance.
April 27, 2011 at 7:14 am
Would you classify this as a general performance issue?
Yes, I would. You need to find which statement is causing the unusual usage.
The best tool for that would be a server-side trace.
Don't discount that some other process in the virtual server or physical server could be 'rattling' the disks too.
April 27, 2011 at 7:34 am
Conrad Goodman (4/27/2011)
The BACKUPIO and BACKUPBUFFER waits are consistent between the two graphs so don't think that is the issue?
From what I can find, those waits are specific to BACKUP to tape, so it's odd that you should be waiting on them 'continuously'.
April 27, 2011 at 7:42 am
Hi Dave
You are right, something very strange indeed is occurring.
I've just enumerated a list of backup jobs for the database server in Commvault.
It appears that whenever everything has ground to a complete halt, a differential backup has been running all through the night. That is very odd, because the total size of the backup is only 40-80GB and it should be done in a matter of 2-3 hours MAX.
There are no errors in the Commvault backup job logs, apart from losing the connection to the server when I've restarted it.
There is a pattern as well, it has only been doing this (taking ages to back up differentials) since I virtualised the server.
I will continue to examine these logs and try and make sense of it.
Cheers,
C
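As a rough sanity check on the figures above, the sustained throughput implied by "40-80GB in 2-3 hours" can be worked out with a short Python sketch. It assumes 1 GB = 1024 MB, and the helper name `required_mb_per_s` is made up for illustration. The rates it prints are modest, so a differential that is still running in the morning is moving data at a small fraction of even these.

```python
def required_mb_per_s(size_gb, hours):
    """Average throughput (MB/s) needed to move size_gb within the given hours.

    Assumes 1 GB = 1024 MB.
    """
    return size_gb * 1024 / (hours * 3600)


# Best and worst cases taken from the post: 40-80GB, 2-3 hours.
easiest = required_mb_per_s(40, 3)   # smallest backup, longest window
hardest = required_mb_per_s(80, 2)   # largest backup, shortest window

print(f"40 GB in 3 h needs about {easiest:.1f} MB/s")
print(f"80 GB in 2 h needs about {hardest:.1f} MB/s")
```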