After virtualising SQL Server 2005, consistently getting aspnet_isapi.dll Deadlock Detected every day.

  • Guys,

    I have a question regarding SQL/IIS deadlocks in our SharePoint environment. I believe it is a SQL server issue hence posting on this board.

    Our environment:

    SharePoint 2007 SP1 August 2008 CU

    1 x SQL 2005 x86 (Server 2003) - this was physical and has very recently been virtualised. At the same time the storage was upgraded and the dbs were moved over to the new storage. Since doing this, the performance of the SharePoint application has massively improved; however, it is now unreliable!

    1 x WFE (Server 2008 x64)

    1 x Indexer (Server 2003 x86)

    I virtualised the SQL server onto a VMware HA Cluster VM 2 weekends ago, and ever since then between roughly 8AM and 9.30AM each morning one of the SharePoint web applications grinds almost completely to a halt, and the following event ID is logged:

    -------------------------------------------------------------------------------------------------------------

    Log Name: Application

    Source: Microsoft-Windows-IIS-W3SVC-WP

    Date: 26/04/2011 09:25:54

    Event ID: 2262

    Task Category: None

    Level: Warning

    Keywords: Classic

    User: N/A

    Computer: hc-cen-wx-ap-07.ukr.local

    Description:

    ISAPI 'C:\Windows\Microsoft.NET\Framework64\v2.0.50727\aspnet_isapi.dll' reported itself as unhealthy for the following reason: 'Deadlock detected'.

    Event Xml:

    <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">

    <System>

    <Provider Name="Microsoft-Windows-IIS-W3SVC-WP" Guid="{670080D9-742A-4187-8D16-41143D1290BD}" EventSourceName="W3SVC-WP" />

    <EventID Qualifiers="32768">2262</EventID>

    <Version>0</Version>

    <Level>3</Level>

    <Task>0</Task>

    <Opcode>0</Opcode>

    <Keywords>0x80000000000000</Keywords>

    <TimeCreated SystemTime="2011-04-26T08:25:54.000Z" />

    <EventRecordID>1305090</EventRecordID>

    <Correlation />

    <Execution ProcessID="0" ThreadID="0" />

    <Channel>Application</Channel>

    <Computer>hc-cen-wx-ap-07.ukr.local</Computer>

    <Security />

    </System>

    <EventData>

    <Data Name="IsapiExtension">C:\Windows\Microsoft.NET\Framework64\v2.0.50727\aspnet_isapi.dll</Data>

    <Data Name="UnhealthyReason">Deadlock detected</Data>

    <Binary>

    </Binary>

    </EventData>

    </Event>

    -----------------------------------------------------------------
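For anyone collecting these events in bulk, here is a minimal Python sketch that pulls the useful fields out of the event XML shown above. The helper name `summarise_event` is made up for illustration, and the XML is a trimmed copy of the event in this post. Note that `SystemTime` is recorded in UTC, which is why it reads 08:25:54 while the event viewer shows 09:25:54 local time.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the <Event> element in the post above.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Trimmed copy of the logged event (only the fields read below are kept).
EVENT_XML = r"""<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-IIS-W3SVC-WP" />
    <EventID Qualifiers="32768">2262</EventID>
    <TimeCreated SystemTime="2011-04-26T08:25:54.000Z" />
    <Computer>hc-cen-wx-ap-07.ukr.local</Computer>
  </System>
  <EventData>
    <Data Name="IsapiExtension">C:\Windows\Microsoft.NET\Framework64\v2.0.50727\aspnet_isapi.dll</Data>
    <Data Name="UnhealthyReason">Deadlock detected</Data>
  </EventData>
</Event>"""


def summarise_event(xml_text):
    """Hypothetical helper: return the fields of an event that matter for triage."""
    root = ET.fromstring(xml_text)
    system = root.find("e:System", NS)
    # Collect the named <Data> elements into a dict keyed by their Name attribute.
    data = {d.get("Name"): (d.text or "")
            for d in root.findall("e:EventData/e:Data", NS)}
    return {
        "event_id": int(system.find("e:EventID", NS).text),
        "time_utc": system.find("e:TimeCreated", NS).get("SystemTime"),
        "computer": system.find("e:Computer", NS).text,
        "reason": data.get("UnhealthyReason"),
    }


summary = summarise_event(EVENT_XML)
print(summary)
```

Lining up the `time_utc` values from several mornings against the SQL Server job and backup schedules is one way to check whether the event follows a pattern.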

    I have tried pointing my hosts file at the index server, but I suffer the same performance problems loading the web application from that server, so surely this issue must lie with the database server and not the front-end/application servers?

    I really can't see why upgrading the DB server to a VM with far more memory, disk throughput and CPU capacity would cause this to happen - but my SQL knowledge is somewhat limited compared to my SharePoint knowledge.

    There are several other SharePoint web applications on this server and they continue to respond normally to requests, so it would seem the deadlock is in one particular content database.

    Any help would be much appreciated in troubleshooting the root cause and finding a resolution, as currently I have to restart the SQL service every morning during busy periods, which is extremely undesirable.

    I have read this blog post on SQL deadlocks/SharePoint by Graham K, have installed SQL Nexus and taken a data collection today with the server stable. I will attempt to take another in the morning when the issue will no doubt arise, but I need some guidance in analysing the data as it means very little to me at the moment.

    Cheers,

    Conrad

  • Guys,

    I've had another deadlock this morning at 6AM reported in IIS.

    SharePoint performance on one particular web application has almost completely ground to a stop.

    I'm taking a 15 min trace with SQLDiag/PerfStats Script.

    Any ideas on how I should proceed?

    Cheers,

    Conrad

  • Is this even a SQL Server issue?

    Reading http://support.microsoft.com/kb/821268, it doesn't sound like it.



    Clear Sky SQL
    My Blog

  • Hi

    It is a SQL server issue because I have multiple web servers and none of them can function whilst there is a deadlock.

  • The term 'deadlock' is not unique to SQL. The KB article I referenced states that the internal processing of ASP.NET can deadlock.

    If we were to assume that this is a SQL deadlock, which I'm sceptical of right now, Profiler can be used to provide more information on the deadlock. Further advice will depend on what the deadlock graph says.

    http://www.simple-talk.com/sql/learn-sql-server/how-to-track-down-deadlocks-using-sql-server-2005-profiler/



    Clear Sky SQL
    My Blog

  • Hi Dave

    I think I am getting somewhere.

    I have compared the SQLNexus reports for both traces (one stable, one unstable/locked).

    I think the graphs will speak for themselves, although I don't know nearly enough about SQL Server to understand what they are telling me.

    Page I/O latch is sky high and nothing else gets a chance when the system is unresponsive.

    [Three pairs of SQL Nexus report graphs, each pair labelled Stable and Unstable; images not preserved]

  • You've also got high BACKUPIO and BACKUPBUFFER.

    So do you have a general performance issue, not just a deadlock issue?

    The two BACKUP wait types would imply that you are backing up, which is obviously an I/O-intensive operation. Do you see the performance / deadlock issue only while a BACKUP is running?



    Clear Sky SQL
    My Blog

  • Hi Dave

    I'm not sure if you've compared the two separate graphs (the first two).

    The first one was taken when there were no performance issues (with BACKUPIO and BACKUPBUFFER present).

    The second graph was taken when I had the issue. It shows massively more PAGE I/O Latch wait than ANYTHING else, plus a lot of waits on LOCKS; if you look at the locking chains, they are much longer and there are many more of them. The locking chains are all waiting on PAGE IO LATCH as well.

    The BACKUPIO and BACKUPBUFFER waits are consistent between the two graphs so don't think that is the issue?
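The stable-versus-unstable comparison described above can be sketched as ranking wait types by how much their total wait time grew between the two captures. This is only an illustration in Python: the function name `rank_wait_growth` is made up, and the millisecond figures are HYPOTHETICAL placeholders, not values read from the SQL Nexus reports in this thread. It also assumes both captures cover the same length of time.

```python
def rank_wait_growth(stable_ms, unstable_ms):
    """Rank wait types by growth in wait time (ms) from the stable to the unstable capture."""
    wait_types = set(stable_ms) | set(unstable_ms)
    growth = {w: unstable_ms.get(w, 0) - stable_ms.get(w, 0) for w in wait_types}
    # Largest growth first; ties are broken by name so the order is deterministic.
    return sorted(growth.items(), key=lambda kv: (-kv[1], kv[0]))


# HYPOTHETICAL totals per wait type, in milliseconds, for two equal-length captures.
stable = {"BACKUPIO": 4000, "BACKUPBUFFER": 3000,
          "PAGEIOLATCH_SH": 2000, "LCK_M_S": 1000}
unstable = {"BACKUPIO": 4000, "BACKUPBUFFER": 3000,
            "PAGEIOLATCH_SH": 80000, "LCK_M_S": 13000}

ranked = rank_wait_growth(stable, unstable)
for wait_type, delta in ranked:
    print(f"{wait_type:16s} {delta:+d} ms")
```

A zero delta only says that the wait total did not change between captures. It does not by itself rule that activity out as a contributor to the I/O pressure.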

  • OK, let's step back a bit here...

    If you have a general performance issue, a good starting point is the articles below.

    http://www.simple-talk.com/sql/performance/finding-the-causes-of-poor-performance-in-sql-server,-part-1/

    http://www.simple-talk.com/sql/performance/finding-the-causes-of-poor-performance-in-sql-server,-part-2/

    See if you can identify the queries / procedures which are running slow.



    Clear Sky SQL
    My Blog

  • Dave,

    Would you classify this as a general performance issue? Once a day, I get a load of locking chains with the PAGE IO LATCH tag, so something isn't right there. It's not that the server suffers poor performance all day, nor is it running out of CPU/RAM capacity, as I monitor these counters.

  • Also, are you using database snapshots?

    http://msdn.microsoft.com/en-us/library/ms175158%28v=SQL.90%29.aspx



    Clear Sky SQL
    My Blog

  • No, I don't have Enterprise Edition running on this instance.

  • Would you classify this as a general performance issue?

    Yes, I would. You need to find which statement is causing the unusual usage.

    The best tool for that would be running a server-side trace.

    Don't discount the possibility that some other process in the virtual server or physical server could be 'rattling' the disks too.



    Clear Sky SQL
    My Blog

  • Conrad Goodman (4/27/2011)


    The BACKUPIO and BACKUPBUFFER waits are consistent between the two graphs so don't think that is the issue?

    From what I can find, those waits are specific to BACKUP to tape; odd that you should be waiting on them 'continuously'.



    Clear Sky SQL
    My Blog

  • Hi Dave

    You are right, something very strange indeed is occurring.

    I've just enumerated a list of backup jobs for the database server in Commvault.

    It appears that whenever everything has ground to a complete halt, a differential backup has been running all through the night. That is very odd, because the total size of the backup is only 40-80GB and it should be done in a matter of 2-3 hours MAX.

    There are no errors in the Commvault backup job logs, apart from losing the connection to the server when I've restarted it.

    There is a pattern as well: it has only been doing this (taking ages to back up differentials) since I virtualised the server.

    I will continue to examine these logs and try and make sense of it.

    Cheers,

    C
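To put the "40-80GB in 2-3 hours" expectation into numbers, here is a small Python sketch of the arithmetic. The function names are made up for illustration, GB is treated as 1024 MB (an assumption), and the 2.0 MB/s figure in the last step is a HYPOTHETICAL observed rate, not a measurement from this server.

```python
def implied_throughput_mb_s(size_gb, hours):
    """Average MB/s needed to move size_gb in the given number of hours."""
    return (size_gb * 1024) / (hours * 3600)


def hours_at_rate(size_gb, mb_per_s):
    """Hours needed to move size_gb at a sustained rate of mb_per_s."""
    return (size_gb * 1024) / mb_per_s / 3600


# Range implied by the figures in the post: smallest backup over the longest
# window, and largest backup over the shortest window.
slowest_expected = implied_throughput_mb_s(40, 3)
fastest_expected = implied_throughput_mb_s(80, 2)
print(f"Expected average rate: {slowest_expected:.1f} to {fastest_expected:.1f} MB/s")

# HYPOTHETICAL: how long an 80GB differential would take if it only sustained 2.0 MB/s.
print(f"At 2.0 MB/s an 80GB backup needs {hours_at_rate(80, 2.0):.1f} hours")
```

The expected range works out to roughly 3.8 to 11.4 MB/s. Comparing that with the throughput Commvault reports for the overnight jobs would show how far short of it the differential backups are falling.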
