October 18, 2007 at 10:49 am
Let me start out with an apology as I did NOT fully search the forums for related threads. With management breathing down my neck I decided to go straight to the SQL Server oracles (pun intended).
Problem: We are running SQL Server 2000 SP3 on a Windows 2003 cluster. Each node has four dual-core Xeon 2.8 GHz processors and 16 GB of RAM. Today users reported that they could not log in, and just as I began looking into it, SQL Server failed over to the secondary node. The error logs show this slice of heaven on both machines (before and after failover):
[quote]
10/18/2007 08:55:41,server,Unknown,Error: 17883 State: 0
10/18/2007 08:55:41,server,Unknown,The Scheduler 1 appears to be hung. SPID 0 UMS Context 0x03908418.
10/18/2007 08:55:41,server,Unknown,Error: 17883 State: 0
10/18/2007 08:55:41,server,Unknown,The Scheduler 3 appears to be hung. SPID 90 UMS Context 0x0390C858.
10/18/2007 08:55:41,server,Unknown,Error: 17883 State: 0
10/18/2007 08:55:41,server,Unknown,The Scheduler 4 appears to be hung. SPID 0 UMS Context 0x038FC630.
10/18/2007 08:55:41,server,Unknown,Error: 17883 State: 0
10/18/2007 08:55:41,server,Unknown,The Scheduler 7 appears to be hung. SPID 127 UMS Context 0x03909AA0.
10/18/2007 08:55:12,spid90,Unknown,WARNING: EC 2723c098 3 waited 600 sec. on latch 14fda114. Not a BUF latch.
10/18/2007 08:55:12,spid90,Unknown,Waiting for type 0x4 current owning EC 0x2C71C098.
10/18/2007 08:55:12,spid90,Unknown,Waiting for type 0x4 current owning EC 0x2C71C098.
10/18/2007 08:55:12,spid90,Unknown,WARNING: EC 2ca40098 1 waited 600 sec. on latch 14fda114. Not a BUF latch.
10/18/2007 08:55:12,spid90,Unknown,WARNING: EC 29e70098 7 waited 600 sec. on latch 14fda114. Not a BUF latch.
10/18/2007 08:55:12,spid90,Unknown,Waiting for type 0x4 current owning EC 0x2C71C098.
10/18/2007 08:55:12,spid90,Unknown,WARNING: EC 2d074098 6 waited 600 sec. on latch 14fda114. Not a BUF latch.
10/18/2007 08:55:12,spid90,Unknown,Waiting for type 0x4 current owning EC 0x2C71C098.
10/18/2007 08:54:41,server,Unknown,Error: 17883 State: 0
10/18/2007 08:54:41,server,Unknown,The Scheduler 3 appears to be hung. SPID 90 UMS Context 0x0390C858.
10/18/2007 08:54:41,server,Unknown,Error: 17883 State: 0
10/18/2007 08:54:41,server,Unknown,The Scheduler 4 appears to be hung. SPID 0 UMS Context 0x038FC630.
10/18/2007 08:54:41,server,Unknown,Error: 17883 State: 0
10/18/2007 08:54:41,server,Unknown,The Scheduler 7 appears to be hung. SPID 127 UMS Context 0x03909AA0.
[/quote]
Also, the final straw was a machine check error on the primary node, which caused the failover.
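For anyone who wants to boil a log like the one quoted above down to something presentable, here is a minimal Python sketch that counts the "appears to be hung" reports per scheduler and SPID, and the long latch waits per latch address. It assumes the comma-separated layout shown in the excerpt (timestamp, source, severity, message); the names `LOG_LINES`, `HUNG_RE`, `LATCH_RE`, and `summarize` are made up for illustration and are not part of any SQL Server tooling.

```python
import re
from collections import Counter

# A few lines copied from the quoted excerpt.
# Assumed layout: timestamp,source,severity,message
LOG_LINES = [
    "10/18/2007 08:55:41,server,Unknown,The Scheduler 1 appears to be hung. SPID 0 UMS Context 0x03908418.",
    "10/18/2007 08:55:41,server,Unknown,The Scheduler 3 appears to be hung. SPID 90 UMS Context 0x0390C858.",
    "10/18/2007 08:55:12,spid90,Unknown,WARNING: EC 2723c098 3 waited 600 sec. on latch 14fda114. Not a BUF latch.",
    "10/18/2007 08:55:12,spid90,Unknown,Waiting for type 0x4 current owning EC 0x2C71C098.",
    "10/18/2007 08:54:41,server,Unknown,The Scheduler 3 appears to be hung. SPID 90 UMS Context 0x0390C858.",
]

# Hypothetical patterns written against the message text in the excerpt.
HUNG_RE = re.compile(r"The Scheduler (\d+) appears to be hung\. SPID (\d+)")
LATCH_RE = re.compile(r"waited (\d+) sec\. on latch ([0-9a-fA-F]+)")


def summarize(lines):
    """Return (hung, latches): counts keyed by (scheduler, spid) and by latch address."""
    hung = Counter()
    latches = Counter()
    for line in lines:
        # The message is the fourth field; split at most three times so
        # any commas inside the message itself are left alone.
        parts = line.split(",", 3)
        if len(parts) < 4:
            continue
        message = parts[3]
        match = HUNG_RE.search(message)
        if match:
            hung[(int(match.group(1)), int(match.group(2)))] += 1
            continue
        match = LATCH_RE.search(message)
        if match:
            latches[match.group(2)] += 1
    return hung, latches


if __name__ == "__main__":
    hung, latches = summarize(LOG_LINES)
    for (scheduler, spid), count in sorted(hung.items()):
        print(f"Scheduler {scheduler} (SPID {spid}): reported hung {count} time(s)")
    for latch, count in sorted(latches.items()):
        print(f"Latch {latch}: {count} long wait(s)")
```

Run against the full log, a tally like this makes it easier to see which SPIDs keep showing up on the hung schedulers and whether the latch waits all point at the same address.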
So:
1. How do I put this in layman's terms for Management?
2. What is the recommended course of action?
3. Is this a SQL Server problem or a hardware problem?
and
4. Why does this happen hours before I leave for a 3 day weekend?
At a loss,
Gordon
Gordon Pollokoff
"Wile E. is my reality, Bugs Bunny is my goal" - Chuck Jones
October 19, 2007 at 1:10 am
Have a look at