Help!! UMS Scheduler Hung

  • Calling all experts..

    Our SQL Server has hit a hung SQL UMS scheduler a couple of times now, and the resources it held left the server short of memory. The SQL service became unresponsive and failed over to node B (yes, we are running an active/active cluster environment) until we rebooted the server.

    Error: 17883, Severity: 1, State: 0

    The Scheduler 0 appears to be hung. SPID 94, ECID 0, UMS Context 0x07E580C0

    Also, I found that the first error event in the Event Viewer log (Application) had the following message:

    17066 :

    SQL Server Assertion: File: <"msqlxact.cpp">, line=1882

    Failed Assertion = 'pss->IsSingleThreaded ()'.

    I found on the Microsoft web site that this is a known bug, and that the fix is included in SP4 or in the MS03-031 security patch. Does that apply to my scenario? Or is there a better way of solving this?

    In your opinion, what might be the root cause? Could it be caused by the COM+ server? I noticed that the user who triggered the above error probably came in through data access in one of the COM+ components.

    Please help, as I have had this problem twice this week.

    Thanks a bunch!

  • The best solution is probably to install SP4.


    subban

  • I am still not able to install SP4 in production, as it would impact some of the applications running there.

    Do you think the MS03-031 security patch will fix the problem?

  • Don't know if this will help, but:

    When we experienced 17883 errors, they tracked back to a known issue with BULK INSERT operations and high-end disk subsystems. The fix is NOT included in the SP4 base build (2039), but in a subsequent hotfix (2162). However, that hotfix broke other things on our servers. Our solution ended up being upgrading firmware on the array controllers (Compaq 6402 controllers), speeding up disk access (RAID 5 reconfigured as RAID 10), and trying to control the BULK INSERT operations (reducing their frequency). Since taking these steps, we haven't seen the hung scheduler issues.

    Hope that gives an idea of where to look.


    Cheers,

    Joshua Jones
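
The build numbers mentioned above (2039 for the SP4 base build, 2162 for the later hotfix) can be compared against a server's reported version string. The following is a minimal Python sketch under stated assumptions: the function names are hypothetical, the version string is assumed to end in the build number (as in "8.00.2039"), and only the two builds named in this thread are recognized.

```python
# Build numbers taken from this thread; nothing else is assumed about other builds.
SP4_BASE_BUILD = 2039  # SP4 base build
HOTFIX_BUILD = 2162    # post-SP4 hotfix build


def build_number(version):
    """Return the last dotted component of a version string as an int."""
    return int(version.strip().split(".")[-1])


def describe_build(version):
    """Place a version string relative to the two builds named in the thread."""
    build = build_number(version)
    if build >= HOTFIX_BUILD:
        return "at or above hotfix build 2162"
    if build >= SP4_BASE_BUILD:
        return "SP4 base or later, below hotfix build 2162"
    return "below SP4 base build 2039"


if __name__ == "__main__":
    for version in ("8.00.2039", "8.00.2162"):
        print(version, "->", describe_build(version))
```

The version string itself has to come from the server (for example from the first lines of the error log); the sketch only compares numbers and says nothing about which fixes a given build actually contains.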

  • I'd suggest calling Microsoft PSS. The $249 cost isn't much and you are dealing with the low level internals here. They can help you work through this, though they may come back to SP4.

  • We started getting a similar issue when we applied SP4 (I know!)

    We rolled back to SP3a without hotfixes and the issue disappeared (see below). However, it seems that the MS03-031 patch you reference might be the common factor... Is it possible that you have the hotfix applied? Or is it rolled up into SP3a but not SP3?

    FIX: You receive an "Error: 17883" error message when the checkpoint process delays the SQL Server database activity and does not yield the scheduler correctly ... e.g. refers to MS03-031

    http://support.microsoft.com/default.aspx?scid=kb;en-us;815056

    MS03-031: Security patch for SQL Server 2000 Service Pack 3

    http://support.microsoft.com/default.aspx?scid=kb;en-us;821277

    FIX: An access violation occurs in SQL Server 2000 when a high volume of local shared memory connections occur after you install security update MS03-031

    http://support.microsoft.com/default.aspx?scid=kb;en-us;830366

    2005-12-12 09:19:39

    Using 'dbghelp.dll' version '4.0.5'

    *Dump thread - spid = 79, PSS = 0x5d1df1e0, EC = 0x5d1df510

    *Stack Dump being sent to C:\Program Files\Microsoft SQL Server\MSSQL\log\SQLDump0002.txt

    *

    Stack Signature for the dump is 0xB61C92A9

    SQL Server Assertion: File: , line=1882

    Failed Assertion = 'pss->IsSingleThreaded ()'.

    Error: 3624, Severity: 20, State: 1.

    2005-12-12 09:21:22 Stack Signature for the dump is 0x00000000

    Error: 17883, Severity: 1, State: 0

    Process 79:0 (13c8) UMS Context 0x0DEEDA90 appears to be non-yielding on Scheduler 3.

    2005-12-12 09:29:20 Waiting for type 0x4, current count 0xa, current owning EC 0x77149510.

    Error: 17883, Severity: 1, State: 0

    Process 79:0 (13c8) UMS Context 0x0DEEDA90 appears to be non-yielding on Scheduler 3.

    WARNING: EC 66c57560, 0 waited 300 sec. on latch 42d88e1c, class MISC. Not a BUF latch.

    Waiting for type 0x4, current count 0xa, current owning EC 0x5FF8F560.

    2005-12-12 09:32:00
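
When an error log contains many of these reports, it can help to count how often each SPID and scheduler pair appears. The following is a minimal Python sketch, not a Microsoft tool: the function names are hypothetical, and the two patterns cover only the two 17883 message shapes quoted in this thread, so other builds may word the message differently.

```python
import re
from collections import Counter

# Message shape from the first post: "The Scheduler N appears to be hung. SPID ..."
HUNG = re.compile(
    r"The Scheduler (?P<scheduler>\d+) appears to be hung\. "
    r"SPID (?P<spid>\d+), ECID (?P<ecid>\d+), UMS Context (?P<ums>0x[0-9A-Fa-f]+)"
)

# Message shape from the later posts: "Process S:E (hex) UMS Context ... non-yielding"
NON_YIELDING = re.compile(
    r"Process (?P<spid>\d+):(?P<ecid>\d+) \([0-9A-Fa-f]+\) "
    r"UMS Context (?P<ums>0x[0-9A-Fa-f]+) appears to be non-yielding "
    r"on Scheduler (?P<scheduler>\d+)"
)


def find_scheduler_reports(log_text):
    """Return a list of (spid, scheduler) tuples, one per matching log line."""
    reports = []
    for line in log_text.splitlines():
        match = HUNG.search(line) or NON_YIELDING.search(line)
        if match:
            reports.append((int(match.group("spid")), int(match.group("scheduler"))))
    return reports


def count_by_spid(log_text):
    """Count how many times each (spid, scheduler) pair was reported."""
    return Counter(find_scheduler_reports(log_text))


# Sample lines copied from the posts in this thread.
sample = """\
Error: 17883, Severity: 1, State: 0
The Scheduler 0 appears to be hung. SPID 94, ECID 0, UMS Context 0x07E580C0
Process 79:0 (13c8) UMS Context 0x0DEEDA90 appears to be non-yielding on Scheduler 3.
Process 79:0 (13c8) UMS Context 0x0DEEDA90 appears to be non-yielding on Scheduler 3.
"""

if __name__ == "__main__":
    for (spid, scheduler), hits in count_by_spid(sample).items():
        print(f"SPID {spid} on scheduler {scheduler}: {hits} report(s)")
```

A SPID that dominates the counts is a reasonable place to start looking, for example by checking what that connection was running at the time.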

  • We're having the same issue on a production server here. We applied SP4 and then the 2162 hotfix while trying to fix some strange problems.

    It turned out that bad data being bulk-copied in was the original problem; now we're getting stack dumps and the 17883 errors. Microsoft is involved, but they're claiming it's COM objects related to sp_OACreate statements in stored procedures...

    We had no problems before.

    We would be interested in the firmware applied by Grasshopper before we roll back to SP3a.

     

  • BlueIvy,

    Did you solve the problem already?

    I am getting exactly the same message as you: SQL Server hangs and needs a restart.

    Any solution? What did you do?

     

     

  • 17883 is not so easy to debug. There can be many reasons for a thread not yielding properly (within 60 seconds). I would recommend that you open a case with Microsoft PSS and get their help troubleshooting the problem.

    If you call PSS, their first recommendation will be to get on the latest service pack to avoid known issues with SQL Server.

    A COM-based DLL may cause problems, so it is always recommended to test such DLLs properly if they are loaded into the SQL Server virtual address space.

     

     

  • My box is running the latest SP and hotfix, but I am facing the same problem.

    Sivaprasad S - [ SIVA ] http://sivasql.blogspot.com/

  • Siva,

    Please contact me offline at bmlakhani@yahoo.com

     

  • We experienced similar error messages on a SQL Server 2000 Standard Edition instance, build 8.0.2039 (SP4).

    The error:

    SQL Server Assertion: File: , line=1882

    Failed Assertion = 'pss->IsSingleThreaded ()'.

    Error: 3624, Severity: 20, State: 1.

    This was followed by errors like the following every minute:

    Process 55:0 (1d0) UMS Context 0x077102D0 appears to be non-yielding on Scheduler 3.

    Error: 17883, Severity: 1, State: 0

    MS KB 908156 suggests, as a workaround, changing the MAXDOP to 1.

    I have implemented this earlier today, and the problem has not yet recurred.

    Has anyone else tried the MAXDOP = 1 setting? The server is an OLTP server that can tolerate MAXDOP = 1.

    Joe D.

  • Is this KB related to your 17883 error? http://support.microsoft.com/kb/928568

  • 17883 is raised when the UMS scheduler experiences a yield problem. I have noticed that these are generally related to bugs in SQL Server. Since you said you are on SP4, it is better to open a case with Microsoft PSS.

    "More Green More Oxygen !! Plant a tree today"
