Strange Mass-Mirror Failure

Jim Foster 80123

SSChasing Mays

Points: 655
July 19, 2012 at 4:17 pm

#392785

Painting the picture...
Seven databases are mirrored in High-Safety mode between two Win 2008 R2 / SQL 2008 SP2 CU6 Enterprise servers, with a witness.
I thought the principal server had died today - got the alerts that the mirrors had all failed over together.
Research revealed...
* The original principal server was still online.
* However, the original principal had stopped accepting connections to the mirrored dbs, and in fact closed all the connections to the mirrored dbs. (errors below)
* It stayed in this "state" for about 5 minutes - refusing connections to the (now mirrored) dbs.
* Then on its own it came back awake, accepted connections from the new principal server, and began synching.
* After letting it synch, we were able to fail all the DBs back to the original principal.
* Reviewing logs, it appears that there was not a large CPU load at the time, and no large IO loads during the time.
I could chalk it up to a network problem, or an all-server outage of some kind...
However, during this same timeframe non-mirrored dbs were able to move data off server via SQL jobs without problem. So the network seemed ok, and the other DBs on the server were available.
Errors seen on the original principal:
SQL Server failed with error code 0xc0000000 to spawn a thread to process a new login or connection.
Check the SQL Server error log and the Windows event logs for information about possible related problems. [CLIENT: %]
After that error, all the connections into the mirrored DBs were closed with error 18056, Sev 20, State 29:
The client was unable to reuse a session with SPID %, which had been reset for connection pooling.
The failure ID is 29. This error may have been caused by an earlier operation failing.
Check the error logs for failed operations immediately before this error message.
Windows event logs empty for this timeframe except where they repeat the above from the SQL logs...
If it were a general server lockup, I would look towards the KB article/hotfix (http://support.microsoft.com/kb/2543687). Even so, we are already on a version of SQL that should have included this fix. The fix was in CU5 and we're on CU6.
But it was not a full-server lockup. The outage was focused on the mirrored databases.
Ideas?
Other than network, what would freeze/choke/whatever all the mirrored DBs on a server?

Robert Davis

One Orange Chip

Points: 28027

The second error was caused by the first error, so you can ignore the second one for troubleshooting purposes. The first error clearly indicates a worker thread exhaustion problem. All of the worker threads were in use at the time it happened, so no new requests could be handled. Login attempts failed because there were no worker threads left to process them.

How many logical CPUs do you have? Is it 64-bit or 32-bit? Is the max worker threads configuration setting at the default value of 0, or has it been set to a hard-defined number?
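
For context on why those three answers matter: when max worker threads is left at 0, SQL Server sizes the pool itself from the CPU count and architecture. Here is a rough sketch of that calculation, assuming the formula Microsoft documents for SQL Server 2008-era builds (512 threads on 64-bit or 256 on 32-bit for up to 4 logical CPUs, plus 16 or 8 per additional CPU). The function name is made up for illustration, and newer versions size very large machines differently, so treat it as an estimate rather than what your instance actually reports.

```python
def default_max_worker_threads(logical_cpus: int, is_64bit: bool = True) -> int:
    """Estimate the worker thread pool size used when 'max worker threads' = 0.

    Assumption: the documented SQL Server 2008-era formula. Hypothetical helper,
    not part of any SQL Server tooling.
    """
    if logical_cpus < 1:
        raise ValueError("logical_cpus must be at least 1")

    base = 512 if is_64bit else 256        # pool size for up to 4 logical CPUs
    per_extra_cpu = 16 if is_64bit else 8  # added for each CPU beyond the 4th
    extra_cpus = max(0, logical_cpus - 4)
    return base + extra_cpus * per_extra_cpu


if __name__ == "__main__":
    # Illustrative CPU counts only; the hardware in this thread is not known.
    for cpus in (4, 8, 16):
        print(cpus, default_max_worker_threads(cpus, is_64bit=True))
```

So a 64-bit server with 8 logical CPUs would get 576 worker threads by default. If the setting has been hard-coded to something lower than the computed default, the pool runs out sooner.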

Were there any jobs or external processes hitting SQL Server at the time it happened?

My blog: SQL Soldier
SQL Server Best Practices: SQL Server Best Practices
Twitter: @SQLSoldier
My book: Pro SQL Server 2008 Mirroring
Microsoft Certified Master: SQL Server, Data Platform MVP
Database Engineer at BlueMountain Capital Management