SQL Server 2008 cluster node going down unexpectedly

  • Last night our primary SQL Server node went down and failed over to the secondary node.

    I was actually on the server at the time, having just launched a trace to troubleshoot a particular query, when I suddenly lost all connectivity to SQL Server.

    Our setup is:

    Microsoft SQL Server 2008 R2 (SP1) - 10.50.2796.0 (X64) 2 Node Active/Passive Cluster.

    Here is what I found in the Administrative log:

    [sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed

    [sqsrvres] printODBCError: sqlstate = HYT00; native error = 0; message = [Microsoft][SQL Server Native Client 10.0]Query timeout expired

    [sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed

    [sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Server Native Client 10.0]The connection is no longer usable because the server failed to respond to a command cancellation for a previously executed statement in a timely manner. Possible causes include application deadlocks or the server being overloaded. Open a new connection and re-try the operation.

    SQL Server and SQL Server Agent are running under designated network accounts.

    SQL Server Browser is running under a local account.

    We have never had this issue in the 2 years we've been using the server.

    The SQL Server error log did not reveal much. The very last event in the error log before the node went down is:

    2013-04-30 20:06:48.970 spid133 SQL Trace ID 2 was started by login "sa".

    Thank you for your help

  • What is in the Windows error log?

  • The Administrative log was the most informative:

    [sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed

    [sqsrvres] printODBCError: sqlstate = HYT00; native error = 0; message = [Microsoft][SQL Server Native Client 10.0]Query timeout expired

    [sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed

    [sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Server Native Client 10.0]The connection is no longer usable because the server failed to respond to a command cancellation for a previously executed statement in a timely manner. Possible causes include application deadlocks or the server being overloaded. Open a new connection and re-try the operation

    System Log:

    Cluster resource 'SQL Server' in clustered service or application 'SQL Server (MSSQLSERVER)' failed.

    Application log:

    [sqagtres] SvcStop: service did not stop; giving up.

    [sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed

    [sqsrvres] printODBCError: sqlstate = 08S01; native error = 40; message = [Microsoft][SQL Server Native Client 10.0]TCP Provider: The specified network name is no longer available.

    Error  4/30/2013 8:16:55 PM  MSSQLSERVER  19019  Failover

  • What is in the system log before the SQL Server cluster resource became unavailable?

  • Hello All,

    I am also facing a similar issue.

    Please let me know if the issue was resolved and share the resolution.

    Regards,

    Vandy

  • Just to give you some background: clustering works on a heartbeat, which is configured in Failover Cluster Manager for each clustered resource or cluster group.

    What were you collecting in your trace, and was it run through the Profiler GUI? If it captured a large volume of events, the server/instance could have been too busy to respond to the heartbeat (health check), and the failover occurred as a result.
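
For anyone sifting through a longer cluster log for the same pattern, the [sqsrvres] printODBCError lines quoted in this thread are regular enough to pull apart with a short script. The following is only a minimal sketch in Python, assuming the log lines look exactly like the ones posted above; the names ODBC_ERROR, SAMPLE and parse_odbc_errors are made up for illustration and are not part of any SQL Server tool.

```python
import re

# Assumed line shape, taken from the entries quoted in this thread:
# [sqsrvres] printODBCError: sqlstate = HYT00; native error = 0; message = ...
ODBC_ERROR = re.compile(
    r"\[sqsrvres\] printODBCError: "
    r"sqlstate = (?P<sqlstate>\w+); "
    r"native error = (?P<native>-?\d+); "
    r"message = (?P<message>.*)"
)

# Sample lines copied from the posts above.
SAMPLE = [
    "[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed",
    "[sqsrvres] printODBCError: sqlstate = HYT00; native error = 0; "
    "message = [Microsoft][SQL Server Native Client 10.0]Query timeout expired",
    "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 40; "
    "message = [Microsoft][SQL Server Native Client 10.0]TCP Provider: "
    "The specified network name is no longer available.",
]


def parse_odbc_errors(lines):
    """Return (sqlstate, native_error, message) for each printODBCError line.

    Lines that do not match the assumed shape are skipped.
    """
    found = []
    for line in lines:
        match = ODBC_ERROR.search(line)
        if match:
            found.append(
                (
                    match.group("sqlstate"),
                    int(match.group("native")),
                    match.group("message").strip(),
                )
            )
    return found


if __name__ == "__main__":
    for sqlstate, native, message in parse_odbc_errors(SAMPLE):
        print(sqlstate, native, message)
```

Grouping the results by sqlstate makes it easier to see whether the health check first timed out (HYT00) and only later lost the connection (08S01), which is the order the entries appear in above.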
