cluster Failover

Question

mushtaq777

October 19, 2010 at 12:12 pm

#228202

Hi
Server: Windows Server 2008
DB Server: SQL Server 2008 (SP1)
First of all, I went through all the logs and could not find the reason the failover was initiated. Shouldn't something be logged explaining why the failover happened? Secondly, after the failover the service would not come online because a duplicate IP address was detected. Later, when we manually brought the service online from cluster management, it came online successfully. I don't understand how the duplicate IP address would get resolved when we start it manually.
Lastly, we see a few errors related to the physical disk resource between failover retries. Could these be related to the failover error? Please help me troubleshoot these errors; I am not very experienced with clustering. Thanks in advance for your help. :)
Here is the series of events that occurred:
1.) Event ID: 1135
Cluster node 'XYZ' was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
2.) Event ID: 1049
Cluster IP address resource 'SQL IP Address 1 (XYZ)' cannot be brought online because a duplicate IP address '10.9.8.113' was detected on the network. Please ensure all IP addresses are unique.
3.) Event ID: 1069
Cluster resource 'SQL IP Address 1 (XYZ)' in clustered service or application 'SQL Server (MSSQLSERVER)' failed.
4.) Event ID: 1049
Cluster IP address resource 'Cluster IP Address' cannot be brought online because a duplicate IP address '10.9.8.112' was detected on the network. Please ensure all IP addresses are unique.
5.) Event ID: 1069
Cluster resource 'Cluster IP Address' in clustered service or application 'Cluster Group' failed.
6.) Event ID: 1066
Cluster disk resource 'Cluster Disk 25' indicates corruption for volume '\\?\Volume{88552e6f-aea2-11df-9790-0026b92fffa7}'. Chkdsk is being run to repair problems. The disk will be unavailable until Chkdsk completes. Chkdsk output will be logged to file 'C:\Windows\Cluster\Reports\ChkDsk_ResCluster Disk 25_Disk16Part1.log'. Chkdsk may also write information to the Application Event Log.
7.) Event ID: 1066
Cluster disk resource 'Cluster Disk 26' indicates corruption for volume '\\?\Volume{88552e05-aea2-11df-9790-0026b92fffa7}'. Chkdsk is being run to repair problems. The disk will be unavailable until Chkdsk completes. Chkdsk output will be logged to file 'C:\Windows\Cluster\Reports\ChkDsk_ResCluster Disk 26_Disk4Part1.log'. Chkdsk may also write information to the Application Event Log.
8.) Event ID: 1049
(Same message as point 2)
9.) Event ID: 1069
(Same message as point 3)
10.) Event ID: 1049
(Same message as point 4)
11.) Event ID: 1069
(Same message as point 5)
12.) Event ID: 1205
The Cluster service failed to bring clustered service or application 'Cluster Group' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered service or application.
13.) Event ID: 1069
Cluster resource 'Cluster Disk 17' in clustered service or application 'SQL Server (MSSQLSERVER)' failed.
14.) Event ID: 1049
(Same message as point 2)
15.) Event ID: 1069
Cluster resource 'SQL IP Address 1 (XYZ)' in clustered service or application 'SQL Server (MSSQLSERVER)' failed.
16.) Event ID: 1205
The Cluster service failed to bring clustered service or application 'SQL Server (MSSQLSERVER)' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered service or application.
Thanks
Mushtaq

Perry Whittle · Answer 1

mushtaq777 (10/19/2010)
Cluster node 'XYZ' was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.

Have you checked all the network hardware to make sure there are no issues there? The last time I had a failover on one of my clusters, someone had made a VLAN change to the port on the switch servicing the active node!

mushtaq777 (10/19/2010)
2.) Event ID: 1049
Cluster IP address resource 'SQL IP Address 1 (XYZ)' cannot be brought online because a duplicate IP address '10.9.8.113' was detected on the network. Please ensure all IP addresses are unique.

This means what it says. Whoever assigns IP addresses in your enterprise should confirm the IP for your clustered instance. Take the virtual IP offline and then try to ping it to see if it still replies!
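The ping check suggested above could be scripted roughly as follows. This is only a minimal sketch, not a definitive test: the function names `build_ping_command` and `address_replies` are made up for illustration, and the script assumes the standard `ping` utility is on the PATH. It should only be run after the clustered IP resource has been taken offline, otherwise the cluster itself will answer. A host that drops ICMP at its firewall will not reply even though it holds the address, so "no reply" is not proof that the address is free.

```python
import platform
import subprocess


def build_ping_command(ip, system=None):
    """Return a single-probe ping command for the given OS name (hypothetical helper)."""
    system = system or platform.system()
    if system == "Windows":
        # -n 1: send one echo request, -w 1000: wait up to 1000 ms for the reply
        return ["ping", "-n", "1", "-w", "1000", ip]
    # Linux-style flags: -c 1: send one echo request, -W 1: wait up to 1 second
    return ["ping", "-c", "1", "-W", "1", ip]


def address_replies(ip, runner=subprocess.run, system=None):
    """Return True if ping exits with code 0, which usually means something answered.

    The runner parameter exists only so the call can be replaced in tests.
    """
    result = runner(build_ping_command(ip, system), capture_output=True, text=True)
    return result.returncode == 0
```

With the resources offline, calling `address_replies("10.9.8.113")` and `address_replies("10.9.8.112")` from a cluster node would show whether another device still answers on the two addresses named in the 1049 events. On Windows the exit code can also be 0 for some "unreachable" responses, so it is worth reading the ping output as well before drawing a conclusion.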

mushtaq777 (10/19/2010)
3.) Event ID: 1069
Cluster resource 'SQL IP Address 1 (XYZ)' in clustered service or application 'SQL Server (MSSQLSERVER)' failed.
4.) Event ID: 1049
Cluster IP address resource 'Cluster IP Address' cannot be brought online because a duplicate IP address '10.9.8.112' was detected on the network. Please ensure all IP addresses are unique.
5.) Event ID: 1069
Cluster resource 'Cluster IP Address' in clustered service or application 'Cluster Group' failed.

These are all related to the virtual IP issue.
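To see at a glance which resources and addresses are involved, the 1049 messages can be reduced to resource/IP pairs. The following is a rough sketch under stated assumptions: it takes the message text as plain strings (how they are exported from the event log is left out), the regular expression is written against the wording quoted in this thread, and `duplicate_ips` is a hypothetical helper name.

```python
import re

# Matches the Event ID 1049 wording quoted in this thread; only the
# resource name and the IP address vary between messages.
PATTERN = re.compile(
    r"Cluster IP address resource '(?P<resource>[^']+)' cannot be brought online "
    r"because a duplicate IP address '(?P<ip>\d{1,3}(?:\.\d{1,3}){3})' was detected"
)


def duplicate_ips(messages):
    """Map each IP resource name to the set of addresses reported as duplicates."""
    found = {}
    for message in messages:
        match = PATTERN.search(message)
        if match:
            found.setdefault(match.group("resource"), set()).add(match.group("ip"))
    return found


# The two distinct 1049 messages from the original post, plus one unrelated event.
events = [
    "Cluster IP address resource 'SQL IP Address 1 (XYZ)' cannot be brought online "
    "because a duplicate IP address '10.9.8.113' was detected on the network. "
    "Please ensure all IP addresses are unique.",
    "Cluster IP address resource 'Cluster IP Address' cannot be brought online "
    "because a duplicate IP address '10.9.8.112' was detected on the network. "
    "Please ensure all IP addresses are unique.",
    "Cluster resource 'Cluster Disk 17' in clustered service or application "
    "'SQL Server (MSSQLSERVER)' failed.",
]

conflicts = duplicate_ips(events)
```

Here `conflicts` holds one entry per IP resource, which gives the network team the exact addresses to trace.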

mushtaq777 (10/19/2010)
6.) Event ID: 1066
Cluster disk resource 'Cluster Disk 25' indicates corruption for volume '\\?\Volume{88552e6f-aea2-11df-9790-0026b92fffa7}'. Chkdsk is being run to repair problems. The disk will be unavailable until Chkdsk completes. Chkdsk output will be logged to file 'C:\Windows\Cluster\Reports\ChkDsk_ResCluster Disk 25_Disk16Part1.log'. Chkdsk may also write information to the Application Event Log.
7.) Event ID: 1066
Cluster disk resource 'Cluster Disk 26' indicates corruption for volume '\\?\Volume{88552e05-aea2-11df-9790-0026b92fffa7}'. Chkdsk is being run to repair problems. The disk will be unavailable until Chkdsk completes. Chkdsk output will be logged to file 'C:\Windows\Cluster\Reports\ChkDsk_ResCluster Disk 26_Disk4Part1.log'. Chkdsk may also write information to the Application Event Log.

Have you tried running CHKDSK on these disks as the message suggests?

-----------------------------------------------------------------------------------------------------------

"Ya can't make an omelette without breaking just a few eggs" 😉

ayemya · Answer 2

Did you resolve the issues? We have the same errors.

Cluster node 'ABC' was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.

nandan_kurdekar · Answer 3

Were you able to resolve this issue? We are also seeing these errors.

ayemya · Answer 4

SQL Server 2012 AlwaysOn on Windows Server 2008 R2 has given us many clustering issues. We rebuilt our servers with Windows Server 2012 Datacenter and SQL Server 2012 AlwaysOn. We also set our data, log, and temp drives to independent disk mode in VMware.