Server down

Question

ramyours2003

SSChampion

Points: 12266
March 27, 2015 at 7:08 am

#302527

Yesterday one of our primary servers went down and we were unable to fail over the services to the second node. Checking the logs, we found:
Cluster network 'Public' is partitioned. Some attached failover cluster nodes cannot communicate with each other over the network. The failover cluster was not able to determine the location of the failure. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapter. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
We also found:
Event ID 2003: A Windows Firewall setting in the Domain profile has changed.
Is this event responsible for shutting down the server?
What are the possible reasons for this situation?

jasona.work SSC Guru Points: 50083 · Answer 1

ramyours2003 (3/27/2015)
Event ID 2003: A Windows Firewall setting in the Domain profile has changed.
Is this event responsible for shutting down the server?
What are the possible reasons for this situation?

Possibly, depending on the firewall setting that was changed.

*IF* it was a firewall settings change, it could've been done at the Domain level with Group Policy. Otherwise, someone would have had to log in to the server(s) to make the change.

One question as well: do you have a dedicated "heartbeat" network between the cluster nodes, or is all the network communication over the public network? Possibly, if you don't have one, a heartbeat network would've kept the cluster alive and happy, unless the firewall change also impacted that network.
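As a rough way to see whether a firewall change is blocking node-to-node traffic, something like the following could be run from one node against the other. It is only a sketch: it assumes the cluster service is listening on its usual port 3343, and it only tests TCP, so a successful connect doesn't prove that UDP heartbeats get through.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `can_connect("node2", 3343)` run on the first node, where `node2` is a placeholder for the other node's name or IP. A `False` here after the firewall event would point towards the firewall change; a `True` would suggest looking elsewhere.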

ramyours2003 SSChampion Points: 12266 · Answer 2

We have a monitoring tool which provides alerts on the heartbeat.

jasona.work SSC Guru Points: 50083 · Answer 3

OK, but do you have a dedicated heartbeat network for the cluster servers?

So, for example, each server in the cluster would have the following:

1x NIC for the public network (this would be using your "public" IP range)

1x NIC for the private (heartbeat) network, allocated as cluster traffic ONLY in cluster manager (this would be using a "private" IP range, and ideally a separate *physical* network from the public network, even if only a cross-over cable between 2 nodes)

1x NIC for the storage network (if needed, such as shared storage)

Or, is your configuration more like this:

1x NIC for the public network

1x NIC for the storage network (if needed, such as shared storage)

The idea behind having a dedicated heartbeat network is that, if one of the servers loses its public connection, it can still hand off any cluster resources over the heartbeat network to the other node(s). If the heartbeat network goes down, then (unless configured otherwise) the nodes can still heartbeat each other over the public network. It's really the same reason you have a cluster in the first place: redundancy.
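To make the redundancy argument concrete, here is a toy model. It is not how the cluster service actually works: each network is just the set of node pairs that can currently talk over it, and a pair is only cut off when every network between them is down. The node and network names are invented.

```python
from itertools import combinations


def isolated_pairs(nodes, networks):
    """Return node pairs that cannot reach each other over ANY network.

    networks maps a network name to the set of frozenset({a, b}) pairs
    that can currently communicate over it (a deliberately simplified model).
    """
    cut_off = []
    for a, b in combinations(sorted(nodes), 2):
        pair = frozenset((a, b))
        # A pair is fine as long as at least one network still links it.
        if not any(pair in links for links in networks.values()):
            cut_off.append((a, b))
    return cut_off


nodes = {"node1", "node2"}

# Public network is partitioned (no working links); the private one still works.
two_networks = {
    "Public": set(),
    "Private": {frozenset(("node1", "node2"))},
}
# Same failure, but the public network is the only one the cluster has.
one_network = {"Public": set()}

print(isolated_pairs(nodes, two_networks))  # []
print(isolated_pairs(nodes, one_network))   # [('node1', 'node2')]
```

With a second network the nodes can still see each other when "Public" is partitioned; with only one network, the same fault leaves them isolated.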

(Also, please bear in mind, I've not set up a cluster in quite a while; my SQLs (for now) are VMs on a beefy VMware cluster)

TheSQLGuru SSC Guru Points: 134017 · Answer 4

The root cause of this failure was that you either a) built a cluster that wasn't properly validated/tested from the start or b) allowed changes to your production environment that broke that valid server pair. Either way it is a failure of process and/or operations.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Perry Whittle SSC Guru Points: 233854 · Answer 5

jasona.work (3/27/2015)
OK, but do you have a dedicated heartbeat network for the cluster servers?

You no longer need to have a dedicated heartbeat network. You don't push the heartbeat traffic through a specific network, and the networks aren't used in the same way they were in Windows 2003.

You do, however, need multiple redundant networks.

-----------------------------------------------------------------------------------------------------------

"Ya can't make an omelette without breaking just a few eggs" 😉

Perry Whittle SSC Guru Points: 233854 · Answer 6

ramyours2003 (3/27/2015)
Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapter. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.

The first thing to do is run a cluster validation and address any issues it reports back.
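If you'd rather script it than click through the wizard, validation can also be started from PowerShell with the `Test-Cluster` cmdlet. Below is a sketch that builds such a command from Python. The node names are placeholders, and limiting the run to the "Network" tests is an assumption about what is most relevant here, so check the cmdlet's help before relying on it.

```python
import subprocess


def build_validation_command(nodes, include=("Network",)):
    """Build a PowerShell command line that runs Test-Cluster on the nodes."""
    node_list = ",".join(f"'{n}'" for n in nodes)
    include_list = ",".join(f"'{c}'" for c in include)
    script = f"Test-Cluster -Node {node_list} -Include {include_list}"
    return ["powershell.exe", "-NoProfile", "-Command", script]


def run_validation(nodes):
    """Run the validation; only meaningful on a Windows host that has the
    failover clustering tools installed."""
    return subprocess.run(build_validation_command(nodes), check=False)


# Placeholder node names; nothing is executed here, the command is only shown.
print(build_validation_command(["node1", "node2"]))
```

Calling `run_validation(["node1", "node2"])` on one of the nodes would then start the run; the resulting report is what to work through.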

jasona.work SSC Guru Points: 50083 · Answer 7

Perry Whittle (3/27/2015)
jasona.work (3/27/2015)
OK, but do you have a dedicated heartbeat network for the cluster servers?
You no longer need to have a dedicated heartbeat network. You don't push the heartbeat traffic through a specific network, and the networks aren't used in the same way they were in Windows 2003.
You do, however, need multiple redundant networks.

Ah, OK. The last cluster I worked with was a Server 2003 cluster, and the last one I built (4-ish years back) was for Hyper-V 2008 R2 (which means, I think, the dedicated network was for live migrations of VMs).

I think it's time for me to set up my test cluster at home and get familiar with it again...