Clustering Errors

  • Hello,

    I need help with a project at my new job. I was hired as the DBA, and I told them I did not have clustering experience. So what is the second project they give me? Resolving their clustering issues. This is what I have at first glance:

    Failover Cluster Management states:

    1. Quorum configuration: Node and Disk Majority (Cluster Disk 1). Failure of a node or Cluster Disk 1 will cause the cluster to fail.

    2. Applications are not running (this is Backup Express).

    3. Node 2 - Unavailable. Critical Error: Node 2 failed to form a cluster. This was because the witness was not accessible. Please ensure that the witness resource is online and available.

    Can anyone please help me troubleshoot this? Node 1 cannot be restarted without notice to the users. I have already restarted node 2. The witness disk in the quorum is on a NetApp system. We do have Symantec Endpoint Protection, but the nodes appear to be talking to each other; it states that it is Online. Is anyone willing to chat with me about this, preferably today?

  • VanishingW (9/6/2011)


    Hello,

    I need help with a project at my new job. I was hired as the DBA, and I told them I did not have clustering experience. So what is the second project they give me? Resolving their clustering issues. This is what I have at first glance:

    There's nothing like being thrown in at the deep end 😉

    VanishingW (9/6/2011)


    Failover Cluster Management states:

    1. Quorum configuration: Node and Disk Majority (Cluster Disk 1). Failure of a node or Cluster Disk 1 will cause the cluster to fail.

    How many nodes are in the Windows cluster?

    What Windows operating system do the cluster nodes run, 2003 or 2008?
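
    If the nodes turn out to be on Windows Server 2008 R2, both questions can be answered from the FailoverClusters PowerShell module. A minimal sketch, run on one of the nodes:

        # Assumes Windows Server 2008 R2 with the Failover Clustering feature installed
        Import-Module FailoverClusters

        # List every node in the cluster and its current state
        Get-ClusterNode

        # Show the OS version of the local node
        Get-WmiObject Win32_OperatingSystem | Select-Object Caption, Version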

    VanishingW (9/6/2011)


    2. Applications are not running (this is Backup Express).

    Is this installed as a clustered service?
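
    One quick way to check, assuming the nodes are on 2008 R2: list the clustered resources and see whether Backup Express appears. If it doesn't, it is running as a plain local service rather than a clustered one.

        Import-Module FailoverClusters

        # A clustered application or service shows up as a cluster resource
        Get-ClusterResource

        # The same resources, grouped by clustered application
        Get-ClusterGroup | Get-ClusterResource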

    VanishingW (9/6/2011)


    3. Node 2 - Unavailable. Critical Error: Node 2 failed to form a cluster. This was because the witness was not accessible. Please ensure that the witness resource is online and available.

    It sounds to me as though the second node cannot 'see' the quorum drive. I'm assuming that all the shared storage is NetApp-based LUNs.
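
    A quick way to confirm that from node 2 (a sketch, again assuming the 2008 R2 FailoverClusters module):

        Import-Module FailoverClusters

        # Show the quorum configuration and the witness resource
        Get-ClusterQuorum

        # Check the state of the disk resources; the witness disk should be Online
        Get-ClusterResource | Where-Object { $_.ResourceType -like "Physical Disk" }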

    VanishingW (9/6/2011)


    Can anyone please help me troubleshoot this? Node 1 cannot be restarted without notice to the users. I have already restarted node 2. The witness disk in the quorum is on a NetApp system. We do have Symantec Endpoint Protection, but the nodes appear to be talking to each other; it states that it is Online. Is anyone willing to chat with me about this, preferably today?

    Who looks after the storage there? You might wanna tap them on the shoulder 😉

    Failing that, I'm available on a contract basis if you require it 😀

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • There's nothing like being thrown in at the deep end 😉

    I know right...

    How many nodes are in the Windows cluster?

    There are 2 nodes.

    What Windows operating system do the cluster nodes have 2003 or 2008?

    W2008 R2

    Is this installed as a clustered service?

    I believe so.

    It sounds to me as though the second node cannot 'see' the quorum drive. I'm assuming that all the shared storage is NetApp-based LUNs.

    I believe you are correct on this one.

    Who looks after the storage there? You might wanna tap them on the shoulder 😉

    This guy is on vacation. But I will get to him.

    Failing that, I'm available on a contract basis if you require it 😀

    I cannot afford a contractor right now on my petty pay. 😢

  • From your answers above it sounds like you didn't need to post here in the first place 😉

    The first thing is to get on the filer and verify that the initiator groups contain both nodes; I suspect they don't!
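
    On a NetApp filer running Data ONTAP 7-mode, the check looks roughly like this (a console command sketch; exact syntax varies by ONTAP version):

        # List the initiator groups and their member WWPNs/IQNs
        igroup show

        # Show which LUNs are mapped to which initiator groups
        lun show -m

    Both cluster nodes' initiators need to be in the igroup that the quorum LUN is mapped to.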

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Thanks, I will check that right now. One more question: I looked at the iSCSI service on node 1 and it is not running. Is this a problem?

  • VanishingW (9/6/2011)


    Thanks, I will check that right now. One more question: I looked at the iSCSI service on node 1 and it is not running. Is this a problem?

    If the storage is not presented over iSCSI then it doesn't matter.

    BTW, Node and Disk Majority is the correct quorum model for your cluster configuration.
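
    Both points are easy to confirm from PowerShell on either node (a sketch, assuming 2008 R2; MSiSCSI is the service name of the Microsoft iSCSI Initiator):

        Import-Module FailoverClusters

        # For a 2-node cluster with a witness disk this should report NodeAndDiskMajority
        Get-ClusterQuorum

        # State of the iSCSI initiator service; irrelevant if the LUNs are presented over FCP
        Get-Service MSiSCSI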

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Thanks again. Going back to the network now.

  • cool!

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Well, the filer says both nodes are there. I ran the validation, and the only errors that came up were the same-subnet warning and that the user initiating the validation check was not a Domain Admin account. Both firewalls on the servers are off. Oh, another error was that the nodes do not have the same updates installed. Would this bring it down?
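
    (I ran the validation from the Validate a Configuration wizard; for reference, the PowerShell equivalent on 2008 R2 would be something like this, with MyCluster standing in for our cluster name.)

        Import-Module FailoverClusters

        # Runs the same checks as the Validate a Configuration wizard
        Test-Cluster -Cluster MyCluster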

  • I'm assuming the filer is set up to use FCP and not iSCSI. Double-check the initiator groups, as that has to be the root of the issue. Does the second node have access to all the remaining shared disks?
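
    One way to see that at a glance (a sketch, assuming 2008 R2): list the disk resources along with their state and current owner node.

        Import-Module FailoverClusters

        # Show each physical disk resource, its state, and which node currently owns it
        Get-ClusterResource | Where-Object { $_.ResourceType -like "Physical Disk" } |
            Format-Table Name, State, OwnerNode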

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • OK, here are the new findings. Last night I shut down node 1 and everything failed over to node 2. It was not instant, but it did switch, and node 2 had access to the quorum and was the primary. But then when I brought node 1 back up, it did not have access to the quorum; it was the reverse of the situation before the shutdown. So it seems that only one node at a time can talk to the quorum. Any ideas?

  • VanishingW (9/16/2011)


    OK, here are the new findings. Last night I shut down node 1 and everything failed over to node 2. It was not instant, but it did switch, and node 2 had access to the quorum and was the primary. But then when I brought node 1 back up, it did not have access to the quorum; it was the reverse of the situation before the shutdown. So it seems that only one node at a time can talk to the quorum. Any ideas?

    Ah, right, that is by design. Only one node may access the shared disks at a time. As long as it fails over correctly, that is fine.
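
    You can also exercise the failover without shutting a node down (a sketch, assuming 2008 R2; the group and node names are placeholders for whatever Get-ClusterGroup reports):

        Import-Module FailoverClusters

        # List the clustered groups and their current owner nodes
        Get-ClusterGroup

        # Move a group (and its disks) to the other node as a controlled failover test
        Move-ClusterGroup -Name "Cluster Group" -Node "Node2"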

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉
