Cannot Repair or remove node

  • I have a two-node cluster. When I move the SQL instance from Node B to Node A, SQL fails. I haven't been able to determine why, but I suspect the SQL installation on Node A has become corrupt.

    To resolve the issue I thought of repairing the node. When I get to the step where I'm to select the instance of SQL, there isn't one to select, but the node is listed. If I click Next, the error states: This node is not in any sql server failover cluster.

    So I tried to remove the node, but got the same result.

    (Something has gotten really screwed up. This cluster has been working well for 2 years.)

    Can I evict the node in Failover Cluster Manager, and then rebuild the node from scratch?

    This is a prod environment & I need my databases available at all times, so will this affect their availability?

    I'm running W2k8 Enterprise & SQL2k8 R2.

    Thanks in advance.

    BigSam

  • What you should have done is take the group offline, move it to Node A, and then try to bring the resources online one at a time in this order:

    • IP address
    • Network name
    • Shared disk resources
    • SQL Server service
    • SQL Server Agent
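As a rough illustration of the ordering above, here is a minimal Python sketch. It only models the sequencing logic and does not talk to a real cluster: the resource-type labels, the function names, and the `start` callback are all assumptions made for this example, not part of any Windows or SQL Server API.

```python
from typing import Callable, List, Optional, Tuple

# Order from the list above. The labels are illustrative strings chosen for
# this sketch, not values read from a cluster API.
ONLINE_ORDER = [
    "IP Address",
    "Network Name",
    "Shared Disk",
    "SQL Server",
    "SQL Server Agent",
]

Resource = Tuple[str, str]  # (resource name, resource type label)


def plan_online_order(resources: List[Resource]) -> List[Resource]:
    """Sort resources into the order above; unknown types are placed last."""
    rank = {label: i for i, label in enumerate(ONLINE_ORDER)}
    return sorted(resources, key=lambda r: rank.get(r[1], len(ONLINE_ORDER)))


def bring_online_in_order(
    resources: List[Resource], start: Callable[[str], bool]
) -> Optional[str]:
    """Call start(name) for each resource in order.

    `start` is a hypothetical callback that returns True when the resource
    came online. Returns the name of the first resource that failed, or None
    if all came online. Stopping at the first failure shows which resource
    to look up in the event logs.
    """
    for name, _label in plan_online_order(resources):
        if not start(name):
            return name
    return None
```

Stopping at the first failure is the point of going one resource at a time: it narrows down which resource to investigate.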

    If I read it right, you're saying that the uninstall has failed on Node A?

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • How are you trying to remove the node?

  • I configured the node not to fail over automatically. Then I moved the instance from Node B to Node A. When SQL was in a failed state on Node A, I tried to bring it online from SQL Configuration Manager, but it still failed to come online. Nothing helpful in the logs. Everything comes online except for SQL & Agent.

    I would love to repair or remove the node. Whatever it takes to get it working; repaired or rebuilt.

  • In trying to repair or remove the node, I'm using the SQL Setup -> Maintenance options on the CD.

  • BigSam (11/28/2012)

    When SQL was in a failed state on Node A I tried to bring it online from SQL Configuration Manager

    Incorrect. Bring the resources online manually from Failover Cluster Manager in the order I specified above. When a resource fails, check the event logs; you will find information in there.

    If you'd like to email me a copy of the event log after the failure I'd be happy to take a look 😉


  • I understand where you're going with these steps. Getting the window to do this on a prod server isn't always easy.

    When I failed over from Node B to A, the instance of SQL was unable to automatically move back. When I tried to start SQL on Node A I was able to capture lots of information in the Error Log; unfortunately there wasn't a smoking gun in the log. All of my databases started, etc. & then boom. I've been trying to work with Microsoft on this & they seem confounded, too. Also, they seem more interested in the root cause, which I understand.

    However, now my boss is on me to get something done ASAP, which I also understand. That's why I wanted to either repair or remove the instance. Since I cannot do either in the recommended way, I need to know whether there is a workaround that doesn't take my databases offline, such as just evicting the node from within Failover Cluster Manager, then rebuilding the server & node from scratch.
