Rebooting Clustered Nodes of SQL Server 2005/2008

  • What is the best way to reboot the nodes of a clustered SQL Server?

    a. Do we just do a hard reboot?

    b. Fail the instances over to one node, reboot the other, then repeat the same on the second node?

    c. Take the cluster offline and reboot all nodes?

    d. Any other options?

  • I would fail over any active resources on that node to a different node, restart the node, and fail the resources back. Each failover causes a brief outage for the system, but it shouldn't take very long.

    I would not recommend doing this during business hours.
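    When a node owns several clustered instances (each in its own cluster group), the same idea applies to every group it owns: move them all off, restart the node, then move them back. A minimal Python sketch of that ordering, assuming hypothetical node and group names; it only builds the step list and does not talk to the cluster.

```python
from dataclasses import dataclass


@dataclass
class Step:
    action: str     # "move", "restart" or "wait"
    target: str     # cluster group or node name
    node: str = ""  # destination node, used only by "move" steps


def plan_node_reboot(node: str, owners: dict[str, str], fallback: str) -> list[Step]:
    """Build the steps to reboot `node` without leaving any group on it.

    owners maps each clustered group (hypothetical names) to its current
    owner node; fallback is the node that hosts the groups temporarily.
    """
    if node == fallback:
        raise ValueError("fallback node must differ from the node being rebooted")
    # Only the groups currently owned by the node being rebooted need to move.
    moved = sorted(g for g, owner in owners.items() if owner == node)
    steps = [Step("move", g, fallback) for g in moved]
    steps.append(Step("restart", node))
    steps.append(Step("wait", node))  # wait until the node has rejoined the cluster
    # Fail the groups back to their original owner.
    steps += [Step("move", g, node) for g in moved]
    return steps


owners = {"SQL Server (INST1)": "NODE1", "SQL Server (INST2)": "NODE2"}
for step in plan_node_reboot("NODE1", owners, "NODE2"):
    print(step.action, step.target, step.node)
```

    Groups owned by other nodes are left alone, so only the instances on the node being rebooted see an outage.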

    Jeffrey Williams

  • I agree with Jeffrey. In addition, there are a few things you should consider before you reboot, and this list is by no means exhaustive.

    You may want to block access to your application before you take SQL Server down to ensure that no in-flight operations are running during the reboots.

    You may want to set an alerting blackout so you don't send out a flood of alerts about your planned maintenance.

    You should check SQL Server Agent to make sure you don't have jobs that might be missed during the failover or cut short by the reboot (backups, maintenance plans, other critical business jobs).

    Keep the contact information handy for someone who can go to the datacenter, or someone with KVM access, in case the machine doesn't come back up.

    If you can tolerate a longer outage, it is sometimes less disruptive to take the resources offline, move them to the other node, and leave them offline while the original node reboots; once it is back online, move the resources back to their original node and then bring them online.
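    The SQL Server Agent check mentioned above comes down to an interval-overlap test: a job is at risk if any part of its run falls inside the reboot window, which covers both jobs that would be missed and jobs that would be cut short. A minimal Python sketch, assuming you have already collected each job's next start time and typical duration (for example from the schedule and history tables in msdb); the job names and times are hypothetical.

```python
from datetime import datetime, timedelta


def jobs_at_risk(jobs: dict[str, tuple[datetime, timedelta]],
                 start: datetime, end: datetime) -> list[str]:
    """Return the jobs whose run overlaps the maintenance window [start, end).

    jobs maps a job name to (next start time, typical duration).
    """
    at_risk = []
    for name, (run_start, duration) in jobs.items():
        run_end = run_start + duration
        # Two intervals overlap when each one starts before the other ends.
        if run_start < end and run_end > start:
            at_risk.append(name)
    return sorted(at_risk)


window_start = datetime(2024, 1, 6, 22, 0)
window_end = datetime(2024, 1, 6, 23, 0)
jobs = {
    "FullBackup": (datetime(2024, 1, 6, 21, 30), timedelta(hours=1)),
    "IndexMaint": (datetime(2024, 1, 6, 22, 15), timedelta(minutes=20)),
    "LogBackup": (datetime(2024, 1, 6, 21, 0), timedelta(minutes=5)),
}
print(jobs_at_risk(jobs, window_start, window_end))
```

    A job that starts before the window but is still running when it opens (like the hypothetical FullBackup here) is flagged too, which a check on start times alone would miss.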

  • If you're dealing with a typical single-instance, two-node cluster:

    Patch/reboot the passive node and wait for it to come back up.

    Fail the instance over to the node you just rebooted, wait for it to come online, and test connectivity.

    Once validated, reboot the other node (now passive).

    Either leave your instance running on the node you rebooted first, or fail it back if you have a preferred node for some reason (better specs, etc.).

    Pretty straightforward.
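    The "wait for it to come back up, test connectivity" step can be scripted. A minimal Python sketch that polls a TCP port until it accepts connections or a timeout expires. The virtual server name SQLVIRTUAL01 is hypothetical, and 1433 is only the default port for a default instance (named instances often listen on a dynamic port). An open port shows that the instance is listening, not that every database has finished recovery, so running a real test query is still the stronger check.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float, interval: float = 5.0) -> bool:
    """Poll host:port until a TCP connection succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Each attempt waits at most `interval` seconds for the handshake.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            pass  # refused, unreachable or timed out: try again
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval, remaining))
```

    For example, `wait_for_port("SQLVIRTUAL01", 1433, timeout=600)` waits up to ten minutes for the instance to start listening after the failover.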
