Failover Clustering Issues

  • Guys, we're relatively new here to failover clustering, so BE NICE!!!!!

    Our setup is we have two "real" servers, WAS and WDBS, that together simulate the cluster, which we'll call "HR". My understanding of clustering, such as it is, is that HR isn't real, but is essentially an illusion created by the two background servers. The hardware is Compaq, and they sent a guy out who did the installation, SQL Server configuration -- heck, pretty much everything. The database normally runs on WDBS, and the apps run on WAS. The database is set up to fail over onto WAS if WDBS chokes on a hairball.

    Every Wednesday morning at 7:30 AM, our operator reboots the cluster. When it comes back up, she logs into WDBS and always sees a message indicating failure of MSSQLSERVER to come up automatically, like it is supposed to do. She pulls up "Services", right-clicks MSSQLSERVER, and starts it.

    Then, she goes to the SQL Server Service Manager, which shows "HR" as the database server. MSSQLSERVER shows up as "off", even though we know the service is running. She then refreshes it, and then everyone is in agreement. We've also noticed that, once we start SQLSERVERAGENT (likewise manually, even though marked otherwise), sometimes Enterprise Manager shows it as "off" even though it is on and running.

    So what we're wondering is, what's going on? It seems to us as if the right hand does not know what the left is doing. How can we get our automatic services to start automatically?

    Thanks in advance!

  • quote:


    Every Wednesday morning at 7:30 AM, our operator reboots the cluster. When it comes back up, she logs into WDBS and always sees a message indicating failure of MSSQLSERVER to come up automatically, like it is supposed to do. She pulls up "Services", right-clicks MSSQLSERVER, and starts it.


    Because you are running a clustered SQL Server, you should use Cluster Administrator to bring the SQL Server and Agent resources online or offline.

  • quote:


    Because you are running a clustered SQL Server, you should use Cluster Administrator to bring the SQL Server and Agent resources online or offline.


    Microsoft says this: "SQL Server 2000 virtual servers do not have these restrictions; you can safely start and stop the services by any available means."

    Is that not true?

  • quote:


    Microsoft says this: "SQL Server 2000 virtual servers do not have these restrictions; you can safely start and stop the services by any available means."


    You could, but it is not recommended. Once SQL Server is clustered, it creates a SQL Server resource that depends on other clustered resources such as the disks, the SQL network name, and the IP address. If any one of them cannot be brought online, the virtual SQL Server will not be able to start.

    Each time you reboot the machine on which the SQL Server resources reside, all of the resources move to the other machine, and in a normal situation they come online there by themselves. What you need to look at is the machine's application and system logs, to find out why these resources did not come online after the reboot.
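
As a rough illustration of the dependency point above, here is a minimal Python sketch. It is a toy model, not a real cluster API: the resource names and the exact dependency chain in `DEPENDENCIES` are assumptions made for the example, and `bring_online` is a hypothetical helper. It only shows why one resource failing to come online keeps everything that depends on it offline.

```python
# Toy model of cluster resource dependencies; not a real cluster API.
# The resource names and the dependency chain are illustrative assumptions.
DEPENDENCIES = {
    "Disk": [],
    "IP Address": [],
    "Network Name": ["IP Address"],
    "SQL Server": ["Disk", "Network Name"],
    "SQL Server Agent": ["SQL Server"],
}


def bring_online(resource, failed, online=None):
    """Try to bring `resource` online after its dependencies.

    `failed` is the set of resources that cannot start on their own.
    Returns the set of resources that ended up online.
    """
    if online is None:
        online = set()
    if resource in online:
        return online
    for dep in DEPENDENCIES[resource]:
        bring_online(dep, failed, online)
        if dep not in online:
            return online  # a dependency is down, so this resource stays offline
    if resource not in failed:
        online.add(resource)
    return online


if __name__ == "__main__":
    # Healthy case: every resource comes online.
    print(sorted(bring_online("SQL Server Agent", failed=set())))
    # If the network name cannot start, SQL Server and the Agent stay offline.
    print(sorted(bring_online("SQL Server Agent", failed={"Network Name"})))
```

In the second call only the disk and IP address come online, which is the kind of situation the application and system logs should help explain.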

  • What kind of clustering do you have?

    There is active/active and active/passive.

    Normally, you have two 'nodes' which are the actual hardware servers (WAS and WDBS for you). Then you have a virtual server, which is an array of hard drives that BOTH nodes can see and use. It doesn't physically exist on either node. Any software (such as SQL Server) that is to 'failover' is loaded on this virtual server.

    In active/passive, only one node can see/use the virtual server. That is also the only node that has the services running. The other node will show the services not running until a failover occurs.

    In active/active, both nodes can see/use the virtual server. I've never used this mode, so I'm not sure about the services...but since there is still only one virtual server - only one node can 'own' it. So the services should only be running on that node until it fails over.

    So, I believe you shouldn't be having the SQL Server services running on both nodes at the same time.

    Lastly, are the services set to come up automatically on startup? You might want to recheck that.

    Also, yes, while you can stop and start the services in a lot of ways, using the cluster manager/administration program is the best.

    By the way....when she reboots the cluster, is she doing both nodes at once? Or separately? Remember, both nodes should NEVER be rebooted at the same time (defeats clustering). If you reboot WDBS, you won't find the service started there because SQL Server will now be under the control of WAS and the service should be running there.

    Hopefully I haven't confused you too much.

    -SQLBill

  • quote:


    What kind of clustering do you have? There is active/active and active/passive.


    Active/passive. WDBS is the man until he falls, then WAS picks up the flag.

    quote:


    In active/passive, only one node can see/use the virtual server. That is also the only node that has the services running. The other node will show the services not running until a failover occurs.


    This is what we have.

    quote:


    Lastly, are the services set to come up automatically on startup? You might want to recheck that.


    They are, but that's through "Control Panel" -> "Administrative Tools" -> "Services", not through "Cluster Administrator".

    quote:


    Also, yes, while you can stop and start the services in a lot of ways, using the cluster manager/administration program is the best.


    If so, I sure hope it isn't easy to mess it up. I've never had to touch it.

    quote:


    By the way....when she reboots the cluster, is she doing both nodes at once? Or separately?


    Don't know yet, she should respond shortly.

    quote:


    Remember, both nodes should NEVER be rebooted at the same time (defeats clustering). If you reboot WDBS, you won't find the service started there because SQL Server will now be under the control of WAS and the service should be running there.


    So... Is this the correct sequence of events?

    1. Shut down both servers.

    2. Re-Start WDBS. (database active)

    3. Re-Start WAS. (passive)

    quote:


    Hopefully I haven't confused you too much.


    It's hard to know at this point, but your advice is greatly appreciated!

  • The reason you have a cluster is to keep SQL Server available almost all the time. You don't have to shut down both servers at the same time. Shutting down the active server will move all resources to the passive server. Once that server comes back online, you can then shut down the passive server if you need to.

  • Find the active node, go to Start>Programs>Microsoft SQL Server>Service Manager.

    In the Service Manager pop-up, set the server window for your server name, set the service to SQL Server. Down at the bottom there is a 'check-box' for Auto-start services when OS starts. Is that box checked? It should be on both nodes. Also, you may want to consider setting the SQL Server Agent service to auto-start.

    Also, I agree with Allen. You reboot one node (usually the active node). The server fails over. When the original node (being rebooted) comes back up, you then reboot the other node. That will fail the server back over to the original node.

    Usually the cluster administration/management tool makes it very easy to stop and start services (you can also use it to failover the nodes).
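
To make the reboot order described above concrete, here is a small Python simulation. It is only a sketch under stated assumptions: `Cluster` and `rolling_reboot` are hypothetical names, the model has exactly one SQL Server owner at a time, and a rebooted node rejoins as passive with no automatic failback. It does not talk to a real cluster.

```python
# Toy simulation of a two-node active/passive cluster; not a real cluster API.
class Cluster:
    def __init__(self, nodes, owner):
        self.up = {name: True for name in nodes}
        self.owner = owner  # node currently running SQL Server, or None

    def reboot(self, node):
        """Take `node` down, fail over if it owned SQL Server, then bring it back."""
        self.up[node] = False
        if self.owner == node:
            survivors = [name for name, is_up in self.up.items() if is_up]
            self.owner = survivors[0] if survivors else None
        # Assumption: the node rejoins as passive, with no automatic failback.
        self.up[node] = True


def rolling_reboot(cluster):
    """Reboot the active node, then the other one; return the owner after each step."""
    history = []
    cluster.reboot(cluster.owner)  # step 1: SQL Server fails over to the passive node
    history.append(cluster.owner)
    cluster.reboot(cluster.owner)  # step 2: SQL Server fails back to the original node
    history.append(cluster.owner)
    return history


if __name__ == "__main__":
    hr = Cluster(["WDBS", "WAS"], owner="WDBS")
    print(rolling_reboot(hr))
```

In this model SQL Server always has a node to run on, because only one node is ever down at a time, and it ends up back on WDBS after the second reboot.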

  • Two things: MSDTC must be cluster-aware, and do you have enough resources on node WAS for the WDBS resources to run?

    simba

  • quote:


    In the Service Manager pop-up, set the server window for your server name, set the service to SQL Server. Down at the bottom there is a 'check-box' for Auto-start services when OS starts. Is that box checked? It should be on both nodes. Also, you may want to consider setting the SQL Server Agent service to auto-start.


    Both MSSQLSERVER and SQLSERVERAGENT were already set to autostart, but only on the active node. It's the "autostart" feature that was failing.

    quote:


    Also, I agree with Allen. You reboot one node (usually the active node). The server fails over. When the original node (being rebooted) comes back up, you then reboot the other node. That will fail the server back over to the original node.


    Very good. I will do this, or rather, propose to our net admins that they do this. I look forward to seeing what this does. Thanks!

  • quote:


    In the Service Manager pop-up, set the server window for your server name, set the service to SQL Server. Down at the bottom there is a 'check-box' for Auto-start services when OS starts. Is that box checked? It should be on both nodes. Also, you may want to consider setting the SQL Server Agent service to auto-start.


    Both services have to be set to 'manual' start on both servers. The SQL Server cluster will handle the startup once a machine reboots or fails over.
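
If it helps to check this across nodes, here is a minimal Python sketch of the rule. The start-mode data is typed in by hand and is an assumption based on what was reported earlier in the thread (autostart on the active node only); `misconfigured` is a hypothetical helper and does not query Windows or the cluster.

```python
# Services that the cluster, not the OS, is expected to start.
CLUSTERED_SERVICES = ("MSSQLSERVER", "SQLSERVERAGENT")


def misconfigured(start_modes):
    """Return sorted (node, service) pairs whose start mode is not 'manual'.

    `start_modes` maps node name -> {service name -> start mode string}.
    Services missing from a node's entry are skipped, not flagged.
    """
    return sorted(
        (node, service)
        for node, services in start_modes.items()
        for service in CLUSTERED_SERVICES
        if service in services and services[service].lower() != "manual"
    )


if __name__ == "__main__":
    # Hand-entered example data, assumed from the description in this thread.
    observed = {
        "WDBS": {"MSSQLSERVER": "Automatic", "SQLSERVERAGENT": "Automatic"},
        "WAS": {"MSSQLSERVER": "Manual", "SQLSERVERAGENT": "Manual"},
    }
    for node, service in misconfigured(observed):
        print(f"{service} on {node} should be set to manual start")
```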
