Cluster not failing over after adding physical disk

  • Problem: Cluster not failing over after adding physical disk

    Specs:

    Server 2003

    SQL 2005

    Nodes: Active/Active two-node cluster, each node with an instance installed

    Node1 has group A

    Node2 has group B

    Recent Event: I added a Physical Disk resource to group B on node 2 of the cluster

    Story: I recently added a physical disk resource to group B on node 2 of a two-node cluster. While patching the servers, we moved group A from node 1 to node 2, restarted node 1, and moved group A back from node 2 to node 1.

    An issue occurred when moving group B from node 2 to node 1. The resources (SQL, IP…) start moving over to node 1, then the disk I just added fails and the whole of group B moves back to node 2 (failback).

    I checked all the settings of the disk resource and they match the others exactly.

    What I noticed is that in Computer Management > Disk Manager on node 1, I do not see the disks for node 2.

    More specifically, in Disk Manager on node 2 I can see all the local disks and the disks for node 1. The disks for node 1 are marked unreachable and have a red X. On node 1 I cannot see the disks for node 2 in the same way. I only see the local disks on that node.

    Another thing I noticed is that Disk Manager on both nodes shows a Disk 2.

    I rescanned disks on both servers and still no luck.

    Any help would be appreciated.

    Jeff

  • More information:

    On node 2, in Computer Management, this time under SAN Disk Manager > Disks, I do not see the disk that fails during a Move Group operation.

    On node one all the disks are listed but node two does not have the disk that I just added.

    Could this be my trouble?

    Any help is appreciated.

    Jeff

  • Once you add the new disks, you must add them as a dependency of the SQL Server service cluster resource. This will require a restart of the clustered service.

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • I did add the resource as a dependency to SQL Server Service, but have not yet restarted the "Cluster Service" on that node.

    I will give it a try tonight after hours and let you know what happens.

    Your help is appreciated.

    Thanks

    Jeff

  • jayoub (1/27/2014)


    I did add the resource as a dependency to SQL Server Service, but have not yet restarted the "Cluster Service" on that node.

    I will give it a try tonight after hours and let you know what happens.

    Your help is appreciated.

    Thanks

    Please run the following from a command prompt on the server and post the results:

    cluster node

    cluster res

    Take the new disk's resource name from the cluster res output and put it into the following:

    cluster res "disk resource name" /listowners


  • Cluster res

    Resource                            Group             Node   Status
    Disk L:                             GroupsharePoint   Z010   online
    Disk M:                             GroupsharePoint   Z010   online
    SQL Network Name (SQLSHAREPOINT)    GroupsharePoint   Z010   online
    SQL IP Address 1 (SQLSHAREPONT)     GroupsharePoint   Z010   online
    SQL Server (SHAREPOINT)             GroupsharePoint   Z010   online
    SQL Server Agent (SHAREPOINT)       GroupsharePoint   Z010   online
    SQL Server Fulltext (SHAREPOINT)    GroupsharePoint   Z010   online
    Disk J:                             GroupDisco        Z009   online
    Disk S:                             GroupDisco        Z009   online
    SQL Network Name (SQLVIRTUAL)       GroupDISCOMP      Z009   online
    SQL IP Address 1 (SQLVIRTUAL)       GroupDISCOMP      Z009   online
    SQL Server (DISCOMP)                GroupDISCOMP      Z009   online
    Disk T:                             GroupDISCOMP      Z009   online
    NEWDB (this is a disk)              GroupDISCOMP      Z009   online
    Disk K:                             GroupDISCOMP      Z009   online
    SQL Server Agent (DISCOMP)          GroupDISCOMP      Z009   online
    SQL Server Fulltext (DISCOMP)       GroupDISCOMP      Z009   online
    Cluster IP Address                  Cluster Group     Z009   online
    Cluster Name                        Cluster Group     Z009   online
    Disk Q:                             Cluster Group     Z009   online
    MSDTC                               Cluster Group     Z009   online
    Disk O: (problem disk)              GroupsharePoint   Z010   online

    Cluster Node

    Node   Node ID   Status
    Z010   2         UP
    Z009   1         UP

    Cluster res "Disk O:" /listowners

    Z010
    Z009

    Jeff
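
    One way to read the two listings together is to check that every cluster node appears in the disk's possible-owners list. The following is a minimal Python sketch of that comparison, not part of the cluster tooling: the function name is made up for illustration, it does not call cluster.exe, and the node names are simply retyped from the output above.

```python
def nodes_missing_from_owners(cluster_nodes, possible_owners):
    """Return the cluster nodes that are not possible owners of a resource."""
    # Normalize names so that "z010" and "Z010 " compare as equal.
    owners = {name.strip().upper() for name in possible_owners}
    return sorted(n for n in cluster_nodes if n.strip().upper() not in owners)


# Values retyped by hand from the posted output (not read from the cluster).
cluster_nodes = ["Z010", "Z009"]   # from: cluster node
disk_o_owners = ["Z010", "Z009"]   # from: cluster res "Disk O:" /listowners

missing = nodes_missing_from_owners(cluster_nodes, disk_o_owners)
print(missing or "every node is a possible owner")
```

    With the values posted here, both nodes are listed as possible owners, so this particular check does not point to a cause on its own.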

  • I just found out that the SAN Admin provisioned the disk to only the one node Z010 and did not include the other node Z009 in the storage software. The other drives have both hosts listed and probably this is the cause of the problem.

    I will restart the cluster services or even the whole box tonight and check it and let you know.

    I provided the information as best I could. I had to retype the whole thing and the formatting did not come out well.

    Thank you very much

    Jeff
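
    The check described here (is each disk presented to both hosts?) can be sketched in Python as below. This is only an illustration under stated assumptions: the inventory is a hand-typed dictionary, not something read from the storage software, and the function name is hypothetical. Only the "Disk O:" entry reflects what the SAN admin reported; the other two entries stand in for "the other drives have both hosts listed".

```python
def disks_missing_hosts(presentations, cluster_nodes):
    """Map each disk to the cluster nodes it is NOT presented to."""
    required = set(cluster_nodes)
    gaps = {}
    for disk, hosts in presentations.items():
        missing = required - set(hosts)
        if missing:
            gaps[disk] = sorted(missing)
    return gaps


# Hypothetical inventory, typed by hand for illustration only.
presentations = {
    "Disk L:": ["Z010", "Z009"],
    "Disk M:": ["Z010", "Z009"],
    "Disk O:": ["Z010"],  # provisioned to one node only, per the SAN admin
}

gaps = disks_missing_hosts(presentations, ["Z010", "Z009"])
print(gaps)
```

    For this inventory the sketch reports Disk O: as missing Z009, which matches what the SAN admin found.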

  • jayoub (1/27/2014)


    I just found out that the SAN Admin provisioned the disk to only the one node Z010 and did not include the other node Z009 in the storage software.

    That's your problem right there.


  • Still no luck. I rebooted the server and tried the move, and got the same result.

    I still feel like the SAN admin has not provisioned the drive correctly. I have done this job before with another SAN admin and it went without a hitch.

    In Computer Management there is a folder called SAN Disk Manager, and the trouble drive is not listed there. All other drives are listed and working. I have a feeling that there is more to the provisioning process that must be done.

    I will keep trying and let you know what happens. I may have to start digging into the SAN myself and see. Sometimes a second set of eyes can spot something.

    Thanks

    Jeff

  • jayoub (1/27/2014)


    Still no luck. I rebooted the server and tried the move, and got the same result.

    I still feel like the SAN admin has not provisioned the drive correctly. I have done this job before with another SAN admin and it went without a hitch.

    In Computer Management there is a folder called SAN Disk Manager, and the trouble drive is not listed there. All other drives are listed and working. I have a feeling that there is more to the provisioning process that must be done.

    I will keep trying and let you know what happens. I may have to start digging into the SAN myself and see. Sometimes a second set of eyes can spot something.

    Thanks

    It's easy to get the IDs wrong and leave a device masked; I have experienced this in the past.


  • Update to the issue.

    The failover is still not working, and here is the problem: we have an HP EVA SAN, and we also have software called FalconStor that is managing the cluster drives.

    The current SAN admin needs to get FalconStor out of the equation, so he wants to provision the drives using only the HP EVA software and get that to work. Once he gets this working, I am sure the physical disk will begin failing over correctly.

    Again thanks for the help and I will update once it is completely figured out.

    Jeff

