Ownership of cluster disk 'Cluster Disk xxx has been unexpectedly lost by this node.

  • sql-lover (5/6/2013)


    it happened again. Now, not sure if it was the same or not.

    I'll take notice though, thanks for reply.

    Have you made the changes in the iscsi initiator?

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Perry Whittle (5/7/2013)


    sql-lover (5/6/2013)


    it happened again. Now, not sure if it was the same or not.

    I'll take notice though, thanks for reply.

    Have you made the changes in the iscsi initiator?

    Today ... 🙂

    But tested the process on my VMware lab already.

  • Sorry to hijack but we are experiencing a similar issue on our 3-node cluster. Two of the disks attached to an HP MSA SAN reguarly "lose ownership".

    The iSCSI initiator properties for ALL targets are set to

    Local Adapater: Default

    Initiator IP: Default

    Target Portal IP: Default

    Are you saying these values should be changed to this?

    Local Adapater: Microsoft iSCSI Initiator

    Initiator IP: x.x.x.x

    Target Portal IP: x.x.x.x

    I also notice that the binding order is DIFFERENT on each node, God knows what idiot set that up but I guess they should be exactly the same?

  • ginolard (5/7/2013)


    Sorry to hijack but we are experiencing a similar issue on our 3-node cluster. Two of the disks attached to an HP MSA SAN reguarly "lose ownership".

    The iSCSI initiator properties for ALL targets are set to

    Local Adapater: Default

    Initiator IP: Default

    Target Portal IP: Default

    Are you saying these values should be changed to this?

    Local Adapater: Microsoft iSCSI Initiator

    Initiator IP: x.x.x.x

    Target Portal IP: x.x.x.x

    I also notice that the binding order is DIFFERENT on each node, God knows what idiot set that up but I guess they should be exactly the same?

    Two things ...

    Each case is different, so I suggest open a new thread and post your issue, so any other user or Perry can step in and help. Not that I mind you for hijack my thread though 🙂 ... but you may get better attention.

    2nd thing is ... the MS engineer advice was to change the binding order and put the fixed IP on iSCSI initiatior, yes, that is correct. But I do differ on the 1st advice though. This is how I usually configure my Clusters (binding order(

    -Public

    -Heartbeat

    -SAN

    This is why MS is recommending

    -Public

    -SAN

    -Heartbeat

    And binding order should BE THE SAME ON ALL NODES!

    I do agree that the iSCSI initiator should use actual IPs though, and that's what I will change today with the help of my IT resource.

  • ginolard (5/7/2013)


    The iSCSI initiator properties for ALL targets are set to

    Local Adapater: Default

    Initiator IP: Default

    Target Portal IP: Default

    Are you saying these values should be changed to this?

    Local Adapater: Microsoft iSCSI Initiator

    Initiator IP: x.x.x.x

    Target Portal IP: x.x.x.x

    Yes, see my article at this link[/url]

    ginolard (5/7/2013)


    TI also notice that the binding order is DIFFERENT on each node, God knows what idiot set that up but I guess they should be exactly the same?

    The binding order must be the same on all nodes

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Perry Whittle (5/7/2013)


    ginolard (5/7/2013)


    The iSCSI initiator properties for ALL targets are set to

    Local Adapater: Default

    Initiator IP: Default

    Target Portal IP: Default

    Are you saying these values should be changed to this?

    Local Adapater: Microsoft iSCSI Initiator

    Initiator IP: x.x.x.x

    Target Portal IP: x.x.x.x

    Yes, see my article at this link[/url]

    ginolard (5/7/2013)


    TI also notice that the binding order is DIFFERENT on each node, God knows what idiot set that up but I guess they should be exactly the same?

    The binding order must be the same on all nodes

    Nice article thanks. One small question though. In the case (such as ours) where we have TWO iSCSI NICs per host, which IP would you specify?

  • ginolard (5/8/2013)


    Nice article thanks.

    You're welcome, I hope you found it useful

    ginolard (5/8/2013)


    One small question though. In the case (such as ours) where we have TWO iSCSI NICs per host, which IP would you specify?

    It's all detailed in the article, first question to ask is

    Are you using MCS or MPIO?

    You'll essentially have multiple connections\sessions to the exposed LUNs, its the MCS or MPIO driver that makes the multiple connections\sessions per LUN appear as one disk logical disk.

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Perry Whittle (5/8/2013)


    ginolard (5/8/2013)


    Nice article thanks.

    You're welcome, I hope you found it useful

    ginolard (5/8/2013)


    One small question though. In the case (such as ours) where we have TWO iSCSI NICs per host, which IP would you specify?

    It's all detailed in the article, first question to ask is

    Are you using MCS or MPIO?

    You'll essentially have multiple connections\sessions to the exposed LUNs, its the MCS or MPIO driver that makes the multiple connections\sessions per LUN appear as one disk logical disk.

    Right, well, the person who installed this cluster (who was fired some months ago) appears to have decided to use BOTH. I plan to uninstall MPIO on Friday because I believe it to be causing our issues. Also, I don't really see the value of having two iSCSI NICS when he hasn't even added a second session to MPIO (as per your article)

    The more I look, the more I find how badly this thing has been done

    I may consider putting MPIO back but the more pressing issue right now is get the damned thing stable. Besides, the shares will be migrated to a NAS in the coming months anyway

  • ginolard (5/8/2013)


    Right, well, the person who installed this cluster (who was fired some months ago) appears to have decided to use BOTH.

    Probably why they were fired 😉

    Use either\or never both.

    ginolard (5/8/2013)


    I plan to uninstall MPIO on Friday because I believe it to be causing our issues. Also, I don't really see the value of having two iSCSI NICS when he hasn't even added a second session to MPIO (as per your article)

    Without seeing your setup it would be hard to comment but using the article as a guide, if you check the MCS settings on the disk device what do you see?

    On the initiator discovery tab do you see both IPs for the iSCSI target listeners?

    ginolard (5/8/2013)


    The more I look, the more I find how badly this thing has been done

    I may consider putting MPIO back but the more pressing issue right now is get the damned thing stable. Besides, the shares will be migrated to a NAS in the coming months anyway

    The differences between MCS and MPIO are discussed in the article

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Oh well, the intervention solved nothing.

    New situation is this

    Cluster is now only 2 nodes

    1 NIC each only for Public, iSCSI and Heartbeat connections (no more NIC Teaming)

    MPIO uninstalled on all nodes

    Any existing Persistent Reservations cleared on all disks.

    The two disks attached to the HP MSA filer still lose ownership around the same time each day and throw 1038 events.

    This is starting to annoy me now 😉

  • If it's any help, these are the entries from the cluster log at the time of the problem.

    00001808.00000610::2013/05/13-02:46:04.351 INFO [GUM] Node 2: Processing RequestLock 1:191

    00001808.00000610::2013/05/13-02:46:04.351 INFO [GUM] Node 2: Processing GrantLock to 1 (sent by 2 gumid: 978)

    00001808.0000205c::2013/05/13-02:46:04.366 INFO [NM] Received request from client address 2002:9ea6:b026::9ea6:b026.

    00001808.00000610::2013/05/13-02:46:06.348 INFO [QUORUM] Node 2: quorum is not owned by anyone

    00001758.00001f38::2013/05/13-02:46:06.738 ERR [RES] Physical Disk: PNP: Failed to get volume info for disk resource SOP92-V2_V2, status 3

    00001808.00001f1c::2013/05/13-02:46:06.753 INFO [GUM] Node 2: Processing RequestLock 2:119

    00001808.00000610::2013/05/13-02:46:06.753 INFO [GUM] Node 2: Processing GrantLock to 2 (sent by 1 gumid: 981)

    00001808.00002318::2013/05/13-02:46:06.769 WARN [RCM] ResourceControl(SET_PRIVATE_PROPERTIES) to SOP92-V2_V2 returned 5024.

    00001758.00002340::2013/05/13-02:46:06.769 INFO [RES] Physical Disk: PNP: HardDiskpSetPnpUpdateTimePropertyWorker: status 0, time 02:46:06.738

    00001758.00002368::2013/05/13-02:46:06.847 ERR [RES] Physical Disk: Failed to get partition size, status 3

    00001758.00001f24::2013/05/13-02:46:06.878 ERR [RES] Physical Disk: Failed to get partition size, status 3

    00001758.00002368::2013/05/13-02:46:06.909 ERR [RES] Physical Disk: Failed to get partition size, status 3

    00001758.00001f24::2013/05/13-02:46:06.940 ERR [RES] Physical Disk: Failed to open device \Device\Harddisk7\Partition2, status 0xc0000034

    00001758.00001f24::2013/05/13-02:46:06.940 ERR [RHS] Error 2 from ResourceControl for resource SOP92-V2_V2.

    00001808.00001fd8::2013/05/13-02:46:06.940 WARN [RCM] ResourceControl(STORAGE_GET_DIRTY) to SOP92-V2_V2 returned 2.

    00001758.00002368::2013/05/13-02:46:06.972 ERR [RES] Physical Disk: Failed to open device \Device\Harddisk7\Partition2, status 0xc0000034

    00001758.00002368::2013/05/13-02:46:06.972 ERR [RHS] Error 2 from ResourceControl for resource SOP92-V2_V2.

    00001808.0000227c::2013/05/13-02:46:06.972 WARN [RCM] ResourceControl(STORAGE_GET_DIRTY) to SOP92-V2_V2 returned 2.

    00001758.00001f24::2013/05/13-02:46:07.003 ERR [RES] Physical Disk: Failed to open device \Device\Harddisk7\Partition2, status 0xc0000034

    00001758.00001f24::2013/05/13-02:46:07.003 ERR [RHS] Error 2 from ResourceControl for resource SOP92-V2_V2.

    00001808.00001f1c::2013/05/13-02:46:07.003 WARN [RCM] ResourceControl(STORAGE_GET_DIRTY) to SOP92-V2_V2 returned 2.

    00001758.00002368::2013/05/13-02:46:07.034 ERR [RHS] Error 3 from ResourceControl for resource SOP92-V2_V2.

    00001808.00001fd8::2013/05/13-02:46:07.034 WARN [RCM] ResourceControl(STORAGE_GET_MOUNTPOINTS) to SOP92-V2_V2 returned 3.

    00001758.00001f24::2013/05/13-02:46:07.065 ERR [RHS] Error 3 from ResourceControl for resource SOP92-V2_V2.

    00001808.0000227c::2013/05/13-02:46:07.065 WARN [RCM] ResourceControl(STORAGE_GET_MOUNTPOINTS) to SOP92-V2_V2 returned 3.

    00001758.00002368::2013/05/13-02:46:07.096 ERR [RHS] Error 3 from ResourceControl for resource SOP92-V2_V2.

    00001808.00001f1c::2013/05/13-02:46:07.096 WARN [RCM] ResourceControl(STORAGE_GET_MOUNTPOINTS) to SOP92-V2_V2 returned 3.

    00001758.00002368::2013/05/13-02:46:07.128 ERR [RES] Physical Disk: Failed to get partition size, status 3

    00001758.00001f24::2013/05/13-02:46:07.159 ERR [RES] Physical Disk: Failed to get partition size, status 3

    00001758.00002368::2013/05/13-02:46:07.190 ERR [RES] Physical Disk: Failed to get partition size, status 3

    00001758.00001f24::2013/05/13-02:46:07.221 ERR [RES] Physical Disk: Failed to open device \Device\Harddisk7\Partition2, status 0xc0000034

    00001758.00001f24::2013/05/13-02:46:07.221 ERR [RHS] Error 2 from ResourceControl for resource SOP92-V2_V2.

    00001808.00001fd8::2013/05/13-02:46:07.221 WARN [RCM] ResourceControl(STORAGE_GET_DIRTY) to SOP92-V2_V2 returned 2.

    00001758.00002368::2013/05/13-02:46:07.252 ERR [RES] Physical Disk: Failed to open device \Device\Harddisk7\Partition2, status 0xc0000034

    00001758.00002368::2013/05/13-02:46:07.252 ERR [RHS] Error 2 from ResourceControl for resource SOP92-V2_V2.

    00001808.0000227c::2013/05/13-02:46:07.252 WARN [RCM] ResourceControl(STORAGE_GET_DIRTY) to SOP92-V2_V2 returned 2.

    00001758.00001f24::2013/05/13-02:46:07.284 ERR [RES] Physical Disk: Failed to open device \Device\Harddisk7\Partition2, status 0xc0000034

    00001758.00001f24::2013/05/13-02:46:07.284 ERR [RHS] Error 2 from ResourceControl for resource SOP92-V2_V2.

    00001808.00001f1c::2013/05/13-02:46:07.284 WARN [RCM] ResourceControl(STORAGE_GET_DIRTY) to SOP92-V2_V2 returned 2.

    00001758.00002368::2013/05/13-02:46:07.315 ERR [RHS] Error 3 from ResourceControl for resource SOP92-V2_V2.

    00001808.00001fd8::2013/05/13-02:46:07.315 WARN [RCM] ResourceControl(STORAGE_GET_MOUNTPOINTS) to SOP92-V2_V2 returned 3.

    00001758.00001f24::2013/05/13-02:46:07.346 ERR [RHS] Error 3 from ResourceControl for resource SOP92-V2_V2.

    00001808.0000227c::2013/05/13-02:46:07.346 WARN [RCM] ResourceControl(STORAGE_GET_MOUNTPOINTS) to SOP92-V2_V2 returned 3.

    00001808.00000610::2013/05/13-02:46:07.362 INFO [GUM] Node 2: Processing RequestLock 1:194

    00001808.00000610::2013/05/13-02:46:07.362 INFO [GUM] Node 2: Processing GrantLock to 1 (sent by 2 gumid: 982)

    00001758.00002368::2013/05/13-02:46:07.377 ERR [RHS] Error 3 from ResourceControl for resource SOP92-V2_V2.

    00001808.00001f1c::2013/05/13-02:46:07.377 WARN [RCM] ResourceControl(STORAGE_GET_MOUNTPOINTS) to SOP92-V2_V2 returned 3.

    00001808.00000610::2013/05/13-02:46:07.377 INFO [QUORUM] Node 2: quorum is arbitrated by node 1

    00001758.00001f24::2013/05/13-02:46:07.424 ERR [RES] Physical Disk: Failed to get partition size, status 3

    00001758.00002368::2013/05/13-02:46:07.455 ERR [RES] Physical Disk: Failed to get partition size, status 3

    00001758.00001f24::2013/05/13-02:46:07.502 ERR [RES] Physical Disk: Failed to get partition size, status 3

    00001758.00002368::2013/05/13-02:46:07.518 ERR [RES] Physical Disk: Failed to open device \Device\Harddisk7\Partition2, status 0xc0000034

    00001758.00002368::2013/05/13-02:46:07.518 ERR [RHS] Error 2 from ResourceControl for resource SOP92-V2_V2.

    00001808.00001fd8::2013/05/13-02:46:07.518 WARN [RCM] ResourceControl(STORAGE_GET_DIRTY) to SOP92-V2_V2 returned 2.

    00001758.00001f24::2013/05/13-02:46:07.549 ERR [RES] Physical Disk: Failed to open device \Device\Harddisk7\Partition2, status 0xc0000034

    00001758.00001f24::2013/05/13-02:46:07.549 ERR [RHS] Error 2 from ResourceControl for resource SOP92-V2_V2.

    00001808.0000227c::2013/05/13-02:46:07.549 WARN [RCM] ResourceControl(STORAGE_GET_DIRTY) to SOP92-V2_V2 returned 2.

    00001758.00001f24::2013/05/13-02:46:07.564 ERR [RES] Physical Disk: Failed to open device \Device\Harddisk7\Partition2, status 0xc0000034

    00001758.00001f24::2013/05/13-02:46:07.564 ERR [RHS] Error 2 from ResourceControl for resource SOP92-V2_V2.

    00001808.0000205c::2013/05/13-02:46:07.564 WARN [RCM] ResourceControl(STORAGE_GET_DIRTY) to SOP92-V2_V2 returned 2.

    00001758.00002368::2013/05/13-02:46:07.580 ERR [RHS] Error 3 from ResourceControl for resource SOP92-V2_V2.

    00001808.00001fd8::2013/05/13-02:46:07.580 WARN [RCM] ResourceControl(STORAGE_GET_MOUNTPOINTS) to SOP92-V2_V2 returned 3.

    00001758.00001f24::2013/05/13-02:46:07.611 ERR [RHS] Error 3 from ResourceControl for resource SOP92-V2_V2.

    00001808.0000227c::2013/05/13-02:46:07.611 WARN [RCM] ResourceControl(STORAGE_GET_MOUNTPOINTS) to SOP92-V2_V2 returned 3.

    00001758.00002368::2013/05/13-02:46:07.627 ERR [RHS] Error 3 from ResourceControl for resource SOP92-V2_V2.

    00001808.0000205c::2013/05/13-02:46:07.627 WARN [RCM] ResourceControl(STORAGE_GET_MOUNTPOINTS) to SOP92-V2_V2 returned 3.

    00001758.00001f24::2013/05/13-02:46:07.674 ERR [RES] Physical Disk: Failed to get partition size, status 3

    00001758.00002368::2013/05/13-02:46:07.720 ERR [RES] Physical Disk: Failed to get partition size, status 3

    00001758.00001f24::2013/05/13-02:46:07.767 ERR [RES] Physical Disk: Failed to get partition size, status 3

    00001758.00002368::2013/05/13-02:46:07.798 ERR [RES] Physical Disk: Failed to open device \Device\Harddisk7\Partition2, status 0xc0000034

    00001758.00002368::2013/05/13-02:46:07.798 ERR [RHS] Error 2 from ResourceControl for resource SOP92-V2_V2.

    00001808.00001fd8::2013/05/13-02:46:07.798 WARN [RCM] ResourceControl(STORAGE_GET_DIRTY) to SOP92-V2_V2 returned 2.

    00001758.00001f24::2013/05/13-02:46:07.814 ERR [RES] Physical Disk: Failed to open device \Device\Harddisk7\Partition2, status 0xc0000034

    00001758.00001f24::2013/05/13-02:46:07.814 ERR [RHS] Error 2 from ResourceControl for resource SOP92-V2_V2.

    00001808.0000227c::2013/05/13-02:46:07.814 WARN [RCM] ResourceControl(STORAGE_GET_DIRTY) to SOP92-V2_V2 returned 2.

    00001758.00001f24::2013/05/13-02:46:07.830 ERR [RES] Physical Disk: Failed to open device \Device\Harddisk7\Partition2, status 0xc0000034

    00001758.00001f24::2013/05/13-02:46:07.830 ERR [RHS] Error 2 from ResourceControl for resource SOP92-V2_V2.

    00001808.0000205c::2013/05/13-02:46:07.830 WARN [RCM] ResourceControl(STORAGE_GET_DIRTY) to SOP92-V2_V2 returned 2.

    00001758.00002368::2013/05/13-02:46:07.861 ERR [RHS] Error 3 from ResourceControl for resource SOP92-V2_V2.

    00001808.00001fd8::2013/05/13-02:46:07.861 WARN [RCM] ResourceControl(STORAGE_GET_MOUNTPOINTS) to SOP92-V2_V2 returned 3.

    00001758.00001f24::2013/05/13-02:46:07.876 ERR [RHS] Error 3 from ResourceControl for resource SOP92-V2_V2.

    00001808.000023ec::2013/05/13-02:46:07.876 WARN [RCM] ResourceControl(STORAGE_GET_MOUNTPOINTS) to SOP92-V2_V2 returned 3.

    00001758.00002368::2013/05/13-02:46:07.892 ERR [RHS] Error 3 from ResourceControl for resource SOP92-V2_V2.

    00001808.0000205c::2013/05/13-02:46:07.892 WARN [RCM] ResourceControl(STORAGE_GET_MOUNTPOINTS) to SOP92-V2_V2 returned 3.

    00001758.00001f24::2013/05/13-02:46:07.923 ERR [RES] Physical Disk: Failed to get partition size, status 3

    00001758.00002368::2013/05/13-02:46:07.923 ERR [RES] Physical Disk: Failed to open device \Device\Harddisk7\Partition2, status 0xc0000034

    00001758.00002368::2013/05/13-02:46:07.923 ERR [RHS] Error 2 from ResourceControl for resource SOP92-V2_V2.

    00001808.0000205c::2013/05/13-02:46:07.923 WARN [RCM] ResourceControl(STORAGE_GET_DIRTY) to SOP92-V2_V2 returned 2.

    00001758.00001f24::2013/05/13-02:46:07.923 ERR [RHS] Error 3 from ResourceControl for resource SOP92-V2_V2.

    00001808.0000205c::2013/05/13-02:46:07.923 WARN [RCM] ResourceControl(STORAGE_GET_MOUNTPOINTS) to SOP92-V2_V2 returned 3.

    00001758.00002368::2013/05/13-02:46:07.939 ERR [RES] Physical Disk: Failed to get partition size, status 3

    00001758.00001f24::2013/05/13-02:46:07.939 ERR [RES] Physical Disk: Failed to open device \Device\Harddisk7\Partition2, status 0xc0000034

    00001758.00001f24::2013/05/13-02:46:07.939 ERR [RHS] Error 2 from ResourceControl for resource SOP92-V2_V2.

    00001808.0000205c::2013/05/13-02:46:07.939 WARN [RCM] ResourceControl(STORAGE_GET_DIRTY) to SOP92-V2_V2 returned 2.

    00001758.00002368::2013/05/13-02:46:07.939 ERR [RHS] Error 3 from ResourceControl for resource SOP92-V2_V2.

    00001808.0000205c::2013/05/13-02:46:07.939 WARN [RCM] ResourceControl(STORAGE_GET_MOUNTPOINTS) to SOP92-V2_V2 returned 3.

    00001a14.00001f44::2013/05/13-02:46:08.641 ERR [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQ returned 21.'

    00001808.0000205c::2013/05/13-02:46:08.641 WARN [RCM] Failed to load restype 'MSMQ': error 21.

    00001a14.00001f44::2013/05/13-02:46:08.656 ERR [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQTriggers returned 21.'

    00001808.0000205c::2013/05/13-02:46:08.656 WARN [RCM] Failed to load restype 'MSMQTriggers': error 21.

    00001a14.00001f44::2013/05/13-02:46:09.951 ERR [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQ returned 21.'

    00001808.0000205c::2013/05/13-02:46:09.951 WARN [RCM] Failed to load restype 'MSMQ': error 21.

    00001a14.00001f44::2013/05/13-02:46:09.967 ERR [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQTriggers returned 21.'

    00001808.0000205c::2013/05/13-02:46:09.967 WARN [RCM] Failed to load restype 'MSMQTriggers': error 21.

    00001758.00001f24::2013/05/13-02:46:10.201 ERR [RES] Physical Disk <SOP92-V2_V2>: IsAlive sanity check failed!, pending IO completed with status 1235.

    00001758.00001f24::2013/05/13-02:46:10.201 ERR [RES] Physical Disk <SOP92-V2_V2>: IsAlive sanity check failed!, pending IO completed with status 1235.

    00001758.00001f24::2013/05/13-02:46:10.201 WARN [RHS] Resource SOP92-V2_V2 IsAlive has indicated failure.

    00001808.0000205c::2013/05/13-02:46:10.201 INFO [RCM] HandleMonitorReply: FAILURENOTIFICATION for 'SOP92-V2_V2', gen(5) result 1.

    00001808.0000205c::2013/05/13-02:46:10.201 INFO [RCM] TransitionToState(SOP92-V2_V2) Online-->ProcessingFailure.

    00001808.0000205c::2013/05/13-02:46:10.201 ERR [RCM] rcm::RcmResource::HandleFailure: (SOP92-V2_V2)

    00001808.0000205c::2013/05/13-02:46:10.201 INFO [RCM] resource SOP92-V2_V2: failure count: 2, restartAction: 2.

    00001808.0000205c::2013/05/13-02:46:10.201 INFO [RCM] Greater than restartPeriod time has elapsed since first failure, resetting failureTime and failureCount.

    00001808.0000205c::2013/05/13-02:46:10.201 INFO [RCM] Will restart resource in 500 milliseconds.

    00001808.0000205c::2013/05/13-02:46:10.201 INFO [RCM] TransitionToState(SOP92-V2_V2) ProcessingFailure-->[WaitingToTerminate to DelayRestartingResource].

    00001808.0000205c::2013/05/13-02:46:10.201 INFO [RCM] rcm::RcmGroup::UpdateStateIfChanged: (S-OLAF-CLFS2, Online --> Pending)

    00001808.0000205c::2013/05/13-02:46:10.201 INFO [RCM] TransitionToState(FileServer-(S-OLAF-CLFS2)(SOP92-V2_V2)) Online-->[WaitingToTerminate to OnlineCallIssued].

    00001808.0000205c::2013/05/13-02:46:10.201 INFO [RCM] TransitionToState(FileServer-(S-OLAF-CLFS2)(SOP92-V2_V2)) [WaitingToTerminate to OnlineCallIssued]-->[Terminating to OnlineCallIssued].

    00001758.00001f24::2013/05/13-02:46:10.201 INFO [RES] File Server <FileServer-(S-OLAF-CLFS2)(SOP92-V2_V2)>: FileServerDoTerminate: Terminate called... !!!

    00001808.000021c0::2013/05/13-02:46:10.201 INFO [RCM] HandleMonitorReply: TERMINATERESOURCE for 'FileServer-(S-OLAF-CLFS2)(SOP92-V2_V2)', gen(0) result 0.

    00001808.000021c0::2013/05/13-02:46:10.201 INFO [RCM] Restarting resource 'FileServer-(S-OLAF-CLFS2)(SOP92-V2_V2)'.

    00001808.000021c0::2013/05/13-02:46:10.201 INFO [RCM] TransitionToState(SOP92-V2_V2) [WaitingToTerminate to DelayRestartingResource]-->[Terminating to DelayRestartingResource].

    00001758.00001f24::2013/05/13-02:46:10.201 INFO [RES] Physical Disk <SOP92-V2_V2>: Terminate request.

    00001758.00001f24::2013/05/13-02:46:10.201 WARN [RES] Physical Disk <SOP92-V2_V2>: Terminate: Failed to open device \Device\Harddisk7\Partition2, Error 2

    00001758.00001f24::2013/05/13-02:46:10.263 INFO [RES] Physical Disk <SOP92-V2_V2>: HardDiskpCloseSVIHandles: Exit

    00001758.00001f24::2013/05/13-02:46:10.263 ERR [RES] Physical Disk: ReleaseDisk: stop reserve failed on device 7 (sig a37d7c82), status 1168

    00001758.00001f24::2013/05/13-02:46:10.263 ERR [RES] Physical Disk <SOP92-V2_V2>: Terminate: Failed to release disk \Device\Harddisk7\Partition0, Error 1168

    00001808.00001f1c::2013/05/13-02:46:10.263 INFO [RCM] HandleMonitorReply: TERMINATERESOURCE for 'SOP92-V2_V2', gen(6) result 0.

    00001808.00001f1c::2013/05/13-02:46:10.263 INFO [RCM] TransitionToState(SOP92-V2_V2) [Terminating to DelayRestartingResource]-->DelayRestartingResource.

    00001808.000021c0::2013/05/13-02:46:10.778 INFO [RCM] Delay-restarting SOP92-V2_V2 and any waiting dependents.

    00001808.000021c0::2013/05/13-02:46:10.778 INFO [RCM] TransitionToState(SOP92-V2_V2) DelayRestartingResource-->OnlineCallIssued.

    00001758.00002368::2013/05/13-02:46:10.778 INFO [RES] Physical Disk <SOP92-V2_V2>: Online request.

    00001808.00001fd8::2013/05/13-02:46:10.778 INFO [RCM] HandleMonitorReply: ONLINERESOURCE for 'SOP92-V2_V2', gen(6) result 997.

    00001808.00001fd8::2013/05/13-02:46:10.778 INFO [RCM] TransitionToState(SOP92-V2_V2) OnlineCallIssued-->OnlinePending.

    00001758.00001c6c::2013/05/13-02:46:10.778 INFO [RES] Physical Disk <SOP92-V2_V2>: Arbitrate request: FastPath 0

    00001758.00001c6c::2013/05/13-02:46:10.778 INFO [RES] Physical Disk: Enter EnumerateDevices: EnumDevice 0

    00001758.00001c6c::2013/05/13-02:46:10.778 INFO [RES] Physical Disk: Exit EnumerateDevices: status 0

    00001758.00001c6c::2013/05/13-02:46:10.903 INFO [RES] Physical Disk: Successful reserve, disk is unowned, key 4bdf7466734d

    00001758.00001c6c::2013/05/13-02:46:10.903 INFO [RES] Physical Disk: Update disk props returns 0

    00001808.00001fd8::2013/05/13-02:46:10.903 INFO [GUM] Node 2: Processing RequestLock 2:120

    00001758.00001c6c::2013/05/13-02:46:10.903 INFO [RES] Physical Disk <SOP92-V2_V2>: Disk is offline

    00001808.00000610::2013/05/13-02:46:10.903 INFO [GUM] Node 2: Processing GrantLock to 2 (sent by 1 gumid: 983)

    00001758.00000748::2013/05/13-02:46:10.950 INFO [RES] Physical Disk <SOP92-V2_V2>: Completed HardDiskpUpdateRegistryWorker (start time 02:46:10.903)

    00001758.00001c6c::2013/05/13-02:46:11.028 WARN [RES] Physical Disk <SOP92-V2_V2>: OnlineThread: Failed to get volume guid for device \\?\GLOBALROOT\Device\Harddisk7\Partition2\. Error 3

    00001758.0000060c::2013/05/13-02:46:11.090 INFO [RES] Physical Disk: PNP: Adding volume \\?\STORAGE#Volume#{9cf114d6-e555-11e1-b63f-002655d34435}#0000000008100000#{53f5630d-b6bf-11d0-94f2-00a0c91efb8b}

    00001758.0000060c::2013/05/13-02:46:11.090 INFO [RES] Physical Disk: PNP: Add Volume exit, status 0

    00001a14.00001f44::2013/05/13-02:46:11.246 ERR [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQ returned 21.'

    00001808.000023ec::2013/05/13-02:46:11.246 WARN [RCM] Failed to load restype 'MSMQ': error 21.

    00001a14.00001f44::2013/05/13-02:46:11.262 ERR [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQTriggers returned 21.'

    00001808.000023ec::2013/05/13-02:46:11.262 WARN [RCM] Failed to load restype 'MSMQTriggers': error 21.

    00001a14.00001f44::2013/05/13-02:46:14.584 ERR [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQ returned 21.'

    00001808.00002318::2013/05/13-02:46:14.584 WARN [RCM] Failed to load restype 'MSMQ': error 21.

    00001a14.00001f44::2013/05/13-02:46:14.600 ERR [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQTriggers returned 21.'

    00001808.00002318::2013/05/13-02:46:14.600 WARN [RCM] Failed to load restype 'MSMQTriggers': error 21.

    00001758.00001c6c::2013/05/13-02:46:15.037 WARN [RES] Physical Disk <SOP92-V2_V2>: OnlineThread: Failed to set volguid \??\Volume{7eddaf8a-6f3d-4a2b-9d06-caa144c224d7}. Error: 183.

    00001758.00001c6c::2013/05/13-02:46:15.037 INFO [RES] Physical Disk <SOP92-V2_V2>: Found 2 mount points for device \Device\Harddisk7\Partition2

    00001758.00001c6c::2013/05/13-02:46:18.437 INFO [RES] Physical Disk <SOP92-V2_V2>: VolumeIsNtfs: Volume \\?\GLOBALROOT\Device\Harddisk7\Partition2\ has FS type NTFS

    00001758.00001c6c::2013/05/13-02:46:18.437 INFO [RES] Physical Disk: Volume \\?\GLOBALROOT\Device\Harddisk7\Partition2\ has FS type NTFS

    00001758.00001c6c::2013/05/13-02:46:18.437 INFO [RES] Physical Disk: MountPoint U:\ points to volume \\?\Volume{7eddaf8a-6f3d-4a2b-9d06-caa144c224d7}00001758.00001c6c::2013/05/13-02:46:18.437 DBG [ClRtl] CaptureShareInfo: Share 0000000000000050, Server 0000000000000000, Path 0000000000000056 SD 0000000000000000 size 242

    00001758.00001c6c::2013/05/13-02:46:18.437 WARN [ClRtl] SsCoreShareAdd(): status = 2118 share = U$ server = (null)

    00001758.00001c6c::2013/05/13-02:46:18.437 DBG [ClRtl] CaptureShareInfo: Share 0000000000000050, Server 0000000000000000, Path 0000000000000056 SD 0000000000000000 size 242

    00001758.00001c6c::2013/05/13-02:46:18.437 INFO [RHS] Resource SOP92-V2_V2 has come online. RHS is about to report status change to RCM

    00001808.00001f1c::2013/05/13-02:46:18.437 INFO [RCM] HandleMonitorReply: ONLINERESOURCE for 'SOP92-V2_V2', gen(6) result 0.

    00001808.00001f1c::2013/05/13-02:46:18.437 INFO [RCM] TransitionToState(SOP92-V2_V2) OnlinePending-->Online.

    00001758.00001c6c::2013/05/13-02:46:18.437 INFO [RES] Physical Disk: DriveLetter mask: 0x100000

    00001808.00002318::2013/05/13-02:46:18.437 INFO [RCM] TransitionToState(FileServer-(S-OLAF-CLFS2)(SOP92-V2_V2)) WaitingToComeOnline-->OnlineCallIssued.

    00001808.00002318::2013/05/13-02:46:18.453 INFO [RCM] HandleMonitorReply: ONLINERESOURCE for 'FileServer-(S-OLAF-CLFS2)(SOP92-V2_V2)', gen(0) result 997.

    00001808.00002318::2013/05/13-02:46:18.453 INFO [RCM] TransitionToState(FileServer-(S-OLAF-CLFS2)(SOP92-V2_V2)) OnlineCallIssued-->OnlinePending.

    00001758.00001c6c::2013/05/13-02:46:18.547 INFO [RES] Physical Disk: HardDiskpScopeShareCallback: Enter resourceName S-OLAF-CLFS2

    00001758.00001c6c::2013/05/13-02:46:18.547 DBG [ClRtl] CaptureShareInfo: Share 0000000000000050, Server 0000000000000056, Path 0000000000000070 SD 0000000000000000 size 268

    00001758.00001c6c::2013/05/13-02:46:18.547 WARN [ClRtl] SsCoreShareAdd(): status = 2118 share = U$ server = S-OLAF-CLFS2

    00001758.00001c6c::2013/05/13-02:46:18.547 DBG [ClRtl] CaptureShareInfo: Share 0000000000000050, Server 0000000000000056, Path 0000000000000070 SD 0000000000000000 size 268

    00001758.00001c6c::2013/05/13-02:46:18.547 INFO [RES] Physical Disk: Attempt to ADD admin share U$ to/from netname S-OLAF-CLFS2, returned 0

    00001758.00002094::2013/05/13-02:46:18.578 INFO [RES] File Server <FileServer-(S-OLAF-CLFS2)(SOP92-V2_V2)>: Shares 'are being scoped to virtual name S-OLAF-CLFS2

    00001758.00002094::2013/05/13-02:46:18.671 INFO [RES] File Server <FileServer-(S-OLAF-CLFS2)(SOP92-V2_V2)>: Successfully added Share ACL with Path u:\Software\ACL9 on server S-OLAF-CLFS2

    00001758.00002094::2013/05/13-02:46:18.687 INFO [RES] File Server <FileServer-(S-OLAF-CLFS2)(SOP92-V2_V2)>: Successfully added Share NOAP_Roaming with Path u:\NOAP_Roaming on server S-OLAF-CLFS2

    00001758.00002094::2013/05/13-02:46:18.718 INFO [RES] File Server <FileServer-(S-OLAF-CLFS2)(SOP92-V2_V2)>: Successfully added Share Software with Path u:\Software on server S-OLAF-CLFS2

    00001758.00002094::2013/05/13-02:46:18.749 INFO [RES] File Server <FileServer-(S-OLAF-CLFS2)(SOP92-V2_V2)>: Successfully added Share tnsnames$ with Path u:\tnsnames on server S-OLAF-CLFS2

    00001758.00002094::2013/05/13-02:46:18.781 INFO [RES] File Server <FileServer-(S-OLAF-CLFS2)(SOP92-V2_V2)>: Successfully added Share TSRoaming with Path u:\Tsroaming on server S-OLAF-CLFS2

    00001758.00002094::2013/05/13-02:46:18.796 INFO [RHS] Resource FileServer-(S-OLAF-CLFS2)(SOP92-V2_V2) has come online. RHS is about to report status change to RCM

    00001808.00002030::2013/05/13-02:46:18.796 INFO [RCM] HandleMonitorReply: ONLINERESOURCE for 'FileServer-(S-OLAF-CLFS2)(SOP92-V2_V2)', gen(0) result 0.

    00001808.00002030::2013/05/13-02:46:18.796 INFO [RCM] TransitionToState(FileServer-(S-OLAF-CLFS2)(SOP92-V2_V2)) OnlinePending-->Online.

    00001808.00002030::2013/05/13-02:46:18.796 INFO [RCM] rcm::RcmGroup::UpdateStateIfChanged: (S-OLAF-CLFS2, Pending --> Online)

    00001758.00002094::2013/05/13-02:46:18.796 INFO [RES] File Server <FileServer-(S-OLAF-CLFS2)(SOP92-V2_V2)>: FileServerpMigrateSubdirShares: Exit status 0

    00001a14.00000630::2013/05/13-02:46:18.843 ERR [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQ returned 21.'

    00001808.000021f0::2013/05/13-02:46:18.843 WARN [RCM] Failed to load restype 'MSMQ': error 21.

    00001808.000021f0::2013/05/13-02:46:18.843 WARN [RCM] rcm::RcmApi::ResTypeControl: ResType MSMQ's DLL is not present on this node. Attempting to find a good node...

    00001a14.00000630::2013/05/13-02:46:18.843 ERR [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQ returned 21.'

    00001808.0000205c::2013/05/13-02:46:18.843 WARN [RCM] Failed to load restype 'MSMQ': error 21.

    00001a14.00000630::2013/05/13-02:46:18.905 ERR [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQTriggers returned 21.'

    00001808.000021f0::2013/05/13-02:46:18.905 WARN [RCM] Failed to load restype 'MSMQTriggers': error 21.

    00001808.000021f0::2013/05/13-02:46:18.905 WARN [RCM] rcm::RcmApi::ResTypeControl: ResType MSMQTriggers's DLL is not present on this node. Attempting to find a good node...

    00001a14.00000630::2013/05/13-02:46:18.905 ERR [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQTriggers returned 21.'

    00001808.0000205c::2013/05/13-02:46:18.905 WARN [RCM] Failed to load restype 'MSMQTriggers': error 21.

    00001808.0000227c::2013/05/13-02:46:22.696 INFO [API] s_ApiGetQuorumResource final status 0.

  • ginolard (5/12/2013)


    1 NIC each only for Public, iSCSI and Heartbeat connections (no more NIC Teaming)

    Under Windows 2008, teaming is fully supported for the public network and the Heartbeat network, never had an issue with it.

    However, as detailed in my article, you must not team the iSCSI network as failover and management of the network(s) must be left to the control of the MPIO\MCS software. Teaming moves control of the network outside of the iSCSI policies.

    ginolard (5/12/2013)


    MPIO uninstalled on all nodes

    unnecessary at this point IMHO

    ginolard (5/12/2013)


    The two disks attached to the HP MSA filer still lose ownership around the same time each day and throw 1038 events.

    This is starting to annoy me now 😉

    Ah now this shows a pattern. The HP filer isn't shared by any chance between multiple systems is it?

    I had an issue recently where a set of SQL Servers with LUNs presented from an HP EVA would generate slow I\O messages from SQL Server and on some occasions the drives would disappear from under the server 😉

    It transpired that a pre prod SQL Server shared the same array with a multi Terrabyte database which was merrily refreshed once a week by a DBA. Everytime he ran the restore the transaction rate on the array went through the roof and beyond the architecural limit of the array.

    Look at the storage and see what else is sharing it, then look at what they do around the time frame you are experiencing.

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • We THINK we may have found the issue, we'll know within 24 hours. We noticed that the errors were occuring 2 minutes after the start and end of the "disk scrub" that the MSAs perform every day. This checks for disk defects and takes about 12 hours to complete. I'm guessing that this scrub somehow makes the controllers temporarily fail before recovering.

    The latest firmware has a fix for this issue but, for now, I've disabled the scrub and we'll see if that helps.

  • good work, let us know how you get on

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Not had the issue since disabling the scrub yesterday. That seems pretty conclusive to me.

    Time for a firmware update on the MSA controllers methinks! Thanks for all your advice

Viewing 15 posts - 16 through 30 (of 30 total)

You must be logged in to reply to this topic. Login to reply