Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««1234»»

Ownership of cluster disk 'Cluster Disk xxx has been unexpectedly lost by this node. Expand / Collapse
Author
Message
Posted Wednesday, May 8, 2013 7:04 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Tuesday, May 14, 2013 12:40 AM
Points: 7, Visits: 19
Perry Whittle (5/7/2013)
ginolard (5/7/2013)
The iSCSI initiator properties for ALL targets are set to

Local Adapater: Default
Initiator IP: Default
Target Portal IP: Default

Are you saying these values should be changed to this?

Local Adapater: Microsoft iSCSI Initiator
Initiator IP: x.x.x.x
Target Portal IP: x.x.x.x

Yes, see my article at this link


ginolard (5/7/2013)
TI also notice that the binding order is DIFFERENT on each node, God knows what idiot set that up but I guess they should be exactly the same?

The binding order must be the same on all nodes


Nice article thanks. One small question though. In the case (such as ours) where we have TWO iSCSI NICs per host, which IP would you specify?
Post #1450539
Posted Wednesday, May 8, 2013 7:27 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 12:54 PM
Points: 6,282, Visits: 13,538
ginolard (5/8/2013)
Nice article thanks.

You're welcome, I hope you found it useful



ginolard (5/8/2013)
One small question though. In the case (such as ours) where we have TWO iSCSI NICs per host, which IP would you specify?

It's all detailed in the article, first question to ask is

Are you using MCS or MPIO?


You'll essentially have multiple connections\sessions to the exposed LUNs, its the MCS or MPIO driver that makes the multiple connections\sessions per LUN appear as one disk logical disk.


-----------------------------------------------------------------------------------------------------------

"Ya can't make an omelette without breaking just a few eggs"
Post #1450559
Posted Wednesday, May 8, 2013 7:36 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Tuesday, May 14, 2013 12:40 AM
Points: 7, Visits: 19
Perry Whittle (5/8/2013)
ginolard (5/8/2013)
Nice article thanks.

You're welcome, I hope you found it useful



ginolard (5/8/2013)
One small question though. In the case (such as ours) where we have TWO iSCSI NICs per host, which IP would you specify?

It's all detailed in the article, first question to ask is

Are you using MCS or MPIO?


You'll essentially have multiple connections\sessions to the exposed LUNs, its the MCS or MPIO driver that makes the multiple connections\sessions per LUN appear as one disk logical disk.


Right, well, the person who installed this cluster (who was fired some months ago) appears to have decided to use BOTH. I plan to uninstall MPIO on Friday because I believe it to be causing our issues. Also, I don't really see the value of having two iSCSI NICS when he hasn't even added a second session to MPIO (as per your article)

The more I look, the more I find how badly this thing has been done

I may consider putting MPIO back but the more pressing issue right now is get the damned thing stable. Besides, the shares will be migrated to a NAS in the coming months anyway
Post #1450565
Posted Wednesday, May 8, 2013 9:08 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 12:54 PM
Points: 6,282, Visits: 13,538
ginolard (5/8/2013)
Right, well, the person who installed this cluster (who was fired some months ago) appears to have decided to use BOTH.

Probably why they were fired
Use either\or never both.


ginolard (5/8/2013)
I plan to uninstall MPIO on Friday because I believe it to be causing our issues. Also, I don't really see the value of having two iSCSI NICS when he hasn't even added a second session to MPIO (as per your article)

Without seeing your setup it would be hard to comment but using the article as a guide, if you check the MCS settings on the disk device what do you see?

On the initiator discovery tab do you see both IPs for the iSCSI target listeners?



ginolard (5/8/2013)
The more I look, the more I find how badly this thing has been done

I may consider putting MPIO back but the more pressing issue right now is get the damned thing stable. Besides, the shares will be migrated to a NAS in the coming months anyway

The differences between MCS and MPIO are discussed in the article


-----------------------------------------------------------------------------------------------------------

"Ya can't make an omelette without breaking just a few eggs"
Post #1450634
Posted Sunday, May 12, 2013 11:33 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Tuesday, May 14, 2013 12:40 AM
Points: 7, Visits: 19
Oh well, the intervention solved nothing.

New situation is this

Cluster is now only 2 nodes
1 NIC each only for Public, iSCSI and Heartbeat connections (no more NIC Teaming)
MPIO uninstalled on all nodes

Any existing Persistent Reservations cleared on all disks.

The two disks attached to the HP MSA filer still lose ownership around the same time each day and throw 1038 events.

This is starting to annoy me now ;)
Post #1451972
Posted Monday, May 13, 2013 12:09 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Tuesday, May 14, 2013 12:40 AM
Points: 7, Visits: 19
If it's any help, these are the entries from the cluster log at the time of the problem.

00001808.00000610::2013/05/13-02:46:04.351 INFO [GUM] Node 2: Processing RequestLock 1:191
00001808.00000610::2013/05/13-02:46:04.351 INFO [GUM] Node 2: Processing GrantLock to 1 (sent by 2 gumid: 978)
00001808.0000205c::2013/05/13-02:46:04.366 INFO [NM] Received request from client address 2002:9ea6:b026::9ea6:b026.
00001808.00000610::2013/05/13-02:46:06.348 INFO [QUORUM] Node 2: quorum is not owned by anyone
00001758.00001f38::2013/05/13-02:46:06.738 ERR [RES] Physical Disk: PNP: Failed to get volume info for disk resource SOP92-V2_V2, status 3
00001808.00001f1c::2013/05/13-02:46:06.753 INFO [GUM] Node 2: Processing RequestLock 2:119
00001808.00000610::2013/05/13-02:46:06.753 INFO [GUM] Node 2: Processing GrantLock to 2 (sent by 1 gumid: 981)
00001808.00002318::2013/05/13-02:46:06.769 WARN [RCM] ResourceControl(SET_PRIVATE_PROPERTIES) to SOP92-V2_V2 returned 5024.
00001758.00002340::2013/05/13-02:46:06.769 INFO [RES] Physical Disk: PNP: HardDiskpSetPnpUpdateTimePropertyWorker: status 0, time 02:46:06.738
00001758.00002368::2013/05/13-02:46:06.847 ERR [RES] Physical Disk: Failed to get partition size, status 3
00001758.00001f24::2013/05/13-02:46:06.878 ERR [RES] Physical Disk: Failed to get partition size, status 3
00001758.00002368::2013/05/13-02:46:06.909 ERR [RES] Physical Disk: Failed to get partition size, status 3
00001758.00001f24::2013/05/13-02:46:06.940 ERR [RES] Physical Disk: Failed to open device \Device\Harddisk7\Partition2, status 0xc0000034
00001758.00001f24::2013/05/13-02:46:06.940 ERR [RHS] Error 2 from ResourceControl for resource SOP92-V2_V2.
00001808.00001fd8::2013/05/13-02:46:06.940 WARN [RCM] ResourceControl(STORAGE_GET_DIRTY) to SOP92-V2_V2 returned 2.
00001758.00002368::2013/05/13-02:46:06.972 ERR [RES] Physical Disk: Failed to open device \Device\Harddisk7\Partition2, status 0xc0000034
00001758.00002368::2013/05/13-02:46:06.972 ERR [RHS] Error 2 from ResourceControl for resource SOP92-V2_V2.
00001808.0000227c::2013/05/13-02:46:06.972 WARN [RCM] ResourceControl(STORAGE_GET_DIRTY) to SOP92-V2_V2 returned 2.
00001758.00001f24::2013/05/13-02:46:07.003 ERR [RES] Physical Disk: Failed to open device \Device\Harddisk7\Partition2, status 0xc0000034
00001758.00001f24::2013/05/13-02:46:07.003 ERR [RHS] Error 2 from ResourceControl for resource SOP92-V2_V2.
00001808.00001f1c::2013/05/13-02:46:07.003 WARN [RCM] ResourceControl(STORAGE_GET_DIRTY) to SOP92-V2_V2 returned 2.
00001758.00002368::2013/05/13-02:46:07.034 ERR [RHS] Error 3 from ResourceControl for resource SOP92-V2_V2.
00001808.00001fd8::2013/05/13-02:46:07.034 WARN [RCM] ResourceControl(STORAGE_GET_MOUNTPOINTS) to SOP92-V2_V2 returned 3.
00001758.00001f24::2013/05/13-02:46:07.065 ERR [RHS] Error 3 from ResourceControl for resource SOP92-V2_V2.
00001808.0000227c::2013/05/13-02:46:07.065 WARN [RCM] ResourceControl(STORAGE_GET_MOUNTPOINTS) to SOP92-V2_V2 returned 3.
00001758.00002368::2013/05/13-02:46:07.096 ERR [RHS] Error 3 from ResourceControl for resource SOP92-V2_V2.
00001808.00001f1c::2013/05/13-02:46:07.096 WARN [RCM] ResourceControl(STORAGE_GET_MOUNTPOINTS) to SOP92-V2_V2 returned 3.
00001758.00002368::2013/05/13-02:46:07.128 ERR [RES] Physical Disk: Failed to get partition size, status 3
00001758.00001f24::2013/05/13-02:46:07.159 ERR [RES] Physical Disk: Failed to get partition size, status 3
00001758.00002368::2013/05/13-02:46:07.190 ERR [RES] Physical Disk: Failed to get partition size, status 3
00001758.00001f24::2013/05/13-02:46:07.221 ERR [RES] Physical Disk: Failed to open device \Device\Harddisk7\Partition2, status 0xc0000034
00001758.00001f24::2013/05/13-02:46:07.221 ERR [RHS] Error 2 from ResourceControl for resource SOP92-V2_V2.
00001808.00001fd8::2013/05/13-02:46:07.221 WARN [RCM] ResourceControl(STORAGE_GET_DIRTY) to SOP92-V2_V2 returned 2.
00001758.00002368::2013/05/13-02:46:07.252 ERR [RES] Physical Disk: Failed to open device \Device\Harddisk7\Partition2, status 0xc0000034
00001758.00002368::2013/05/13-02:46:07.252 ERR [RHS] Error 2 from ResourceControl for resource SOP92-V2_V2.
00001808.0000227c::2013/05/13-02:46:07.252 WARN [RCM] ResourceControl(STORAGE_GET_DIRTY) to SOP92-V2_V2 returned 2.
00001758.00001f24::2013/05/13-02:46:07.284 ERR [RES] Physical Disk: Failed to open device \Device\Harddisk7\Partition2, status 0xc0000034
00001758.00001f24::2013/05/13-02:46:07.284 ERR [RHS] Error 2 from ResourceControl for resource SOP92-V2_V2.
00001808.00001f1c::2013/05/13-02:46:07.284 WARN [RCM] ResourceControl(STORAGE_GET_DIRTY) to SOP92-V2_V2 returned 2.
00001758.00002368::2013/05/13-02:46:07.315 ERR [RHS] Error 3 from ResourceControl for resource SOP92-V2_V2.
00001808.00001fd8::2013/05/13-02:46:07.315 WARN [RCM] ResourceControl(STORAGE_GET_MOUNTPOINTS) to SOP92-V2_V2 returned 3.
00001758.00001f24::2013/05/13-02:46:07.346 ERR [RHS] Error 3 from ResourceControl for resource SOP92-V2_V2.
00001808.0000227c::2013/05/13-02:46:07.346 WARN [RCM] ResourceControl(STORAGE_GET_MOUNTPOINTS) to SOP92-V2_V2 returned 3.
00001808.00000610::2013/05/13-02:46:07.362 INFO [GUM] Node 2: Processing RequestLock 1:194
00001808.00000610::2013/05/13-02:46:07.362 INFO [GUM] Node 2: Processing GrantLock to 1 (sent by 2 gumid: 982)
00001758.00002368::2013/05/13-02:46:07.377 ERR [RHS] Error 3 from ResourceControl for resource SOP92-V2_V2.
00001808.00001f1c::2013/05/13-02:46:07.377 WARN [RCM] ResourceControl(STORAGE_GET_MOUNTPOINTS) to SOP92-V2_V2 returned 3.
00001808.00000610::2013/05/13-02:46:07.377 INFO [QUORUM] Node 2: quorum is arbitrated by node 1
00001758.00001f24::2013/05/13-02:46:07.424 ERR [RES] Physical Disk: Failed to get partition size, status 3
00001758.00002368::2013/05/13-02:46:07.455 ERR [RES] Physical Disk: Failed to get partition size, status 3
00001758.00001f24::2013/05/13-02:46:07.502 ERR [RES] Physical Disk: Failed to get partition size, status 3
00001758.00002368::2013/05/13-02:46:07.518 ERR [RES] Physical Disk: Failed to open device \Device\Harddisk7\Partition2, status 0xc0000034
00001758.00002368::2013/05/13-02:46:07.518 ERR [RHS] Error 2 from ResourceControl for resource SOP92-V2_V2.
00001808.00001fd8::2013/05/13-02:46:07.518 WARN [RCM] ResourceControl(STORAGE_GET_DIRTY) to SOP92-V2_V2 returned 2.
00001758.00001f24::2013/05/13-02:46:07.549 ERR [RES] Physical Disk: Failed to open device \Device\Harddisk7\Partition2, status 0xc0000034
00001758.00001f24::2013/05/13-02:46:07.549 ERR [RHS] Error 2 from ResourceControl for resource SOP92-V2_V2.
00001808.0000227c::2013/05/13-02:46:07.549 WARN [RCM] ResourceControl(STORAGE_GET_DIRTY) to SOP92-V2_V2 returned 2.
00001758.00001f24::2013/05/13-02:46:07.564 ERR [RES] Physical Disk: Failed to open device \Device\Harddisk7\Partition2, status 0xc0000034
00001758.00001f24::2013/05/13-02:46:07.564 ERR [RHS] Error 2 from ResourceControl for resource SOP92-V2_V2.
00001808.0000205c::2013/05/13-02:46:07.564 WARN [RCM] ResourceControl(STORAGE_GET_DIRTY) to SOP92-V2_V2 returned 2.
00001758.00002368::2013/05/13-02:46:07.580 ERR [RHS] Error 3 from ResourceControl for resource SOP92-V2_V2.
00001808.00001fd8::2013/05/13-02:46:07.580 WARN [RCM] ResourceControl(STORAGE_GET_MOUNTPOINTS) to SOP92-V2_V2 returned 3.
00001758.00001f24::2013/05/13-02:46:07.611 ERR [RHS] Error 3 from ResourceControl for resource SOP92-V2_V2.
00001808.0000227c::2013/05/13-02:46:07.611 WARN [RCM] ResourceControl(STORAGE_GET_MOUNTPOINTS) to SOP92-V2_V2 returned 3.
00001758.00002368::2013/05/13-02:46:07.627 ERR [RHS] Error 3 from ResourceControl for resource SOP92-V2_V2.
00001808.0000205c::2013/05/13-02:46:07.627 WARN [RCM] ResourceControl(STORAGE_GET_MOUNTPOINTS) to SOP92-V2_V2 returned 3.
00001758.00001f24::2013/05/13-02:46:07.674 ERR [RES] Physical Disk: Failed to get partition size, status 3
00001758.00002368::2013/05/13-02:46:07.720 ERR [RES] Physical Disk: Failed to get partition size, status 3
00001758.00001f24::2013/05/13-02:46:07.767 ERR [RES] Physical Disk: Failed to get partition size, status 3
00001758.00002368::2013/05/13-02:46:07.798 ERR [RES] Physical Disk: Failed to open device \Device\Harddisk7\Partition2, status 0xc0000034
00001758.00002368::2013/05/13-02:46:07.798 ERR [RHS] Error 2 from ResourceControl for resource SOP92-V2_V2.
00001808.00001fd8::2013/05/13-02:46:07.798 WARN [RCM] ResourceControl(STORAGE_GET_DIRTY) to SOP92-V2_V2 returned 2.
00001758.00001f24::2013/05/13-02:46:07.814 ERR [RES] Physical Disk: Failed to open device \Device\Harddisk7\Partition2, status 0xc0000034
00001758.00001f24::2013/05/13-02:46:07.814 ERR [RHS] Error 2 from ResourceControl for resource SOP92-V2_V2.
00001808.0000227c::2013/05/13-02:46:07.814 WARN [RCM] ResourceControl(STORAGE_GET_DIRTY) to SOP92-V2_V2 returned 2.
00001758.00001f24::2013/05/13-02:46:07.830 ERR [RES] Physical Disk: Failed to open device \Device\Harddisk7\Partition2, status 0xc0000034
00001758.00001f24::2013/05/13-02:46:07.830 ERR [RHS] Error 2 from ResourceControl for resource SOP92-V2_V2.
00001808.0000205c::2013/05/13-02:46:07.830 WARN [RCM] ResourceControl(STORAGE_GET_DIRTY) to SOP92-V2_V2 returned 2.
00001758.00002368::2013/05/13-02:46:07.861 ERR [RHS] Error 3 from ResourceControl for resource SOP92-V2_V2.
00001808.00001fd8::2013/05/13-02:46:07.861 WARN [RCM] ResourceControl(STORAGE_GET_MOUNTPOINTS) to SOP92-V2_V2 returned 3.
00001758.00001f24::2013/05/13-02:46:07.876 ERR [RHS] Error 3 from ResourceControl for resource SOP92-V2_V2.
00001808.000023ec::2013/05/13-02:46:07.876 WARN [RCM] ResourceControl(STORAGE_GET_MOUNTPOINTS) to SOP92-V2_V2 returned 3.
00001758.00002368::2013/05/13-02:46:07.892 ERR [RHS] Error 3 from ResourceControl for resource SOP92-V2_V2.
00001808.0000205c::2013/05/13-02:46:07.892 WARN [RCM] ResourceControl(STORAGE_GET_MOUNTPOINTS) to SOP92-V2_V2 returned 3.
00001758.00001f24::2013/05/13-02:46:07.923 ERR [RES] Physical Disk: Failed to get partition size, status 3
00001758.00002368::2013/05/13-02:46:07.923 ERR [RES] Physical Disk: Failed to open device \Device\Harddisk7\Partition2, status 0xc0000034
00001758.00002368::2013/05/13-02:46:07.923 ERR [RHS] Error 2 from ResourceControl for resource SOP92-V2_V2.
00001808.0000205c::2013/05/13-02:46:07.923 WARN [RCM] ResourceControl(STORAGE_GET_DIRTY) to SOP92-V2_V2 returned 2.
00001758.00001f24::2013/05/13-02:46:07.923 ERR [RHS] Error 3 from ResourceControl for resource SOP92-V2_V2.
00001808.0000205c::2013/05/13-02:46:07.923 WARN [RCM] ResourceControl(STORAGE_GET_MOUNTPOINTS) to SOP92-V2_V2 returned 3.
00001758.00002368::2013/05/13-02:46:07.939 ERR [RES] Physical Disk: Failed to get partition size, status 3
00001758.00001f24::2013/05/13-02:46:07.939 ERR [RES] Physical Disk: Failed to open device \Device\Harddisk7\Partition2, status 0xc0000034
00001758.00001f24::2013/05/13-02:46:07.939 ERR [RHS] Error 2 from ResourceControl for resource SOP92-V2_V2.
00001808.0000205c::2013/05/13-02:46:07.939 WARN [RCM] ResourceControl(STORAGE_GET_DIRTY) to SOP92-V2_V2 returned 2.
00001758.00002368::2013/05/13-02:46:07.939 ERR [RHS] Error 3 from ResourceControl for resource SOP92-V2_V2.
00001808.0000205c::2013/05/13-02:46:07.939 WARN [RCM] ResourceControl(STORAGE_GET_MOUNTPOINTS) to SOP92-V2_V2 returned 3.
00001a14.00001f44::2013/05/13-02:46:08.641 ERR [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQ returned 21.'
00001808.0000205c::2013/05/13-02:46:08.641 WARN [RCM] Failed to load restype 'MSMQ': error 21.
00001a14.00001f44::2013/05/13-02:46:08.656 ERR [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQTriggers returned 21.'
00001808.0000205c::2013/05/13-02:46:08.656 WARN [RCM] Failed to load restype 'MSMQTriggers': error 21.
00001a14.00001f44::2013/05/13-02:46:09.951 ERR [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQ returned 21.'
00001808.0000205c::2013/05/13-02:46:09.951 WARN [RCM] Failed to load restype 'MSMQ': error 21.
00001a14.00001f44::2013/05/13-02:46:09.967 ERR [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQTriggers returned 21.'
00001808.0000205c::2013/05/13-02:46:09.967 WARN [RCM] Failed to load restype 'MSMQTriggers': error 21.
00001758.00001f24::2013/05/13-02:46:10.201 ERR [RES] Physical Disk <SOP92-V2_V2>: IsAlive sanity check failed!, pending IO completed with status 1235.
00001758.00001f24::2013/05/13-02:46:10.201 ERR [RES] Physical Disk <SOP92-V2_V2>: IsAlive sanity check failed!, pending IO completed with status 1235.
00001758.00001f24::2013/05/13-02:46:10.201 WARN [RHS] Resource SOP92-V2_V2 IsAlive has indicated failure.
00001808.0000205c::2013/05/13-02:46:10.201 INFO [RCM] HandleMonitorReply: FAILURENOTIFICATION for 'SOP92-V2_V2', gen(5) result 1.
00001808.0000205c::2013/05/13-02:46:10.201 INFO [RCM] TransitionToState(SOP92-V2_V2) Online-->ProcessingFailure.
00001808.0000205c::2013/05/13-02:46:10.201 ERR [RCM] rcm::RcmResource::HandleFailure: (SOP92-V2_V2)
00001808.0000205c::2013/05/13-02:46:10.201 INFO [RCM] resource SOP92-V2_V2: failure count: 2, restartAction: 2.
00001808.0000205c::2013/05/13-02:46:10.201 INFO [RCM] Greater than restartPeriod time has elapsed since first failure, resetting failureTime and failureCount.
00001808.0000205c::2013/05/13-02:46:10.201 INFO [RCM] Will restart resource in 500 milliseconds.
00001808.0000205c::2013/05/13-02:46:10.201 INFO [RCM] TransitionToState(SOP92-V2_V2) ProcessingFailure-->[WaitingToTerminate to DelayRestartingResource].
00001808.0000205c::2013/05/13-02:46:10.201 INFO [RCM] rcm::RcmGroup::UpdateStateIfChanged: (S-OLAF-CLFS2, Online --> Pending)
00001808.0000205c::2013/05/13-02:46:10.201 INFO [RCM] TransitionToState(FileServer-(S-OLAF-CLFS2)(SOP92-V2_V2)) Online-->[WaitingToTerminate to OnlineCallIssued].
00001808.0000205c::2013/05/13-02:46:10.201 INFO [RCM] TransitionToState(FileServer-(S-OLAF-CLFS2)(SOP92-V2_V2)) [WaitingToTerminate to OnlineCallIssued]-->[Terminating to OnlineCallIssued].
00001758.00001f24::2013/05/13-02:46:10.201 INFO [RES] File Server <FileServer-(S-OLAF-CLFS2)(SOP92-V2_V2)>: FileServerDoTerminate: Terminate called... !!!
00001808.000021c0::2013/05/13-02:46:10.201 INFO [RCM] HandleMonitorReply: TERMINATERESOURCE for 'FileServer-(S-OLAF-CLFS2)(SOP92-V2_V2)', gen(0) result 0.
00001808.000021c0::2013/05/13-02:46:10.201 INFO [RCM] Restarting resource 'FileServer-(S-OLAF-CLFS2)(SOP92-V2_V2)'.
00001808.000021c0::2013/05/13-02:46:10.201 INFO [RCM] TransitionToState(SOP92-V2_V2) [WaitingToTerminate to DelayRestartingResource]-->[Terminating to DelayRestartingResource].
00001758.00001f24::2013/05/13-02:46:10.201 INFO [RES] Physical Disk <SOP92-V2_V2>: Terminate request.
00001758.00001f24::2013/05/13-02:46:10.201 WARN [RES] Physical Disk <SOP92-V2_V2>: Terminate: Failed to open device \Device\Harddisk7\Partition2, Error 2
00001758.00001f24::2013/05/13-02:46:10.263 INFO [RES] Physical Disk <SOP92-V2_V2>: HardDiskpCloseSVIHandles: Exit
00001758.00001f24::2013/05/13-02:46:10.263 ERR [RES] Physical Disk: ReleaseDisk: stop reserve failed on device 7 (sig a37d7c82), status 1168
00001758.00001f24::2013/05/13-02:46:10.263 ERR [RES] Physical Disk <SOP92-V2_V2>: Terminate: Failed to release disk \Device\Harddisk7\Partition0, Error 1168
00001808.00001f1c::2013/05/13-02:46:10.263 INFO [RCM] HandleMonitorReply: TERMINATERESOURCE for 'SOP92-V2_V2', gen(6) result 0.
00001808.00001f1c::2013/05/13-02:46:10.263 INFO [RCM] TransitionToState(SOP92-V2_V2) [Terminating to DelayRestartingResource]-->DelayRestartingResource.
00001808.000021c0::2013/05/13-02:46:10.778 INFO [RCM] Delay-restarting SOP92-V2_V2 and any waiting dependents.
00001808.000021c0::2013/05/13-02:46:10.778 INFO [RCM] TransitionToState(SOP92-V2_V2) DelayRestartingResource-->OnlineCallIssued.
00001758.00002368::2013/05/13-02:46:10.778 INFO [RES] Physical Disk <SOP92-V2_V2>: Online request.
00001808.00001fd8::2013/05/13-02:46:10.778 INFO [RCM] HandleMonitorReply: ONLINERESOURCE for 'SOP92-V2_V2', gen(6) result 997.
00001808.00001fd8::2013/05/13-02:46:10.778 INFO [RCM] TransitionToState(SOP92-V2_V2) OnlineCallIssued-->OnlinePending.
00001758.00001c6c::2013/05/13-02:46:10.778 INFO [RES] Physical Disk <SOP92-V2_V2>: Arbitrate request: FastPath 0
00001758.00001c6c::2013/05/13-02:46:10.778 INFO [RES] Physical Disk: Enter EnumerateDevices: EnumDevice 0
00001758.00001c6c::2013/05/13-02:46:10.778 INFO [RES] Physical Disk: Exit EnumerateDevices: status 0
00001758.00001c6c::2013/05/13-02:46:10.903 INFO [RES] Physical Disk: Successful reserve, disk is unowned, key 4bdf7466734d
00001758.00001c6c::2013/05/13-02:46:10.903 INFO [RES] Physical Disk: Update disk props returns 0
00001808.00001fd8::2013/05/13-02:46:10.903 INFO [GUM] Node 2: Processing RequestLock 2:120
00001758.00001c6c::2013/05/13-02:46:10.903 INFO [RES] Physical Disk <SOP92-V2_V2>: Disk is offline
00001808.00000610::2013/05/13-02:46:10.903 INFO [GUM] Node 2: Processing GrantLock to 2 (sent by 1 gumid: 983)
00001758.00000748::2013/05/13-02:46:10.950 INFO [RES] Physical Disk <SOP92-V2_V2>: Completed HardDiskpUpdateRegistryWorker (start time 02:46:10.903)
00001758.00001c6c::2013/05/13-02:46:11.028 WARN [RES] Physical Disk <SOP92-V2_V2>: OnlineThread: Failed to get volume guid for device \\?\GLOBALROOT\Device\Harddisk7\Partition2\. Error 3
00001758.0000060c::2013/05/13-02:46:11.090 INFO [RES] Physical Disk: PNP: Adding volume \\?\STORAGE#Volume#{9cf114d6-e555-11e1-b63f-002655d34435}#0000000008100000#{53f5630d-b6bf-11d0-94f2-00a0c91efb8b}
00001758.0000060c::2013/05/13-02:46:11.090 INFO [RES] Physical Disk: PNP: Add Volume exit, status 0
00001a14.00001f44::2013/05/13-02:46:11.246 ERR [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQ returned 21.'
00001808.000023ec::2013/05/13-02:46:11.246 WARN [RCM] Failed to load restype 'MSMQ': error 21.
00001a14.00001f44::2013/05/13-02:46:11.262 ERR [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQTriggers returned 21.'
00001808.000023ec::2013/05/13-02:46:11.262 WARN [RCM] Failed to load restype 'MSMQTriggers': error 21.
00001a14.00001f44::2013/05/13-02:46:14.584 ERR [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQ returned 21.'
00001808.00002318::2013/05/13-02:46:14.584 WARN [RCM] Failed to load restype 'MSMQ': error 21.
00001a14.00001f44::2013/05/13-02:46:14.600 ERR [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQTriggers returned 21.'
00001808.00002318::2013/05/13-02:46:14.600 WARN [RCM] Failed to load restype 'MSMQTriggers': error 21.
00001758.00001c6c::2013/05/13-02:46:15.037 WARN [RES] Physical Disk <SOP92-V2_V2>: OnlineThread: Failed to set volguid \??\Volume{7eddaf8a-6f3d-4a2b-9d06-caa144c224d7}. Error: 183.
00001758.00001c6c::2013/05/13-02:46:15.037 INFO [RES] Physical Disk <SOP92-V2_V2>: Found 2 mount points for device \Device\Harddisk7\Partition2
00001758.00001c6c::2013/05/13-02:46:18.437 INFO [RES] Physical Disk <SOP92-V2_V2>: VolumeIsNtfs: Volume \\?\GLOBALROOT\Device\Harddisk7\Partition2\ has FS type NTFS
00001758.00001c6c::2013/05/13-02:46:18.437 INFO [RES] Physical Disk: Volume \\?\GLOBALROOT\Device\Harddisk7\Partition2\ has FS type NTFS
00001758.00001c6c::2013/05/13-02:46:18.437 INFO [RES] Physical Disk: MountPoint U:\ points to volume \\?\Volume{7eddaf8a-6f3d-4a2b-9d06-caa144c224d7}\
00001758.00001c6c::2013/05/13-02:46:18.437 DBG [ClRtl] CaptureShareInfo: Share 0000000000000050, Server 0000000000000000, Path 0000000000000056 SD 0000000000000000 size 242
00001758.00001c6c::2013/05/13-02:46:18.437 WARN [ClRtl] SsCoreShareAdd(): status = 2118 share = U$ server = (null)
00001758.00001c6c::2013/05/13-02:46:18.437 DBG [ClRtl] CaptureShareInfo: Share 0000000000000050, Server 0000000000000000, Path 0000000000000056 SD 0000000000000000 size 242
00001758.00001c6c::2013/05/13-02:46:18.437 INFO [RHS] Resource SOP92-V2_V2 has come online. RHS is about to report status change to RCM
00001808.00001f1c::2013/05/13-02:46:18.437 INFO [RCM] HandleMonitorReply: ONLINERESOURCE for 'SOP92-V2_V2', gen(6) result 0.
00001808.00001f1c::2013/05/13-02:46:18.437 INFO [RCM] TransitionToState(SOP92-V2_V2) OnlinePending-->Online.
00001758.00001c6c::2013/05/13-02:46:18.437 INFO [RES] Physical Disk: DriveLetter mask: 0x100000
00001808.00002318::2013/05/13-02:46:18.437 INFO [RCM] TransitionToState(FileServer-(S-OLAF-CLFS2)(SOP92-V2_V2)) WaitingToComeOnline-->OnlineCallIssued.
00001808.00002318::2013/05/13-02:46:18.453 INFO [RCM] HandleMonitorReply: ONLINERESOURCE for 'FileServer-(S-OLAF-CLFS2)(SOP92-V2_V2)', gen(0) result 997.
00001808.00002318::2013/05/13-02:46:18.453 INFO [RCM] TransitionToState(FileServer-(S-OLAF-CLFS2)(SOP92-V2_V2)) OnlineCallIssued-->OnlinePending.
00001758.00001c6c::2013/05/13-02:46:18.547 INFO [RES] Physical Disk: HardDiskpScopeShareCallback: Enter resourceName S-OLAF-CLFS2
00001758.00001c6c::2013/05/13-02:46:18.547 DBG [ClRtl] CaptureShareInfo: Share 0000000000000050, Server 0000000000000056, Path 0000000000000070 SD 0000000000000000 size 268
00001758.00001c6c::2013/05/13-02:46:18.547 WARN [ClRtl] SsCoreShareAdd(): status = 2118 share = U$ server = S-OLAF-CLFS2
00001758.00001c6c::2013/05/13-02:46:18.547 DBG [ClRtl] CaptureShareInfo: Share 0000000000000050, Server 0000000000000056, Path 0000000000000070 SD 0000000000000000 size 268
00001758.00001c6c::2013/05/13-02:46:18.547 INFO [RES] Physical Disk: Attempt to ADD admin share U$ to/from netname S-OLAF-CLFS2, returned 0
00001758.00002094::2013/05/13-02:46:18.578 INFO [RES] File Server <FileServer-(S-OLAF-CLFS2)(SOP92-V2_V2)>: Shares 'are being scoped to virtual name S-OLAF-CLFS2
00001758.00002094::2013/05/13-02:46:18.671 INFO [RES] File Server <FileServer-(S-OLAF-CLFS2)(SOP92-V2_V2)>: Successfully added Share ACL with Path u:\Software\ACL9 on server S-OLAF-CLFS2
00001758.00002094::2013/05/13-02:46:18.687 INFO [RES] File Server <FileServer-(S-OLAF-CLFS2)(SOP92-V2_V2)>: Successfully added Share NOAP_Roaming with Path u:\NOAP_Roaming on server S-OLAF-CLFS2
00001758.00002094::2013/05/13-02:46:18.718 INFO [RES] File Server <FileServer-(S-OLAF-CLFS2)(SOP92-V2_V2)>: Successfully added Share Software with Path u:\Software on server S-OLAF-CLFS2
00001758.00002094::2013/05/13-02:46:18.749 INFO [RES] File Server <FileServer-(S-OLAF-CLFS2)(SOP92-V2_V2)>: Successfully added Share tnsnames$ with Path u:\tnsnames on server S-OLAF-CLFS2
00001758.00002094::2013/05/13-02:46:18.781 INFO [RES] File Server <FileServer-(S-OLAF-CLFS2)(SOP92-V2_V2)>: Successfully added Share TSRoaming with Path u:\Tsroaming on server S-OLAF-CLFS2
00001758.00002094::2013/05/13-02:46:18.796 INFO [RHS] Resource FileServer-(S-OLAF-CLFS2)(SOP92-V2_V2) has come online. RHS is about to report status change to RCM
00001808.00002030::2013/05/13-02:46:18.796 INFO [RCM] HandleMonitorReply: ONLINERESOURCE for 'FileServer-(S-OLAF-CLFS2)(SOP92-V2_V2)', gen(0) result 0.
00001808.00002030::2013/05/13-02:46:18.796 INFO [RCM] TransitionToState(FileServer-(S-OLAF-CLFS2)(SOP92-V2_V2)) OnlinePending-->Online.
00001808.00002030::2013/05/13-02:46:18.796 INFO [RCM] rcm::RcmGroup::UpdateStateIfChanged: (S-OLAF-CLFS2, Pending --> Online)
00001758.00002094::2013/05/13-02:46:18.796 INFO [RES] File Server <FileServer-(S-OLAF-CLFS2)(SOP92-V2_V2)>: FileServerpMigrateSubdirShares: Exit status 0
00001a14.00000630::2013/05/13-02:46:18.843 ERR [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQ returned 21.'
00001808.000021f0::2013/05/13-02:46:18.843 WARN [RCM] Failed to load restype 'MSMQ': error 21.
00001808.000021f0::2013/05/13-02:46:18.843 WARN [RCM] rcm::RcmApi::ResTypeControl: ResType MSMQ's DLL is not present on this node. Attempting to find a good node...
00001a14.00000630::2013/05/13-02:46:18.843 ERR [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQ returned 21.'
00001808.0000205c::2013/05/13-02:46:18.843 WARN [RCM] Failed to load restype 'MSMQ': error 21.
00001a14.00000630::2013/05/13-02:46:18.905 ERR [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQTriggers returned 21.'
00001808.000021f0::2013/05/13-02:46:18.905 WARN [RCM] Failed to load restype 'MSMQTriggers': error 21.
00001808.000021f0::2013/05/13-02:46:18.905 WARN [RCM] rcm::RcmApi::ResTypeControl: ResType MSMQTriggers's DLL is not present on this node. Attempting to find a good node...
00001a14.00000630::2013/05/13-02:46:18.905 ERR [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQTriggers returned 21.'
00001808.0000205c::2013/05/13-02:46:18.905 WARN [RCM] Failed to load restype 'MSMQTriggers': error 21.
00001808.0000227c::2013/05/13-02:46:22.696 INFO [API] s_ApiGetQuorumResource final status 0.
Post #1451980
Posted Monday, May 13, 2013 5:19 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 12:54 PM
Points: 6,282, Visits: 13,538
ginolard (5/12/2013)
1 NIC each only for Public, iSCSI and Heartbeat connections (no more NIC Teaming)

Under Windows 2008, teaming is fully supported for the public network and the Heartbeat network, never had an issue with it.

However, as detailed in my article, you must not team the iSCSI network as failover and management of the network(s) must be left to the control of the MPIO\MCS software. Teaming moves control of the network outside of the iSCSI policies.



ginolard (5/12/2013)
MPIO uninstalled on all nodes

unnecessary at this point IMHO



ginolard (5/12/2013)
The two disks attached to the HP MSA filer still lose ownership around the same time each day and throw 1038 events.

This is starting to annoy me now ;)

Ah now this shows a pattern. The HP filer isn't shared by any chance between multiple systems is it?

I had an issue recently where a set of SQL Servers with LUNs presented from an HP EVA would generate slow I\O messages from SQL Server and on some occasions the drives would disappear from under the server

It transpired that a pre prod SQL Server shared the same array with a multi Terrabyte database which was merrily refreshed once a week by a DBA. Everytime he ran the restore the transaction rate on the array went through the roof and beyond the architecural limit of the array.

Look at the storage and see what else is sharing it, then look at what they do around the time frame you are experiencing.


-----------------------------------------------------------------------------------------------------------

"Ya can't make an omelette without breaking just a few eggs"
Post #1452049
Posted Monday, May 13, 2013 7:12 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Tuesday, May 14, 2013 12:40 AM
Points: 7, Visits: 19
We THINK we may have found the issue, we'll know within 24 hours. We noticed that the errors were occuring 2 minutes after the start and end of the "disk scrub" that the MSAs perform every day. This checks for disk defects and takes about 12 hours to complete. I'm guessing that this scrub somehow makes the controllers temporarily fail before recovering.

The latest firmware has a fix for this issue but, for now, I've disabled the scrub and we'll see if that helps.
Post #1452086
Posted Monday, May 13, 2013 7:58 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 12:54 PM
Points: 6,282, Visits: 13,538
good work, let us know how you get on

-----------------------------------------------------------------------------------------------------------

"Ya can't make an omelette without breaking just a few eggs"
Post #1452114
Posted Tuesday, May 14, 2013 12:41 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Tuesday, May 14, 2013 12:40 AM
Points: 7, Visits: 19
Not had the issue since disabling the scrub yesterday. That seems pretty conclusive to me.

Time for a firmware update on the MSA controllers methinks! Thanks for all your advice
Post #1452403
« Prev Topic | Next Topic »

Add to briefcase «««1234»»

Permissions Expand / Collapse