Ownership of cluster disk 'Cluster Disk xxx' has been unexpectedly lost by this node.
Posted Tuesday, April 23, 2013 8:09 AM
SSC-Addicted

My Cluster went down again. I don't have to say... I am having a not so good morning already ... :-(

Here's the Cluster's error:


Ownership of cluster disk 'Cluster Disk xxx' has been unexpectedly lost by this node. Run the Validate a Configuration wizard to check your storage configuration.


This looks to me like an OS or SAN error. The LUN disappears, then SQL Server goes down. Following our SAN admin's advice, we applied this patch: http://support.microsoft.com/?id=2718576 ... but it does not look like it resolved the main issue.

We only started having this issue a few weeks ago. Before that it ran fine for about two months, maybe a bit more, but with less workload.

Has anyone experienced this problem before?
Post #1445450
Posted Tuesday, April 23, 2013 10:21 AM


SSCertifiable

Are you using iSCSI-attached storage and MPIO?
Can you provide a little more info on the storage and the connectivity from the nodes?


-----------------------------------------------------------------------------------------------------------

"Ya can't make an omelette without breaking just a few eggs"
Post #1445562
Posted Tuesday, April 23, 2013 11:03 AM
SSC-Addicted

Hi Perry,

The shared storage is a Dell Compellent SC8000 SAN, connected via iSCSI / MPIO to both nodes. The Windows Cluster runs on Win2008R2 SP1. MS-SQL runs SQL2012 Standard.

I also found this error in the Windows log:


Connection to the target was lost. The initiator will attempt to retry the connection.


It clearly looks like an iSCSI / MPIO issue. In both incidents, the iSCSI mapping was lost, then SQL went down.

Our SAN expert's advice is to remove MPIO ???? ... but I've helped configure and deploy dozens of SQL clusters with MPIO before, and this is the first time I've seen this problem. Moreover, I believe removing MPIO will create cluster validation issues and data corruption.
Post #1445586
Posted Wednesday, April 24, 2013 10:47 AM


SSCertifiable

sql-lover (4/23/2013)
Hi Perry,

The shared storage is a Dell Compellent SC8000 SAN, connected via iSCSI / MPIO to both nodes. The Windows Cluster runs on Win2008R2 SP1. MS-SQL runs SQL2012 Standard.

I'm assuming you're using the Microsoft iSCSI initiator?
Are you using the default MPIO driver or a Dell DSM?
If it's the MS driver, what policy are you using?



sql-lover (4/23/2013)
Our SAN expert's advice is to remove MPIO ???? ....

Some expert huh?
Without multi pathing things could be a whole lot worse.


Post #1446118
Posted Wednesday, April 24, 2013 11:09 AM
SSC-Addicted

The MCS policy was set to "round robin".

Now, I do believe we are using the default Microsoft MPIO driver, but where can I check that on Windows and confirm? I do not remember where ...

Also, I forgot to mention, and I actually was not aware of this until yesterday: we do not have two switches but only one, and both nodes are connected to the same switch. That actually defeats part of MPIO's purpose, I think. Not sure why our IT resource set it up that way.
Post #1446125
Posted Wednesday, April 24, 2013 2:21 PM


SSCertifiable

sql-lover (4/24/2013)
The MCS policy was set to "round robin".

You use either MPIO or MCS not both, so, are you using MCS or MPIO?
MCS is specific to the Microsoft iSCSI Initiator and comprises single session\multiple connection.
MPIO uses multiple sessions.
For more info on iSCSI see my article at this link.
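
To make the distinction concrete, here is a minimal sketch in Python that models the two layouts as plain data. It is only an illustration: the class names, the `describe` helper and all IP addresses are hypothetical, and it assumes the common case where each MPIO session carries a single connection.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Connection:
    initiator_ip: str  # hypothetical address on the server's iSCSI NIC
    target_ip: str     # hypothetical address of the SAN portal


@dataclass
class Session:
    connections: List[Connection] = field(default_factory=list)


# MCS: a single session to the target that carries several connections.
mcs_layout = [
    Session([
        Connection("10.0.0.11", "10.0.0.50"),
        Connection("10.0.0.12", "10.0.0.51"),
    ])
]

# MPIO: several sessions (assumed here to hold one connection each);
# the MPIO driver presents them to Windows as one disk.
mpio_layout = [
    Session([Connection("10.0.0.11", "10.0.0.50")]),
    Session([Connection("10.0.0.12", "10.0.0.51")]),
]


def describe(layout: List[Session]) -> Tuple[int, List[int]]:
    """Return (number of sessions, connections per session)."""
    return len(layout), [len(s.connections) for s in layout]


print(describe(mcs_layout))   # (1, [2])
print(describe(mpio_layout))  # (2, [1, 1])
```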


sql-lover (4/24/2013)
The MCS policy was set to "round robin". Now, I do believe we are using the default Microsoft MPIO driver, but where can I check that on Windows and confirm? I do not remember where ...

Open the Microsoft iSCSI Initiator console, select the disk device and open the properties. You should see the MPIO button which will open the MPIO properties to view\change.



sql-lover (4/24/2013)
Also, I forgot to mention, and I actually was not aware of this until yesterday: we do not have two switches but only one, and both nodes are connected to the same switch. That actually defeats part of MPIO's purpose, I think. Not sure why our IT resource set it up that way.

When using storage multipathing, one would hope the hardware is in place to support the topology; otherwise a single switch failure takes out every path and leaves MPIO useless!!
You should have more than 2 switches for your iSCSI network. A typical topology would have at least 2 core switches with edge switches feeding off these to provide multiple redundant paths down to your storage. This is all detailed in my article linked above.
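
The single-switch problem can be sketched in a few lines of Python. This is a rough illustration, not a real discovery tool: the node and switch names are made up, and each path is simply listed as the switches it passes through. A switch that sits on every path of a node is a single point of failure for that node.

```python
from typing import Dict, List, Set

# Hypothetical topology description: for each cluster node, the list of
# paths to the SAN, each path given as the switches it traverses.
Paths = Dict[str, List[List[str]]]


def single_points_of_failure(paths: Paths) -> Dict[str, Set[str]]:
    """For each node, return the switches whose failure cuts every path."""
    result: Dict[str, Set[str]] = {}
    for node, node_paths in paths.items():
        if not node_paths:
            result[node] = set()
            continue
        # A switch is a single point of failure if it is on all paths.
        common = set(node_paths[0])
        for path in node_paths[1:]:
            common &= set(path)
        result[node] = common
    return result


# Two paths per node, but both through the same switch.
one_switch = {
    "node1": [["switch-a"], ["switch-a"]],
    "node2": [["switch-a"], ["switch-a"]],
}

# Two paths per node, each through a different switch.
two_switches = {
    "node1": [["switch-a"], ["switch-b"]],
    "node2": [["switch-a"], ["switch-b"]],
}

print(single_points_of_failure(one_switch))
# {'node1': {'switch-a'}, 'node2': {'switch-a'}}
print(single_points_of_failure(two_switches))
# {'node1': set(), 'node2': set()}
```

With one switch, MPIO still balances load across the paths, but it cannot survive the loss of that switch.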

The whole point of multipathing is to let Windows Server present highly available SAN disks as single local disks; without it, the OS would see each path as a separate disk device, which it is not.

With 10GbE available you're exceeding the capabilities of a standard FC setup


Post #1446222
Posted Wednesday, April 24, 2013 2:37 PM
SSC-Addicted

Perry Whittle (4/24/2013)

When using storage multipathing, one would hope the hardware is in place to support the topology; otherwise a single switch failure takes out every path and leaves MPIO useless!!

You should have more than 2 switches for your iSCSI network. A typical topology would have at least 2 core switches with edge switches feeding off these to provide multiple redundant paths down to your storage. This is all detailed in my article linked above.

The whole point of multipathing is to let Windows Server present highly available SAN disks as single local disks; without it, the OS would see each path as a separate disk device, which it is not.

With 10GbE available you're exceeding the capabilities of a standard FC setup


You are correct and I understand that! It has been very difficult to explain and support my arguments though. I've been questioned a lot, even though I know this from experience, and it is really FRUSTRATING!

Anyway, I appreciate the follow-up. I can check those other settings you mention; I'll post once I have them ...
Post #1446231
Posted Wednesday, April 24, 2013 2:41 PM


SSCertifiable

sql-lover (4/24/2013)
It has been very difficult to explain and support my arguments though. I've been questioned a lot, even though I know this from experience, and it is really FRUSTRATING!

point them to my article


Post #1446235
Posted Thursday, May 2, 2013 2:11 PM
SSC-Addicted

Just in case someone else reading this thread faces a similar issue.

Our IT guy / SAN expert contacted Microsoft. He had a meeting with them, and the Microsoft engineer reviewed the whole cluster implementation. He did not find anything wrong with SQL Server or its configuration, but suggested these two OS changes:

- Change the network binding order: put heartbeat second and SAN last (SAN was second and heartbeat last)
- Assign fixed IP values in the iSCSI initiator properties

I was absent from the meeting, and I do not understand the first suggestion. That is usually how I set up my cluster implementations. I'll give the second one a try though.
Post #1448973
Posted Friday, May 3, 2013 12:18 AM


SSCertifiable

sql-lover (5/2/2013)
- Change the network binding order: put heartbeat second and SAN last (SAN was second and heartbeat last)

I can sort of see why, but I don't see that this is too relevant; the cluster communication can still take place over the public network (the default setting).


sql-lover (5/2/2013)
- Assign fixed IP values in the iSCSI initiator properties

Now this is relevant. Your heartbeat and iSCSI adapters should be set not to register themselves in DNS. Always provide fixed IP details to the initiator disk device connection to ensure the correct adapters are bound. My article linked above details this.
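
As a rough way to sanity-check the fixed IP pairs before entering them in the initiator, the Python sketch below attempts a TCP connection to a target portal from one specific local address. It is an assumption-laden illustration: the function name and the example addresses are hypothetical, 3260 is the standard iSCSI port, and a successful TCP connect only shows the portal is reachable from that address, not that an iSCSI login would succeed.

```python
import socket

ISCSI_PORT = 3260  # standard iSCSI target port


def can_reach_portal(source_ip: str, portal_ip: str,
                     port: int = ISCSI_PORT, timeout: float = 3.0) -> bool:
    """Try a TCP connection to the portal, bound to a specific local IP."""
    try:
        # source_address=(ip, 0) binds the local end to the given address
        # and lets the OS pick the source port.
        with socket.create_connection((portal_ip, port), timeout=timeout,
                                      source_address=(source_ip, 0)):
            return True
    except OSError:
        # Covers refused connections, timeouts and unusable source addresses.
        return False
```

For example, `can_reach_portal("10.0.0.11", "10.0.0.50")` would test one hypothetical adapter/portal pair; running it for each iSCSI adapter shows which pairs actually connect.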


Post #1449057