SQL Cluster shutdown issue due to network issue
Posted Wednesday, November 7, 2012 10:56 PM
Grasshopper

Group: General Forum Members
Last Login: Monday, June 16, 2014 12:00 AM
Points: 18, Visits: 192
Hi All,

I faced an issue yesterday on our SQL Server 2008 SP1 cluster: it shut down because of a network issue. In an HA setup, the cluster should fail over to another node if there is any issue. Since the private network was not down, why did the cluster service go down? The errors reported in Event Viewer and the cluster events are 1205, 1069, 1077, 1126, 1127, 1129 and 1130.

Please help me to understand.

Thanks in advance.

Regards
S Govindarajan
Post #1382262
Posted Thursday, November 8, 2012 4:59 AM


SSCertifiable

Group: General Forum Members
Last Login: Today @ 7:02 AM
Points: 6,180, Visits: 13,327
Check your Windows application and system logs and correlate them with the cluster events log. Can you post details of any errors reported? Failure of the public network will initiate a failover.

Public network will usually allow cluster and user traffic.
Private network will usually only allow cluster communications.

If the private network goes down it can still communicate on the public network.
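
As a rough illustration of the point above, here is a minimal Python sketch of which traffic survives when one network drops. The `Role` names and the `assess` function are made up for this example and are not part of any cluster API; it only models the usual setup described in this post.

```python
from enum import Enum


class Role(Enum):
    # Hypothetical labels for the two usual network roles described above.
    CLUSTER_ONLY = "private"        # cluster communications only
    CLUSTER_AND_CLIENT = "public"   # cluster and user traffic


def assess(networks):
    """networks: list of (Role, is_up) pairs for one node.

    Returns which kinds of traffic still have a working path.
    """
    # Cluster traffic can use any network that is still up.
    heartbeat = any(up for _role, up in networks)
    # User connections need a network that allows client traffic.
    client_access = any(
        up for role, up in networks if role is Role.CLUSTER_AND_CLIENT
    )
    return {"heartbeat": heartbeat, "client_access": client_access}


# Private network down: the nodes can still talk over the public network.
print(assess([(Role.CLUSTER_ONLY, False), (Role.CLUSTER_AND_CLIENT, True)]))
# Public network down: the nodes still talk, but clients cannot connect.
print(assess([(Role.CLUSTER_ONLY, True), (Role.CLUSTER_AND_CLIENT, False)]))
```

The second case is the one where the instance can no longer serve users on that node, which is why a public network failure leads to a failover.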


-----------------------------------------------------------------------------------------------------------

"Ya can't make an omelette without breaking just a few eggs"
Post #1382420
Posted Thursday, November 8, 2012 9:26 AM
Mr or Mrs. 500

Group: General Forum Members
Last Login: Yesterday @ 2:35 PM
Points: 529, Visits: 1,566
Network issues can make clusters inaccessible, prevent failover, or trigger failovers that shouldn't happen, but they shouldn't cause the cluster instance or SQL Server to shut down completely.
Post #1382573
Posted Thursday, November 8, 2012 9:21 PM
Grasshopper

Hi All,

Thanks for your reply.

I have the same understanding: SQL Server should not go down because of network issues, whether on the private or the public network. When I went through the event logs, I found event ID 1077:

- System
- Provider
[ Name] Microsoft-Windows-FailoverClustering
[ Guid] {baf908ea-3421-4ca9-9b84-6689b8c6f85f}
EventID 1077
Version 0
Level 2
Task 20
Opcode 0
Keywords 0x8000000000000000
- TimeCreated
[ SystemTime] 2012-11-07T09:25:16.053Z
EventRecordID 45227
Correlation
- Execution
[ ProcessID] 5232
[ ThreadID] 28880
Channel System
Computer NODE1.COMPANY.COM
- Security
[ UserID] S-1-5-18
- EventData
ResourceName SQL IP Address 1 (SQLCLUSTER01)
IPAddress 192.168.101.154
Status 1117
Health check for IP interface 'SQL IP Address 1 (SQLCLUSTER01)' (address '192.168.101.154 ') failed (status is '1117'). Run the Validate a Configuration wizard to ensure that the network adapter is functioning properly.

When I searched on Google, I found that status 1117 is the ERROR_IO_DEVICE error. However, I could not find anything related to it in the system or application logs.
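
For anyone triaging a batch of these events, a small Python sketch like the one below can pull the resource name, address and status out of the 1077 message text and attach the Win32 error name. The `parse_1077` helper and the `WIN32_ERRORS` table are hypothetical names for this example; the table only contains the one code mentioned in this thread and would need extending for other statuses.

```python
import re

# Win32 system error codes seen in this thread; extend as needed.
WIN32_ERRORS = {
    1117: "ERROR_IO_DEVICE",
}

# Matches the message text of event 1077 as posted above.
PATTERN = re.compile(
    r"Health check for IP interface '(?P<resource>[^']+)' "
    r"\(address '(?P<address>[^']+)'\) failed "
    r"\(status is '(?P<status>\d+)'\)"
)


def parse_1077(message):
    """Return the key fields of a 1077 message, or None if it does not match."""
    m = PATTERN.search(message)
    if m is None:
        return None
    status = int(m.group("status"))
    return {
        "resource": m.group("resource"),
        # The logged address carries a trailing space, so strip it.
        "address": m.group("address").strip(),
        "status": status,
        "status_name": WIN32_ERRORS.get(status, "unknown"),
    }


msg = (
    "Health check for IP interface 'SQL IP Address 1 (SQLCLUSTER01)' "
    "(address '192.168.101.154 ') failed (status is '1117'). "
    "Run the Validate a Configuration wizard to ensure that the "
    "network adapter is functioning properly."
)
print(parse_1077(msg))
```

This only decodes what the event already says; it does not explain why the health check failed.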

Please help me work out how to find the root cause of this message.

Thanks in advance.

Regards
S Govindarajan
Post #1382808
Posted Thursday, November 8, 2012 11:45 PM


SSCertifiable

govindarajan69 (11/8/2012)
Even I have the same understanding that SQL Server should not go down due to network issues, whether private or public.

No, stop, this is incorrect.
Kindly reread my post above: failure of the public network will initiate a failover of the instance.


Post #1382835
Posted Friday, November 9, 2012 3:00 AM
Grasshopper

Hi Perry,

If you are saying that a public network failure will initiate a failover of the instance, will it shut down the cluster itself, or will it fail over to the available node? When the connections to the SAN storage and quorum disk are intact, why is this happening?

Regards
Govind
Post #1382910
Posted Friday, November 9, 2012 3:54 AM


SSCertifiable

It should try to start on an available node. If there's an issue with that node too, the group will stay offline after a certain number of retries.

Post #1382930
Posted Friday, November 9, 2012 9:16 AM
Grasshopper

Thanks, Perry. In fact, that is my understanding after this incident. It might have been costlier, though.

Regards
Govind
Post #1383082
Posted Friday, November 9, 2012 9:34 AM


SSCertifiable

You can change the actions taken during a failover, but the default is to try to bring the resource online locally, then fail over and try 3 times, then take it offline if unsuccessful.
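
To make that sequence concrete, here is a toy Python model of the default behaviour as described in this post. It is not the cluster service's actual logic: `place_group`, the `can_host` map and the node name NODE2 are assumptions for illustration (only NODE1 appears in the event above), and real policies also involve time windows that are left out here.

```python
def place_group(can_host, owner, max_failovers=3):
    """Toy model: restart locally, then fail over up to max_failovers times.

    can_host: dict of node name -> True if the group can come online there
              (for example, the node's public network is working).
    Returns (state, hosting_node, log).
    """
    nodes = list(can_host)
    log = [f"restart on {owner}"]
    # Step 1: try to bring the group back online on the current owner.
    if can_host[owner]:
        return "online", owner, log

    # Step 2: move to the next node in turn, up to max_failovers attempts.
    idx = nodes.index(owner)
    for attempt in range(1, max_failovers + 1):
        idx = (idx + 1) % len(nodes)
        target = nodes[idx]
        log.append(f"failover {attempt} to {target}")
        if can_host[target]:
            return "online", target, log

    # Step 3: give up and leave the group offline.
    log.append("group left offline")
    return "offline", None, log


# Outage on NODE1 only: the group ends up online on the other node.
print(place_group({"NODE1": False, "NODE2": True}, owner="NODE1"))
# Outage on both nodes: the group is left offline after the retries.
print(place_group({"NODE1": False, "NODE2": False}, owner="NODE1"))
```

The second call is the situation where an outage hits every node at once, so the retries run out and the group stays offline.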

It sounds like you had a public network outage affecting both nodes. Ensure they're not plugged into the same switch.


Post #1383096