SQL Cluster shutdown issue due to network issue

  • Hi All,

    I faced an issue yesterday on our SQL Server 2008 SP1 cluster. It shut down due to a network issue. In HA, the cluster should fail over to another node if there is any issue. When the private network is not down, why did the cluster service go down? The errors reported in Event Viewer and the cluster events are 1205, 1069, 1077, 1126, 1127, 1129 and 1130.

    Please help me to understand.

    Thanks in advance.

    Regards

    S Govindarajan

  • Check your Windows application and system logs, and collate this info with the cluster events log. Can you post details of any errors reported? Failure of the public network will initiate a failover.

    Public network will usually allow cluster and user traffic.

    Private network will usually only allow cluster communications.

    If the private network goes down it can still communicate on the public network.

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉
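
To make the public/private distinction above concrete, here is a minimal Python sketch. It is only a toy model of the roles described in this post, not how the Cluster service is implemented; the class name `ClusterNetwork` and the function `assess` are made up for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ClusterNetwork:
    """Toy model of a cluster network and the traffic it is allowed to carry."""
    name: str
    cluster_traffic: bool  # carries node-to-node cluster communication
    client_traffic: bool   # carries user/client connections
    up: bool = True


def assess(networks):
    """Return (nodes_can_communicate, clients_can_connect) for a set of networks."""
    nodes_can_communicate = any(n.up and n.cluster_traffic for n in networks)
    clients_can_connect = any(n.up and n.client_traffic for n in networks)
    return nodes_can_communicate, clients_can_connect


# Roles as described in the post: public = cluster + user traffic, private = cluster only.
public = ClusterNetwork("public", cluster_traffic=True, client_traffic=True)
private = ClusterNetwork("private", cluster_traffic=True, client_traffic=False)

# Private network down: cluster communication continues over the public network.
print(assess([public, replace(private, up=False)]))

# Public network down: nodes still talk over private, but clients cannot reach
# the instance, which is the situation that initiates a failover.
print(assess([replace(public, up=False), private]))
```

In this model, losing the private network changes nothing visible, while losing the public network leaves the nodes talking to each other but cuts off clients.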

  • Network issues can make clusters inaccessible, prevent failover, or trigger failovers that shouldn't happen, but they shouldn't cause the cluster instance or SQL Server to completely shut down.

  • Hi All,

    Thanks for your reply.

    I also have the same understanding that SQL Server should not go down due to network issues, whether it is private or public. When I was going through the event logs, I found event ID 1077:

    - System

    - Provider

    [ Name] Microsoft-Windows-FailoverClustering

    [ Guid] {baf908ea-3421-4ca9-9b84-6689b8c6f85f}

    EventID 1077

    Version 0

    Level 2

    Task 20

    Opcode 0

    Keywords 0x8000000000000000

    - TimeCreated

    [ SystemTime] 2012-11-07T09:25:16.053Z

    EventRecordID 45227

    Correlation

    - Execution

    [ ProcessID] 5232

    [ ThreadID] 28880

    Channel System

    Computer NODE1.COMPANY.COM

    - Security

    [ UserID] S-1-5-18

    - EventData

    ResourceName SQL IP Address 1 (SQLCLUSTER01)

    IPAddress 192.168.101.154

    Status 1117

    Health check for IP interface 'SQL IP Address 1 (SQLCLUSTER01)' (address '192.168.101.154 ') failed (status is '1117'). Run the Validate a Configuration wizard to ensure that the network adapter is functioning properly.

    When I searched on Google, I found that "failed (status is '1117')" is the ERROR_IO_DEVICE error. However, I could not find any related entries in the system or application logs.

    Please help me find the root cause of this message.

    Thanks in advance.

    Regards

    S Govindarajan
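
When collating several such events, it can help to pull the key fields out of the pasted event text and decode the status in one place. The following is a minimal Python sketch under stated assumptions: it parses the plain-text form pasted above (not the real event XML), the helper names `parse_event` and `summarize` are hypothetical, and the lookup table contains only the one code identified in this thread.

```python
# Win32 status codes seen so far. 1117 is ERROR_IO_DEVICE, as found in the post.
WIN32_STATUS = {1117: "ERROR_IO_DEVICE"}

# Fields of interest, copied from the event pasted above.
EVENT_TEXT = """\
EventID 1077
Computer NODE1.COMPANY.COM
ResourceName SQL IP Address 1 (SQLCLUSTER01)
IPAddress 192.168.101.154
Status 1117
"""

FIELDS = ("EventID", "Computer", "ResourceName", "IPAddress", "Status")


def parse_event(text):
    """Extract 'Name value' pairs for the fields we care about; ignore other lines."""
    event = {}
    for line in text.splitlines():
        key, _, value = line.strip().partition(" ")
        if key in FIELDS:
            event[key] = value.strip()
    return event


def summarize(event):
    """Build a one-line summary with the status code decoded where known."""
    status = int(event["Status"])
    name = WIN32_STATUS.get(status, "unknown status")
    return (
        f"Event {event['EventID']} on {event['Computer']}: "
        f"{event['ResourceName']} ({event['IPAddress']}) "
        f"failed with status {status} ({name})"
    )


print(summarize(parse_event(EVENT_TEXT)))
```

Running the same summary over events 1126, 1127, 1129 and 1130 from the same time window would make it easier to see whether the network failure was reported by one node or by both.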

  • govindarajan69 (11/8/2012)


    I also have the same understanding that SQL Server should not go down due to network issues, whether it is private or public.

    No, stop, this is incorrect.

    Kindly re-read my post above: failure of the public network will initiate a failover of the instance.

  • Hi Perry,

    If you are saying that a public network failure will initiate a failover of the instance, will it shut down the cluster itself, or will it fail over to the available node? When the connections to the SAN storage and quorum disk are intact, why is this happening?

    Regards

    Govind

  • It should try to start on an available node. If there's an issue with the available node too, the group will stay offline after a certain number of retries.

  • Thanks Perry. In fact, that is my understanding after this incident. It might have been costlier though.

    Regards

    Govind

  • You can change the actions that are taken during a failover, but the default is to try to bring the group online locally, then fail over and try 3 times, then take it offline if unsuccessful.

    It sounds like you had a public network outage affecting both nodes. Ensure they're not plugged into the same switch 😉
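
The default behaviour described above can be sketched as a small Python simulation. This is a deliberately simplified model, not the Cluster service's actual algorithm: the function name `place_group`, the node name `NODE2`, and the way the three attempts are counted (one shared budget, local restart first) are all assumptions made for illustration.

```python
def place_group(nodes, healthy, current, max_attempts=3):
    """Return the node the group comes online on, or None if it stays offline.

    nodes:        ordered list of node names in the cluster
    healthy:      set of nodes where the group's resources can come online
    current:      node that owned the group when the failure was detected
    max_attempts: total tries before the group is left offline (assumed to be 3)
    """
    # Try the current owner first (local restart), then the other nodes in order.
    order = [current] + [n for n in nodes if n != current]
    for attempt in range(max_attempts):
        node = order[attempt % len(order)]
        if node in healthy:
            return node
    return None


nodes = ["NODE1", "NODE2"]  # NODE2 is a hypothetical second node

# Public network fails on NODE1 only: the group fails over to NODE2.
print(place_group(nodes, healthy={"NODE2"}, current="NODE1"))

# Outage hits both nodes (for example, both on the same switch): group stays offline.
print(place_group(nodes, healthy=set(), current="NODE1"))
```

The second call shows why the whole instance can appear to "shut down": if no node can bring the IP address resource online, the retries are exhausted and the group is left offline.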
