How does a SQL Server FCI respond to failure?

  • With the default Windows role (group) settings, n-1 failovers are allowed within 6 hours, where n is the number of nodes. With 2 nodes, 1 failover is allowed within 6 hours.

    With the default resource settings, 1 restart is allowed on the same node within a 15-minute period; otherwise the resource fails over. RetryPeriodOnFailure is 1 hour.

     

    Take a simple 2-node cluster (Node1 and Node2) with a single SQL Server FCI (sql), and say sql is running on Node1 at 9AM.

    9AM - sql fails - attempts to restart on Node1 - fails - fails over to Node2 - attempts to restart on Node2 - fails - stays failed.

    Where I am confused is this statement from https://support.microsoft.com/en-us/help/947712/failover-cluster-resource-recovery-behavior-in-windows-server-2008 : "If there is no intervention and the resource remains in the failed state for 60 minutes, Windows Server tries to bring the resource online again."

    So, according to the above, after 60 minutes (some time after 10AM), does sql go through the cycle again, like this?

    10AM - attempts sql restart on Node2 - fails (but cannot fail over, as 6 hours have not passed)

    11AM - attempts sql restart on Node2 - fails (but cannot fail over, as 6 hours have not passed)

    12PM - attempts sql restart on Node2 - fails (but cannot fail over, as 6 hours have not passed)

    1PM - attempts sql restart on Node2 - fails (but cannot fail over, as 6 hours have not passed)

    2PM - attempts sql restart on Node2 - fails (but cannot fail over, as 6 hours have not passed)

    3PM - attempts sql restart on Node2 - fails - fails over to Node1, as 6 hours have passed - tries to come online there, and so on. Is this how it happens? Thanks.
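
    To make the timeline I am asking about concrete, here is a minimal Python sketch of my assumption. It is not confirmed cluster behavior and it calls no real cluster API; the function name simulate and its parameters are made up for illustration. It assumes one retry per RetryPeriodOnFailure (1 hour), a failover threshold of n-1 inside a 6-hour window, and that every restart attempt fails.

```python
def simulate(start_hour=9, nodes=("Node1", "Node2"),
             failover_period_h=6, retry_period_h=1, horizon_h=6):
    """Replay the hypothesized retry/failover cycle; times are hours on a 24h clock."""
    failover_threshold = len(nodes) - 1   # assumed default: n-1 failovers per period
    current = 0                           # index of the node that owns the role
    failover_times = []                   # hours at which a failover happened
    events = []

    hour = start_hour
    while hour <= start_hour + horizon_h:
        # Assumption: every restart attempt fails.
        events.append((hour, f"restart on {nodes[current]} fails"))

        # Count only failovers that are still inside the failover period.
        recent = [t for t in failover_times if hour - t < failover_period_h]
        if len(recent) < failover_threshold:
            failover_times.append(hour)
            current = (current + 1) % len(nodes)
            events.append((hour, f"failover to {nodes[current]}"))
            events.append((hour, f"restart on {nodes[current]} fails"))
        else:
            events.append((hour, "failover not allowed, stays failed"))

        hour += retry_period_h            # next attempt after RetryPeriodOnFailure
    return events


if __name__ == "__main__":
    for hour, what in simulate():
        print(f"{hour:02d}:00  {what}")
```

    Under these assumptions the sketch produces a failover to Node2 at 09:00, restart-only attempts on Node2 from 10:00 to 14:00, and a failover back to Node1 at 15:00. Whether the cluster service really behaves this way is exactly my question.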
