New Cluster: Fail Over Test Breaks Connections

  • We have set up a two node active-active cluster utilizing sql2008 r2. Regarding SQL clustering, I am a complete layman. This is my very first exposure to it.

    To test it's behavior during a fail over, I ran a script that writes to a data to a table every tenth of a second. With that code running we downed the primary node. The cluster did, in fact, fail over to the second node; however, my query failed the very moment the node went down.

    The failure message was:

    Msg 64, Level 20, State 0, Line 0

    A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - The specified network name is no longer available.)

    Seconds later, with one node still down, I was able to reconnect just fine. My question is whether there is any way to avoid this hiccup. My initial assumption was that any fail over would be a little more seamless.

    What have I overlooked?

    Thanks much,

    Mike

  • Mike

    It's not seamless, I'm afraid. During a failover, the services stop on one node and start up again on another. Any open connections are therefore killed.

    John

  • With SQL Clustering, on a failover the connections are in fact lost unfortunately. The SQL service is actually restarted on the new server. So you will see a few seconds downtime depending on how quickly your databases run recovery.

    The advantage of clustering is automatic failover to the backup hardware, but the downside is the loss of and rollback of connections.

    To achieve seemless failover, you would have to implement Database mirroring, which also requires a duplicate of the database files (not just the server hardware).

    If you check in the SQL Logs, you should actually be able to see the shutdown and starting up of the sql server.

  • <whoops... double post> Apologies.

  • Thank you for the quick responses! Very helpful.

    I guess my assumption that an active-active cluster would avoid that problem is unfounded. 😀

  • plaid_draco (1/31/2012)


    Thank you for the quick responses! Very helpful.

    I guess my assumption that an active-active cluster would avoid that problem is unfounded. 😀

    Well, there are a couple of things wrong with this assumption. First, you are assuming that active/active means the same database on both nodes is accessed. That is not correct - active/active just means you have an instance of SQL Server and it's associated databases on one node, and another instance of SQL Server on the other node with it's own associated databases.

    Second, in SQL Server failover of any instance is a restart of the services on the other node. Any in-process transactions would be disconnected and on startup would be rolled back through the startup procedure. This is done using the transaction log by rolling back any transactions that were not completed.

    If you really need that flexibility - you would need to go with either Oracle RAC which is the only one I know of that maintains state on in-process transactions.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • SQL_Giant (1/31/2012)


    To achieve seemless failover, you would have to implement Database mirroring

    Mirroring does not guarantee to provide seamless failover, in fact, a mirror session can take sometime to failover depending on the size of its redo queue!

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Hi,

    i just set up a 2 node failover clustering for SQL SERVER 2008 on Windows Server 2008 R2 Enterprise Edition,

    previously i set up using 1 NIC on each node,

    the failover went successfully when i tested with unplugged the network cable on the active node ( during the failover, the connection to database was cut off)

    then i added 1 more NIC to each node ( i was aware about the warning of the single point of failure if we are using only one NIC).

    the question is, how we configured this secondary NIC ? should it be public network ( same setting with the first NIC)

    i read an article on the blog:

    Purpose: Testing network adapters will serve dual purposes. We will test the functionality of the heartbeat and the ability for cluster heartbeat communications to be routed over the public network. We will also test the failover scenario in the case of both public network adapters (members of the network team) losing connections to the network.

    Test Procedure: First we will test the heartbeat interconnect and ensure that cluster communications are carried through the public network without interrupting cluster communications by disconnecting the private network adapter. Secondly, we will test network adapters to ensure that when one out of the two network adapters are unplugged, the other network adapter will communicate with the network as usual. Then we will unplug the remaining network adapter, which at this time is carrying all network communication, including the heartbeat. After this series of tests is complete, we will bring the system back to a normal configuration and test the public network adapters by removing their connection to the network. However, in this last test we will leave the heartbeat interconnect in place.

    Expected Result: For a disconnected private adapter, Windows is expected to switch internal cluster communications to a public adapter automatically. There should be no service interruption. For disconnecting one of the public network cards, network team driver is expected to switch to using another network card automatically. There should be no service interruption. For disconnecting all network adapters, cluster is expected to initiate the fail over once it discovers that active node is unavailable. Service interruption should be in the range of 0-2 minutes.

    how do i setup the secondary NIC full fill this test network?

    the expected result as said is when we unplug cable the first NIC on the active node, there should be no service interruption

    when we unplug cable the secondary NIC on the active node ( the first NIC cable is plugged back) there should be no service interruption also.

    if we unplug both cable on first and secondary NIC, the fail over will run and there is some interruption ( during the fail over)

    is the exception is correct ?

    Thanks in advance 🙂

  • x_japanfans5312 (3/6/2012)


    Hi,

    i just set up a 2 node failover clustering for SQL SERVER 2008 on Windows Server 2008 R2 Enterprise Edition,

    how do i setup the secondary NIC full fill this test network?

    you could team the 2nd NIC with the first NIC on each node to create a fault tolerant public connection or you could configure it as a dedicated heartbeat, although this is no longer required with Windows 2008 clusters.

    x_japanfans5312 (3/6/2012)


    the expected result as said is when we unplug cable the first NIC on the active node, there should be no service interruption

    when we unplug cable the secondary NIC on the active node ( the first NIC cable is plugged back) there should be no service interruption also.

    if we unplug both cable on first and secondary NIC, the fail over will run and there is some interruption ( during the fail over)

    is the exception is correct ?

    Thanks in advance 🙂

    its going to depend how you setup the 2nd NIC on each node

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Perry Whittle (3/6/2012)


    its going to depend how you setup the 2nd NIC on each node

    how do i setup the second network to fulfill those expected result ?

    is it to set the same public network ? it could be a private network ( only provides communication to each cluster only )

  • Jeffrey Williams 3188 (1/31/2012)


    plaid_draco (1/31/2012)


    Thank you for the quick responses! Very helpful.

    I guess my assumption that an active-active cluster would avoid that problem is unfounded. 😀

    Second, in SQL Server failover of any instance is a restart of the services on the other node. Any in-process transactions would be disconnected and on startup would be rolled back through the startup procedure. This is done using the transaction log by rolling back any transactions that were not completed.

    If you really need that flexibility - you would need to go with either Oracle RAC which is the only one I know of that maintains state on in-process transactions.

    It's been nearly three years since I made the initial post and we have set up two successful sql clusters that are spinning away today. However, I wanted to revisit the question of interrupted connections when an instance fails over. We are SQL and won't be switching to anything Oracle per the suggestion above.

    My question: Is there now anything relative to SQL server that allows us to fail over while insuring current connections aren't lost?

  • plaid_draco (11/24/2014)


    Jeffrey Williams 3188 (1/31/2012)


    plaid_draco (1/31/2012)


    Thank you for the quick responses! Very helpful.

    I guess my assumption that an active-active cluster would avoid that problem is unfounded. 😀

    Second, in SQL Server failover of any instance is a restart of the services on the other node. Any in-process transactions would be disconnected and on startup would be rolled back through the startup procedure. This is done using the transaction log by rolling back any transactions that were not completed.

    If you really need that flexibility - you would need to go with either Oracle RAC which is the only one I know of that maintains state on in-process transactions.

    It's been nearly three years since I made the initial post and we have set up two successful sql clusters that are spinning away today. However, I wanted to revisit the question of interrupted connections when an instance fails over. We are SQL and won't be switching to anything Oracle per the suggestion above.

    My question: Is there now anything relative to SQL server that allows us to fail over while insuring current connections aren't lost?

    there is no RAC equivalent in SQL Server at present, the closest is AlwaysOn Availability groups which can pretty much guarantee readonly connections will be uninterrupted.

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply