New Cluster: Fail Over Test Breaks Connections

Question

New Cluster: Fail Over Test Breaks Connections

plaid_draco

Mr or Mrs. 500

Points: 523
More actions
January 31, 2012 at 9:46 am

#252138

We have set up a two node active-active cluster utilizing sql2008 r2. Regarding SQL clustering, I am a complete layman. This is my very first exposure to it.
To test it's behavior during a fail over, I ran a script that writes to a data to a table every tenth of a second. With that code running we downed the primary node. The cluster did, in fact, fail over to the second node; however, my query failed the very moment the node went down.
The failure message was:
Msg 64, Level 20, State 0, Line 0
A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - The specified network name is no longer available.)
Seconds later, with one node still down, I was able to reconnect just fine. My question is whether there is any way to avoid this hiccup. My initial assumption was that any fail over would be a little more seamless.
What have I overlooked?
Thanks much,
Mike

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 1

Mike

It's not seamless, I'm afraid. During a failover, the services stop on one node and start up again on another. Any open connections are therefore killed.

John

SQL_Giant SSC-Addicted Points: 420 More actions · Answer 2

With SQL Clustering, on a failover the connections are in fact lost unfortunately. The SQL service is actually restarted on the new server. So you will see a few seconds downtime depending on how quickly your databases run recovery.

The advantage of clustering is automatic failover to the backup hardware, but the downside is the loss of and rollback of connections.

To achieve seemless failover, you would have to implement Database mirroring, which also requires a duplicate of the database files (not just the server hardware).

If you check in the SQL Logs, you should actually be able to see the shutdown and starting up of the sql server.

plaid_draco Mr or Mrs. 500 Points: 523 More actions · Answer 3

plaid_draco

Mr or Mrs. 500

Points: 523

January 31, 2012 at 9:58 am

#1440776

<whoops... double post> Apologies.

plaid_draco Mr or Mrs. 500 Points: 523 More actions · Answer 4

Thank you for the quick responses! Very helpful.

I guess my assumption that an active-active cluster would avoid that problem is unfounded. 😀

Jeffrey Williams SSC Guru Points: 90252 More actions · Answer 5

plaid_draco (1/31/2012)
Thank you for the quick responses! Very helpful.
I guess my assumption that an active-active cluster would avoid that problem is unfounded. 😀

Well, there are a couple of things wrong with this assumption. First, you are assuming that active/active means the same database on both nodes is accessed. That is not correct - active/active just means you have an instance of SQL Server and it's associated databases on one node, and another instance of SQL Server on the other node with it's own associated databases.

Second, in SQL Server failover of any instance is a restart of the services on the other node. Any in-process transactions would be disconnected and on startup would be rolled back through the startup procedure. This is done using the transaction log by rolling back any transactions that were not completed.

If you really need that flexibility - you would need to go with either Oracle RAC which is the only one I know of that maintains state on in-process transactions.

Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

― Charles R. Swindoll

How to post questions to get better answers faster
Managing Transaction Logs

Perry Whittle SSC Guru Points: 233846 More actions · Answer 6

SQL_Giant (1/31/2012)
To achieve seemless failover, you would have to implement Database mirroring

Mirroring does not guarantee to provide seamless failover, in fact, a mirror session can take sometime to failover depending on the size of its redo queue!

-----------------------------------------------------------------------------------------------------------

"Ya can't make an omelette without breaking just a few eggs" 😉

x_japanfans5312 SSC-Addicted Points: 458 More actions · Answer 7

Hi,

i just set up a 2 node failover clustering for SQL SERVER 2008 on Windows Server 2008 R2 Enterprise Edition,

previously i set up using 1 NIC on each node,

the failover went successfully when i tested with unplugged the network cable on the active node ( during the failover, the connection to database was cut off)

then i added 1 more NIC to each node ( i was aware about the warning of the single point of failure if we are using only one NIC).

the question is, how we configured this secondary NIC ? should it be public network ( same setting with the first NIC)

i read an article on the blog:

Purpose: Testing network adapters will serve dual purposes. We will test the functionality of the heartbeat and the ability for cluster heartbeat communications to be routed over the public network. We will also test the failover scenario in the case of both public network adapters (members of the network team) losing connections to the network.

Test Procedure: First we will test the heartbeat interconnect and ensure that cluster communications are carried through the public network without interrupting cluster communications by disconnecting the private network adapter. Secondly, we will test network adapters to ensure that when one out of the two network adapters are unplugged, the other network adapter will communicate with the network as usual. Then we will unplug the remaining network adapter, which at this time is carrying all network communication, including the heartbeat. After this series of tests is complete, we will bring the system back to a normal configuration and test the public network adapters by removing their connection to the network. However, in this last test we will leave the heartbeat interconnect in place.

Expected Result: For a disconnected private adapter, Windows is expected to switch internal cluster communications to a public adapter automatically. There should be no service interruption. For disconnecting one of the public network cards, network team driver is expected to switch to using another network card automatically. There should be no service interruption. For disconnecting all network adapters, cluster is expected to initiate the fail over once it discovers that active node is unavailable. Service interruption should be in the range of 0-2 minutes.

how do i setup the secondary NIC full fill this test network?

the expected result as said is when we unplug cable the first NIC on the active node, there should be no service interruption

when we unplug cable the secondary NIC on the active node ( the first NIC cable is plugged back) there should be no service interruption also.

if we unplug both cable on first and secondary NIC, the fail over will run and there is some interruption ( during the fail over)

is the exception is correct ?

Thanks in advance 🙂

Perry Whittle SSC Guru Points: 233846 More actions · Answer 8

x_japanfans5312 (3/6/2012)
Hi,
i just set up a 2 node failover clustering for SQL SERVER 2008 on Windows Server 2008 R2 Enterprise Edition,
how do i setup the secondary NIC full fill this test network?

you could team the 2nd NIC with the first NIC on each node to create a fault tolerant public connection or you could configure it as a dedicated heartbeat, although this is no longer required with Windows 2008 clusters.

x_japanfans5312 (3/6/2012)
the expected result as said is when we unplug cable the first NIC on the active node, there should be no service interruption
when we unplug cable the secondary NIC on the active node ( the first NIC cable is plugged back) there should be no service interruption also.
if we unplug both cable on first and secondary NIC, the fail over will run and there is some interruption ( during the fail over)
is the exception is correct ?
Thanks in advance 🙂

its going to depend how you setup the 2nd NIC on each node

-----------------------------------------------------------------------------------------------------------

"Ya can't make an omelette without breaking just a few eggs" 😉

x_japanfans5312 SSC-Addicted Points: 458 More actions · Answer 9

Perry Whittle (3/6/2012)
its going to depend how you setup the 2nd NIC on each node

how do i setup the second network to fulfill those expected result ?

is it to set the same public network ? it could be a private network ( only provides communication to each cluster only )

plaid_draco Mr or Mrs. 500 Points: 523 More actions · Answer 10

Jeffrey Williams 3188 (1/31/2012)
plaid_draco (1/31/2012)
Thank you for the quick responses! Very helpful.
I guess my assumption that an active-active cluster would avoid that problem is unfounded. 😀
Second, in SQL Server failover of any instance is a restart of the services on the other node. Any in-process transactions would be disconnected and on startup would be rolled back through the startup procedure. This is done using the transaction log by rolling back any transactions that were not completed.
If you really need that flexibility - you would need to go with either Oracle RAC which is the only one I know of that maintains state on in-process transactions.

It's been nearly three years since I made the initial post and we have set up two successful sql clusters that are spinning away today. However, I wanted to revisit the question of interrupted connections when an instance fails over. We are SQL and won't be switching to anything Oracle per the suggestion above.

My question: Is there now anything relative to SQL server that allows us to fail over while insuring current connections aren't lost?

Perry Whittle SSC Guru Points: 233846 More actions · Answer 11

plaid_draco (11/24/2014)
Jeffrey Williams 3188 (1/31/2012)
plaid_draco (1/31/2012)
Thank you for the quick responses! Very helpful.
I guess my assumption that an active-active cluster would avoid that problem is unfounded. 😀
Second, in SQL Server failover of any instance is a restart of the services on the other node. Any in-process transactions would be disconnected and on startup would be rolled back through the startup procedure. This is done using the transaction log by rolling back any transactions that were not completed.
If you really need that flexibility - you would need to go with either Oracle RAC which is the only one I know of that maintains state on in-process transactions.
It's been nearly three years since I made the initial post and we have set up two successful sql clusters that are spinning away today. However, I wanted to revisit the question of interrupted connections when an instance fails over. We are SQL and won't be switching to anything Oracle per the suggestion above.
My question: Is there now anything relative to SQL server that allows us to fail over while insuring current connections aren't lost?

there is no RAC equivalent in SQL Server at present, the closest is AlwaysOn Availability groups which can pretty much guarantee readonly connections will be uninterrupted.

-----------------------------------------------------------------------------------------------------------

"Ya can't make an omelette without breaking just a few eggs" 😉