peer to peer topology Replication - Node Failure issue

Question

peer to peer topology Replication - Node Failure issue

smarttrail

SSC Enthusiast

Points: 128
More actions
January 27, 2011 at 6:03 am

#388721

I am not a database administrator, how ever i have been looking into
peer to peer topolgy of microsoft sql server 2008 R2. I have a major
doubt.
all the msdn material and blogs say - sql server peer to peer
replication is a High Scalability + High Availibility strategy.
the examples say that if i have 4 nodes in the peer to peer topology
- eg Node1 London , Node 2 Berlin ,Node 3 California and Node 4 Tokyo.
I understand that each of these nodes can handle writes and reads
independently. so when i write into Tokyo all the records are
replicated in the other 3 nodes -london , berlin and california.
Now my doubt is - How is failure handled in peer to peer topology.
Theory say that if Tokyo is down the users at tokyo might still
manage to use the node from berlin ,london or califoria and have
live data. How exactly is this done?? Does this need to be configured
in the application or is there some config in the sql server nodes?
I am not sure if the sql connection string with parameter "failover
partner= berlin" might be used in this scenario. can any one
elaborate? thanks

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply

David Benoit SSC-Dedicated Points: 34562 More actions · Answer 1

Replication would continue to save up the transactions that needed to be replicated to the failed location for the retention period that you specify when configuring replication and distribution. So, in the example that Tokyo went down, then the commands that were set to be replicated to Tokyo would be kept in the distribution database and when the distribution agent was able to communicate with that site again it would work on getting all those commands replicated there. All other sites would continue to replicate normally.

How you force your other applications to note that failure and look to other sites for data is something completely different and would not be something maintained in replication.

David

@SQLTentmaker

“He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot

smarttrail SSC Enthusiast Points: 128 More actions · Answer 2

"How you force your other applications to note that failure and look to other sites for data is something completely different and would not be something maintained in replication. "

How is this part done.? In peer to peer topology how do u detect a node failure and how do u handle it. Are you saying there's no mechanism like the heart beat to handle a failed node situation ?

thanks

steveb. SSC-Forever Points: 46733 More actions · Answer 3

Failover Partner = in the connection string is only used with Database mirroring so it wont be useful for replication

David Benoit SSC-Dedicated Points: 34562 More actions · Answer 4

Yes, as Steve stated that is for Database Mirroring. Is that what you are trying to look into?

David

@SQLTentmaker

“He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot

smarttrail SSC Enthusiast Points: 128 More actions · Answer 5

Yeah may be the failover partner is for database mirroring.- u r right. But i am still looking for - how do i detect a node failure in a peer to peer topology? and finally how do i handle it?

David Benoit SSC-Dedicated Points: 34562 More actions · Answer 6

smarttrail (1/27/2011)
Yeah may be the failover partner is for database mirroring.- u r right. But i am still looking for - how do i detect a node failure in a peer to peer topology? and finally how do i handle it?

For peer-to-peer, replication will detect and handle the node failure. Please be sure to check your retention settings as I mentioned in my first post though.

As for your application, you will have make provisions for it to handle a site being down.

David

@SQLTentmaker

“He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot

smarttrail SSC Enthusiast Points: 128 More actions · Answer 7

if i have to detect a node failure and handle it myself manually - peer to peer replication can't be classified as High scalability + High Availability solution.

Basically my application is in c#.U mean to say i put the connection through a try catch block to detect that the site is down? that shouldn't be the way. there should be a way to do that automatically.

More over if i need to check every request manually for a site down issue the whole process is going to be really slow.

http://msdn.microsoft.com/en-us/library/ms151244.aspx plz check this link. what does the eg say?

David Benoit SSC-Dedicated Points: 34562 More actions · Answer 8

Typically for the scenario that is described in that article you would have an internet presence which would do the detection and routing of traffic to the sites that are online. You would then have an application server(s) which would be dedicated to connecting to the specific database server.

So, what you are looking for is something done at the network routing layer, and that coupled with peer-to-peer replication provides the solution described in that article.

Does that help?

David

@SQLTentmaker

“He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot

smarttrail SSC Enthusiast Points: 128 More actions · Answer 9

Yeah the example says it across the internet but - i was talking of a case were i have a small data center with 4 machines and each machine is a node. There 's no internet detection at all. the webservers are load balanced using windows 2008 NLB and each web server is attached to the nodes london or california or berlin or tokyo.

No lets say the hard disks at tokyo fails or there is some corruption of data. A user request comes in for tokyo.It has no clue that tokyo has failed.It tries to open a connection to the tokyo server but it struggles. How do i detect a failure of the tokyo node at the application level ? there may be other requests that are just being queued up at Tokyo,because the webserver is not down,only the db is down !

David Benoit SSC-Dedicated Points: 34562 More actions · Answer 10

First, you are asking replication to do something that needs to be done via a load balancing solution for the sites, not by SQL Server replication. I can ask the same question that you ask about disks failing on the database server in relation to the network connection failing between your webserver site and Tokyo. How will you detect that and not have a customer perceptible impact? You have to have the layer that is going to handle this detection and route intelligently OR you better be prepared to have someone watching this all the time and manually changing connection properties on your application server, which of course would be silly.

Hopefully this helps.

David

@SQLTentmaker

“He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot

smarttrail SSC Enthusiast Points: 128 More actions · Answer 11

"You have to have the layer that is going to handle this detection and route intelligently"- can you elaborate on this point. exactly how ?

if u remember in sql server failover clustering each of the nodes are aware that the other nodes are alive.If a node goes dead it stops sending the heart beats. so the other nodes take upon themselves to take up the load.

David Benoit SSC-Dedicated Points: 34562 More actions · Answer 12

smarttrail (1/27/2011)
"You have to have the layer that is going to handle this detection and route intelligently"- can you elaborate on this point. exactly how ?
if u remember in sql server failover clustering each of the nodes are aware that the other nodes are alive.If a node goes dead it stops sending the heart beats. so the other nodes take upon themselves to take up the load.

Yes, failover clustering is aware of a node failure. What you have described in your initial post are standalone SQL Servers that have peer-to-peer replication in place to keep the data in sync between them. Additionally, I believe you described a dedicated web / application server connected to each of those database servers. With that being said, and assuming that I am correct in describing your architecture, you need to have a layer, most likely at the router layer, that is going to detect active sites as well as load balance connections coming into your web site / application and send them to the appropriate application server. That way if a sites connection is down (i.e. network connectivity to Tokyo drops out) or a database server at that site drops offline then routing can be sent to the other application servers and minimal downtime would be noted by customers.

I will also say that the document link you provided earlier shows the webservers as being load balanced which is basically what I am describing. There is a load-balancing layer that you are missing in your descriptions.

This article might be helpful although it doesn't describe what I am sharing explicitly it comes pretty close. Check out Figure 9 and that will give you an idea of what I am referring to. In your case you would have a database server connected to each one of the application servers directly and would have peer-to-peer keeping the data in sync between the database servers.

Hopefully this helps a bit more.

David

@SQLTentmaker

“He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot

smarttrail SSC Enthusiast Points: 128 More actions · Answer 13

I didn't miss the load balancing.I mentioned i was using NLB in windows 2008 TO LOAD balance the web servers. However the NLB Only load balances web servers.It has noo clue about the database connectivity or a database error. if on the other hand the web server had failed the load balancer would allocate you to a different webserver like london on california.

we must also remember tht there is no so called "layer" between the web application and the database server.The web app uses connection strings directly to open a db connection.If a db connection fails the only way an application would know abt it , would be by catching an exception.we must keep in mind that the application has not failed so load balanced app would be

still working as usual.

If u r mentioning a layer between the web app and db are you talking abt any ready mde product which helps in that case or do i need to write some custom logic in my application?

smarttrail SSC Enthusiast Points: 128 More actions · Answer 14

may i ask another question. I am not sure if it's foolish ? but is it wise to have a raid mirror db for each of the nodes which are replicated.? is it possible to have a mirror db at all for the replicated nodes?