The Brittleness of Replication
Posted Thursday, August 14, 2014 9:42 AM


SSCrazy


Group: General Forum Members
Last Login: Yesterday @ 9:25 AM
Points: 2,407, Visits: 1,003
Another one I thought of: having to use the hosts file so that you can see Replication Monitor, because you can't see it when your instances have multi-part names such as sqlinstance.live.local.

qh


SQL 2K acts like a spoilt child - you need to coax it round with lollipops.
Post #1603358
Posted Thursday, August 14, 2014 10:53 AM


SSC Rookie


Group: General Forum Members
Last Login: Today @ 3:05 PM
Points: 37, Visits: 771
Steve Jones - SSC Editor (8/14/2014)
Brandon J Williams (8/13/2014)

My point is that replication novices are quick to point the finger at replication when it's not replication that is the problem.

We actually haven't done much in the way of getting a stable replication environment, it just works. Sure, we've read BOL, but as far as getting it stable, that's it.


You're conflating your experience with the capability or robustness of replication. I've had it be stable in places, and I've had it inexplicably fail. Certainly some of the failures have to do with a lack of bandwidth, or maybe disk space, or networking, or data, or something else.

However that's where replication is brittle. It will fail with things that shouldn't cause it to fail.


Replication is not brittle. When it fails, due to something like lack of bandwidth, disk space, networking, or data, that is not replication's fault. Any other piece of software could fail under those circumstances. Replication is no different.

Just because it hasn't for you, doesn't mean it can't or won't.


That is true. The learning curve with replication can be steep, and while there is usually one way of setting up replication correctly, there are many ways to set it up incorrectly.

The fact you reinitialize subscriptions leads to my point that it has plenty of room for improvement (along with other features).


I think you misunderstand. There are publication and article properties that, if changed, require that the snapshot be regenerated and/or subscriptions be reinitialized. This is no secret; it is spelled out in BOL. So what we have in that case is the need for a maintenance window. The ability to reinitialize subscriptions quickly has to do with shrinking that maintenance window down as small as possible.

The impression that replication is brittle is just inaccurate. Just like any other technology, its concepts must be understood to avoid common pitfalls.
Post #1603387
Posted Thursday, August 14, 2014 12:45 PM


SSC-Dedicated


Group: Administrators
Last Login: Today @ 4:00 PM
Points: 31,181, Visits: 15,627
I'll still disagree with you, Brandon. If something doesn't tolerate imperfections in the environment, it's a bit brittle. There are plenty of ways that replication could auto recover from these events. The fact that it doesn't means it's not as robust as it could be. Or perhaps, should be.






Follow me on Twitter: @way0utwest

Forum Etiquette: How to post data/code on a forum to get the best help
Post #1603451
Posted Thursday, August 14, 2014 1:13 PM
SSCrazy


Group: General Forum Members
Last Login: Today @ 12:50 PM
Points: 2,908, Visits: 1,835
Back in the SQL2000 days replication would retry 3 times if it failed, and on each attempt it would provide the DBA with an alert. Then it would give up, so those 3 emails would be lost amongst the morass of "important" emails. To get around this we simply switched the jobs to run every minute or so, so that each run would retry 3 times and give 3 alerts.
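For anyone who wants to see the difference between the two behaviours, here is a rough sketch in Python. It is only a toy model of what is described above, not how the replication agents are actually implemented; the function names and the flaky `sync` callable are made up for illustration.

```python
from typing import Callable, List

MAX_RETRIES = 3  # retry count as described in the post above


def make_flaky(failures: int) -> Callable[[], bool]:
    """Hypothetical sync step that fails `failures` times, then succeeds."""
    state = {"left": failures}

    def sync() -> bool:
        if state["left"] > 0:
            state["left"] -= 1
            return False
        return True

    return sync


def run_agent_once(sync: Callable[[], bool], alerts: List[str]) -> bool:
    """Try up to MAX_RETRIES times, record an alert per failure, then give up."""
    for attempt in range(1, MAX_RETRIES + 1):
        if sync():
            return True
        alerts.append(f"attempt {attempt} failed")
    return False


def run_on_schedule(sync: Callable[[], bool], alerts: List[str], max_runs: int) -> int:
    """Workaround: re-run the job on a schedule so every run gets fresh retries.

    Returns the number of the run that succeeded, or -1 if none did.
    """
    for run in range(1, max_runs + 1):
        if run_agent_once(sync, alerts):
            return run
    return -1
```

With an outage that lasts longer than three attempts, a single run gives up for good, while the scheduled version keeps trying (and keeps alerting) until the problem clears.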

I'd forgotten about the need for a hosts file hack, particularly for multi-part server names. That needs fixing.

Given the more distributed nature of data these days, and that cloud-based systems can and do fail, replication does need to be improved. It was easy to live with when distributed data was less mission critical.


LinkedIn Profile
Newbie on www.simple-talk.com
Post #1603457
Posted Thursday, August 14, 2014 2:34 PM


SSC Rookie


Group: General Forum Members
Last Login: Today @ 3:05 PM
Points: 37, Visits: 771
If the network goes down for whatever reason, if it comes back up within the retention period, replication will pick up where it left off. It auto recovers from that, no problems there.
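As a rough illustration of that rule, here is a small Python sketch. The function name, the 72-hour figure in the usage note, and the treatment of the exact boundary are assumptions for the example, not values or behaviour taken from SQL Server.

```python
from datetime import datetime, timedelta


def can_resume(last_sync: datetime, now: datetime, retention: timedelta) -> bool:
    """Return True if the outage is still inside the retention period.

    Inside the window the subscriber can pick up where it left off;
    outside it, the subscription would need to be reinitialized.
    Treating the exact boundary as "still inside" is an assumption.
    """
    return now - last_sync <= retention
```

For example, with a hypothetical 72-hour retention period, a two-hour network outage returns True (carry on), while a 100-hour outage returns False (reinitialize).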

What specifically would you like it to auto recover from?
Post #1603477
Posted Thursday, August 14, 2014 2:39 PM
Valued Member


Group: General Forum Members
Last Login: Saturday, October 18, 2014 6:09 PM
Points: 58, Visits: 188
OK, I think we have really beat this one to death. Let's move on.
Post #1603478
Posted Thursday, August 14, 2014 2:42 PM
SSC-Enthusiastic


Group: General Forum Members
Last Login: Yesterday @ 11:18 AM
Points: 169, Visits: 1,855
I'd like to be able to turn it off without turning it on again when I do a restore into test.
Post #1603481
Posted Thursday, August 14, 2014 2:46 PM


SSC-Forever


Group: General Forum Members
Last Login: Today @ 3:37 PM
Points: 40,210, Visits: 36,619
I have really un-fond memories of doing replication with an Oracle publisher (with SQL 2005). That was a nightmare: random data conversion errors that were incredibly hard to track (errors didn't indicate in any way what article was the problem), subscribers being invalidated for no apparent reason, and at one point the snapshot agent hung after the 255th article any time it ran. That required a complete tear-down and reconfigure to get past.

Keep things simple and replication works. Try to get fancy and it's a recipe for pain.
At the very least it (and many other components of SQL) needs better monitoring built in. Way better.



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #1603483
Posted Thursday, August 14, 2014 5:22 PM


SSCertifiable


Group: General Forum Members
Last Login: Today @ 9:45 AM
Points: 7,804, Visits: 9,556
Steve Jones - SSC Editor (8/14/2014)
I'll still disagree with you, Brandon. If something doesn't tolerate imperfections in the environment, it's a bit brittle. There are plenty of ways that replication could auto recover from these events. The fact that it doesn't means it's not as robust as it could be. Or perhaps, should be.

It certainly didn't handle problems well back in SQL2000. It was fairly easy, though, to detect some of its failures to handle problems and handle them for it, without any human intervention (that's why I say "fragile" rather than "brittle"), and in my opinion that is already enough to make it clear that the monitoring and the error management were of a standard far less than acceptable.

Far more important were the inexplicable errors: inserting a row with an already existing primary key, updating a row with a primary key that didn't exist, deleting a row that didn't exist. The only thing updating the database at the subscriber was replication; there was no software that wrote to it and no-one was updating it, but these things happened. They were a big problem, because we were using replication to create (cold) standby copies of critical databases on customer sites, and we aimed to restore service very rapidly (minutes, not hours) if a server went kaput, and servers did go kaput, now and again. If a main server had gone phut while a subscription was being reinitialised and we were left with recovery from backups, we wouldn't have met that target, which would have damaged our reputation even though our contracts allowed a much longer time to recover.

Servers broke simply because hardware breaks sometimes, especially in a country where mains electricity voltage sometimes fluctuates wildly enough to cause damage even to kit which is certified for use in that country, or where climate and computer room air conditioning are such that the equipment is being run at something quite a bit higher than its proper operating temperature. Most of our customers were in such countries and ran such computer room cooling.


Tom
Post #1603500