The Brittleness of Replication

  • Comments posted to this topic are about the item The Brittleness of Replication

  • From the perspective of a developer who occasionally delves into SQL Server I can see that there is a benefit to the stagnant nature of software i.e. I come back after a couple of years with no real SQL Server interaction and most is still familiar.

    I am not convinced, though, that this benefit could not be realised if the product were no longer stagnant. If the UI remained as slow moving as before but the features were tidied up then it would sometimes require zero changes in the UI (I am including command line etc. here), sometimes a few small ones and occasionally an overhaul. Most people can live with this, especially when considering the benefits.

    Backward compatibility is already managed on a compatibility level for the SQL Database Engine. Is it time that each subsystem has a different compatibility level? Should we be looking to allow more changes to eventually break backwards compatibility? These changes are not made lightly so should we consider them permanent breaking changes after a particular advertised time span? Do we need something akin to the product support lifecycle for features?

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!

  • I'm not looking to break backwards compatibility. I just want improvements and things fixed that don't work well. There's no shortage of those. In replication, tooling has barely advanced, and robustness-wise, it's barely moved. Certainly in other areas tooling could be improved.

    However let me also comment on Oracle. They are very backwards compatible. They also suffer security issues because of this. I'm not sure I want complete backwards compatibility. In fact, I like the move from mirroring to AO/AG. While I like mirroring and I think AO/AGs need some work, I am also glad they didn't try to retrofit this and made a new technology.

  • Your post came just as I was initializing a replication subscription that had mysteriously been marked inactive and needed to be reinitialized. My company uses replication from servers in remote locations with limited connectivity, so when it breaks it is a long process to get it working again. Fortunately this is not a frequent occurrence and the product works as advertised with just a few 'mysterious' events. I have not found a better product for data exporting with a usable user interface that I can get set up within an hour of getting the request. CDC is rather more complicated than I need, as is Service Broker, which I have used for other projects. I'll continue to use Replication and hope for less 'brittleness' in future upgrades.

  • I'm always surprised when people don't use replication, especially with reporting problems. Real-time data is very often overrated outside of financial markets.

  • "Brittleness of Replication" is the best way I've heard of to describe that feature.

    Replication is awesome when it works but there seems to be no end of weird circumstances that can bring it crashing down. So much so that in our shop, we have a No-New-Replication policy. We have a lot of replication already and fixing the frequent problems takes way too much of my team's time.

    Replication has definitely improved over the years but it is still not robust enough to do the self-healing it should do.

  • Love "Brittleness of Replication"

    With Cassandra, peer-to-peer replication is a big selling point. And Hadoop has distributed computing. MS SQL Server has had peer-to-peer replication for quite a long time, but has it advanced?

    If they had invested in making peer-to-peer in SQL Server solid, it would give it a multi-site HA, DR and scale-out strategy position. No, they follow the cluster approach, and even deprecate mirroring. We can only guess what it would have been.

    The more you are prepared, the less you need it.

  • I know this really old guy who once said that replication is only as reliable as the network and hardware it runs on.

  • No, they follow the cluster approach, and even deprecate mirroring. We can only guess what it would have been.

    It seems to have turned into Always-on availability groups.

  • What about full-text search? It could have been good but the Solr/ElasticSearch horse has well and truly bolted.

  • I used replication a lot for about 7 years. This was all transactional replication, except for some (very little) ad-hoc snapshot replication. It certainly had some defects. I called it fragile, rather than brittle, because when it broke one generally didn't have to clean up the mess and start again from creating publications and subscriptions; one just had to mess about a bit and then get it going again, which only rarely would require even reinitialising a subscription. For many problems it was possible to write monitoring software to detect them and do the messing about and restart (not reinitialise) a subscription - of course that had to be monitored to make sure that it didn't just keep trying again. But there were some really nasty bugs. MS may have fixed them in later versions - I never used replication on anything later than SQL 2000, except for some quick and dirty checks that what we had done before on 2000 would still work on 2008 - but I suspect not. As well as the bugs, it was harder to use and manage than it ought to have been; and from what I've seen (and from Steve's article) it seems that MS hasn't done anything to make it easier. That's not a good thing. The documentation was pretty terrible too, but that may for all I know have improved.

    But although replication has not improved as fast as MS's customers would like, MS does a lot better at improving things than some other software suppliers. I remember one product from CISCO that incorporated MSDE (the SQL 2000 name for SQL Express without tools). Not only did CISCO not provide enhancements, they didn't fix bugs; and they built the thing in such a way that it was impossible to apply MS's fixes to SQL Server, which was a big problem for anyone who used it. My big problem was that the environment in which user software added to the thing had to run had race conditions that were not resolvable in the user software, and CISCO was not going to fix them. But for some of its users perhaps it was an especially big problem early in 2003 when this malware hit the internet, and owners of that product couldn't apply the fix MS provided because they were stuck with SQL Server 2000 RTM and no subsequent SP or fix could be applied. I saw similar (perhaps not as extreme as that one) problems with other products too. So let's not get too upset with MS about not improving everything; some others are worse.

    Tom
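
The bounded restart that Tom describes (detect a failure, restart the subscription, but stop rather than retry forever) might be sketched roughly as follows. This is only an illustration: `is_healthy` and `restart` are hypothetical callbacks standing in for whatever check and restart commands a given shop uses, and nothing here is a real SQL Server API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RestartResult:
    recovered: bool  # True if the health check passed in the end
    attempts: int    # number of restarts actually issued


def restart_with_limit(
    is_healthy: Callable[[], bool],   # hypothetical health probe
    restart: Callable[[], None],      # hypothetical restart action
    max_attempts: int = 3,
) -> RestartResult:
    """Restart until healthy, giving up after max_attempts restarts."""
    attempts = 0
    while not is_healthy():
        if attempts >= max_attempts:
            # Stop here so a person looks at it instead of looping forever.
            return RestartResult(recovered=False, attempts=attempts)
        restart()
        attempts += 1
    return RestartResult(recovered=True, attempts=attempts)
```

A caller would page someone whenever `recovered` comes back `False`; a real monitor would presumably also wait between attempts, which is left out here.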

  • I hate replication. And I do my best to avoid it if at all possible.

    Back in 2002 it was my responsibility to build a DR plan for our SQL servers at our offsite location 50 miles away, connected by 4 T-1s. So my first effort was to try transactional replication to a set of standby databases. It failed the first test and then a month later failed the retest. So it was back to square one.

    Then I came to my current company, and they had built the software to have a master DB at the headquarters and supposedly replicate "universal" data like suppliers and standard codes down to the individual facility DBs. It would only work on a facility that had at least two solid T-1 connections. Otherwise the facilities had to be co-located on servers at HA and then would RDP into facility or terminal servers. They finally shut it down for lack of customers.

    So on to the next SW we produced. It had a master DB, and then they had built two silos of data to try to load balance the customers. They used replication to move data downhill into the individual silos. But they never differentiated the replication, so it was duplicating the data in both silos and taking up the space.

    Now when someone says replication it takes all my personal skills not to shudder. If I can avoid it -- I will. I'd rather build everything by hand to transfer data from DB to DB.



    ----------------
    Jim P.

    A little bit of this and a little byte of that can cause bloatware.

  • Replication is not brittle, buggy, or unstable. Unreliable network connections can cause replication to fail, and the replication novice will interpret this as a bug or as replication being unstable.

  • Replication doesn't work as well as log shipping on small pipes.

    I would like to see a better tool set as well, at least to the point of being able to script the changes needed to bring replication up and down.

  • Log shipping cannot do what replication does.

    I agree: over small pipes, or an unreliable or slow network connection, replication may time out or otherwise fail. But there are replication agent parameters and other replication settings which can mitigate this. This is why proper design and pre-deployment testing are crucial. Microsoft also recommends a fast network of 100 Mbps or faster.

    The cool thing is, once a replication solution is designed, tested, and deployed correctly, nothing can beat its synchronization capabilities.
