Improving Replication

  • Comments posted to this topic are about the item Improving Replication

  • The single biggest thing I would do is allow the option to select not only which distributor to use, but which distribution database. Replication shouldn't be bound at a server/server level. I support a high-volume replication environment and the biggest pain point is contention in the distribution database. When one publication decides to update 400+ million rows and sends that transaction over to the distributor, everything else comes crashing down with it. (For those that would argue against letting something update 400+ million rows at a time, my response is: I agree, but the API won't let us do that. ๐Ÿ™‚ )

    Let us select not only a distribution server, but also a distribution database, even if it's on the same dist server.

  • All of the above, but robustness must come with the ability to just continue, even if the network was down for x hours or the destination was wiped or the destination server was rebuilt.  Now that I would call robust. ๐Ÿ˜€

    5ilverFox
    Consulting DBA / Developer
    South Africa

  • I'd like something simple.....
    I found a mis-spelling in a table name and I tried to use F2 to rename the field to the correct spelling.
    As the table was replicated, it wouldn't let me do that (can't remember the exact error message) but I had to use SP_Rename to do exactly the same job (which took longer).

    OK, I should have spotted the spelling mistake before I set up replication but MS could have made things a little easier.

  • I found that the brittleness in replication is due to the complexity and flexibility of the facility.  The brittleness was more down to inadvertent misconfiguration than an inherent problem with the facility itself.  There are thousands of options and many with profound implications.

    • Will deletes be replicated?
    • Will the entire publication be reset and therefore a complete refresh on every subscriber?
    • How will updates be handled?
    • Will a published article cause a full drop and recreate on the subscribers?
    • Will the replication stored procs be auto-generated on the target?
    • ...etc

    Get this wrong and you will get the appearance of brittleness.  For me the knowledge to make it robust and reliable was hard and painfully won.  I think the pain to learn the ins and outs of the system is the problem.  The team I worked with gained a detailed understanding of replication and the implications of the settings flags to the extent that a data analyst requiring access to a new table that was previously only available in the production database would have it in a few minutes.

    One of the pain points in SQL2000 has been addressed and that was that a replication failure would only warn you three times and then cease to warn.  For that reason we had to switch the distribution agent from continuous run to 5 minute batch runs.  This meant that each run would warn you three times.

    I would not want to lose the Publisher/Distributor/Subscriber model as this is far more flexible than Master/Slave.

    The performance of replication post 2008 is astonishing.  It used to be that a problem communicating with the distributor or subscribers needed to be fixed within a very short time window otherwise production log files would bloat and eventually cause an outage.  One time we were worried that one of our busiest databases would not survive downtime to service pack a distributor and transactions would take weeks to catch up on the subscribers.  It took about 30 minutes to catch up while also coping with peak load.

  • The monitoring\alerting is needs to more robust.

  • I think microsoft is currently investing in 

    Azure SQL Data Sync  

    https://docs.microsoft.com/en-us/azure/sql-database/sql-database-get-started-sql-data-sync

    but i do agree with you , HA tools in  SQL server have always been difficult to use .  
    I would love to see Microsoft improve log shipping and make is more seamless to use without shares and much hassle . 

  • As the market has changed and is rapidly doing so, it is also important to replicate from on premise SQL Server host to either on premise SQL Server or Cloud (AWS EC2, RDS, or Microsoft Azure). As we watch companies move from Oracle having ability to have both a SQL Server source and an Oracle source replicating data to the same SQL target. Ability to activate two or more databases in a SQL Server source all going to the same, or different, targets (as other's could add value).

  • We've been working with SSIS quite a bit and it does feel brittle. Both the tools and the core functions seem like they got them to the beta stage and stopped. Things like not handling many data types correctly such as varchar(MAX) or nvarchar(MAX) are discouraging. They have embedded reserved columns like ErrorCode and ErrorColumn that conflict with our tables and there is no workaround. It seems like when combined with CDC that we get additional conflicts and issues. SSIS converts all the char and varchar columns to nchar and nvarchar and then the packages don't work unless we create derived columns to convert them back. The expression parser is buggy, complaining about things like โ€“2147483648 being too big to fit because it apparently only handles constants up 2147483647 and doesn't understand properly about the minus sign.  I would love for them to just improve the robustness overall by fixing so many of these little annoying issues.

  • Replication wish list

    • Better default indexes (See blog post)
    • Built in canary tables for monitoring latency.
    • QueryStore for distribution databases
    • Fix the silent fail issue where replication can timeout and fail to create tables or insert \ delete SPs.
    • Increase the compression limit for snapshots, maybe compress for each article rather than the whole snapshot folder.
    • Reduce \ eliminate locking when creating snapshot.
    • Why is the compatibility level of the distribution database 2005?
    • Ability to create pull subscription jobs on Web edition (hereโ€™s my scripts)
    • Ability to synchronise the indexes between the publisher and the subscriber. So new indexes and changes are automatically replicated out.
    • Improved logging and diagnostics for errors. (detailed command info etc.).
    • Ability to analyse the transactions per object (possible with scripts) to help identify tables that are being updated too often and causing unnecessary strain.
    • Ability to optimise the order of the snapshot (inspect the dependency and reorder using sp_changearticle and sp_changemergearticle).

      • Aireforge have written but not released this. If people are interested, let us know and weโ€™ll look into creating an SSMS plug-in.

    Solving data & infrastructure issues via xTEN[/url] and our application, Aireforge[/url].

  • Nice list.

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply