• Hi R.P. - that's a really good suggestion, and it was one of the things we originally suspected might be causing our problem. A couple of years ago we had to put similar processes in place, and I'd recommend them to anyone maintaining a long-term merge replication system. We created a process that finds the subscriber with the oldest successful sync and automatically sets the retention period to 48 hours before that point, so we always have a 48-hour window of retained metadata. Usually that means we keep 48 hours, but if a subscriber stops synchronizing or fails for any reason, the system extends retention back 1 day at a time, up to a maximum of 14 days. Then every night we run a tune-up on the publisher that executes sp_mergemetadataretentioncleanup manually and rebuilds the indexes on a half dozen of the most heavily used merge system tables (contents, tombstone, current_partition_mappings, etc.). Our publisher's msmerge_contents normally holds only about 100,000-200,000 records, and at most we've had about 2 million during periods of high retention, which hasn't caused any problems. Once a week we do the same cleanup and index rebuild on all subscribers.
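
For anyone wanting to try the same idea, here is a rough sketch of the retention rule described above. It is written in Python purely for illustration and is not our actual job. The names `retention_days`, `BASE_DAYS` and `MAX_DAYS` are made up for this example, and the rounding (one extra day of retention per full day the slowest subscriber is behind) is an assumption about how "1 day at a time" could be implemented.

```python
from datetime import datetime, timedelta

BASE_DAYS = 2   # the normal 48-hour window
MAX_DAYS = 14   # upper limit on retention

def retention_days(now, last_successful_syncs):
    """Return a retention period in whole days.

    last_successful_syncs holds one datetime per subscriber: the time of
    its most recent successful synchronization.
    """
    if not last_successful_syncs:
        # Assumption: with no sync history at all, stay conservative.
        return MAX_DAYS
    oldest = min(last_successful_syncs)
    # Clamp to zero so clock skew cannot produce a negative lag.
    lag = max(now - oldest, timedelta(0))
    # Add one day of retention for each full day the slowest
    # subscriber is behind, never exceeding the cap.
    return min(BASE_DAYS + lag.days, MAX_DAYS)
```

With every subscriber syncing every 10-15 minutes this stays at 2 days, and it only grows when somebody falls a full day or more behind. The result would then be applied to the publication's retention setting, for example through sp_changemergepublication, by whatever scheduled job runs the check.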

    It keeps our replication lean and mean, which is why this is so troubling for us. Most synchronizations deal with about 100 records every 10-15 minutes per subscriber, and the agent takes only about 20 seconds to complete start to finish.

    Under normal circumstances everything runs great, until we get the odd subscriber that gets stuck on that uplineageversion sproc.