The impact of having data sitting in the distribution database that doesn't need to be there can be significant. The aim of this series of articles is to show you how to keep the distribution database as small (and efficient) as possible, and to keep the impact of the clean-up job to a minimum. Each of the steps outlined will help reduce the amount of data unnecessarily held in the distribution database, and free up resources for other things.
Part 1 of this series focussed on the intricacies of SQL Server transactional replication, observed that the clean-up mechanism treats publications as a collective rather than individually, and looked into publication settings. This part looks at how distribution agent job schedules can cause unnecessary bloating in the distribution database.
When the distribution clean-up job runs, it executes the sp_MSdistribution_cleanup procedure, which in turn executes a number of other procedures as part of the clean-up process. The most important of these, in terms of what will actually be removed, is sp_MSmaximum_cleanup_seqno. This procedure finds the maximum transaction sequence number that can be removed from the distribution database, using the following process:
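* The MSdistribution_history table holds details of all synchronisations within the transaction retention period (default of 72 hours). If any distribution agent has not synchronised during this time, then the oldest value in MSsubscriptions is used for that distribution agent.
* The lowest of these per-agent values becomes the maximum clean-up sequence number, so nothing can be removed until the slowest distribution agent has delivered it.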
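The process above can be sketched in Python. This is a simplified model, not the procedure's actual T-SQL: the `AgentState` class and its fields are hypothetical stand-ins for rows in MSdistribution_history and MSsubscriptions, sequence numbers are plain integers (the real `xact_seqno` values are binary), and it assumes the per-agent values are combined by taking the lowest, so the slowest agent sets the limit.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


# Hypothetical, simplified view of one distribution agent's state.
@dataclass
class AgentState:
    agent_id: int
    last_history_time: Optional[datetime]  # latest sync recorded in MSdistribution_history
    last_history_seqno: Optional[int]      # sequence number delivered at that sync
    subscription_seqno: int                # oldest value held in MSsubscriptions for the agent


def max_cleanup_seqno(agents: List[AgentState], now: datetime,
                      retention: timedelta = timedelta(hours=72)) -> int:
    """Return the highest sequence number that every agent has already delivered."""
    cutoff = now - retention
    per_agent = []
    for agent in agents:
        synced_recently = (agent.last_history_time is not None
                           and agent.last_history_time >= cutoff)
        if synced_recently:
            per_agent.append(agent.last_history_seqno)
        else:
            # No synchronisation inside the retention window: fall back to MSsubscriptions.
            per_agent.append(agent.subscription_seqno)
    # Assumed combination rule: the slowest agent holds back clean-up for everyone.
    return min(per_agent)


if __name__ == "__main__":
    now = datetime(2024, 1, 2, 12, 0)
    agents = [
        AgentState(1, now - timedelta(minutes=1), 5000, 100),  # syncs every minute
        AgentState(2, now - timedelta(hours=20), 1200, 100),   # syncs once a day
    ]
    print(max_cleanup_seqno(agents, now))
```

Here the agent that synchronised 20 hours ago limits the result to 1200, even though the other agent has delivered up to 5000.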
If, for example, you have two distribution agents, one with a schedule of once a minute and one with a schedule of once a day, you will see something similar to the following:
In this specific example, the MSrepl_transactions and MSrepl_commands tables will only be cleaned up once per day. For the rest of the day these tables keep filling up while the clean-up job keeps running, reading more data and using more resources each time it runs, but deleting nothing.
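To illustrate why the mismatched schedules hurt, the Python sketch below simulates one day. Everything in it is an illustrative assumption rather than measured behaviour: one transaction is committed per minute, the fast agent delivers every minute, the slow agent once a day, the clean-up job runs every 10 minutes, and the clean-up watermark is the lowest sequence number delivered by any agent.

```python
MINUTES_PER_DAY = 24 * 60


def simulate(fast_interval=1, slow_interval=MINUTES_PER_DAY,
             cleanup_interval=10, minutes=MINUTES_PER_DAY):
    """Return (runs that deleted rows, runs that deleted nothing, peak backlog)."""
    delivered = {"fast": 0, "slow": 0}  # highest sequence number each agent has delivered
    queued = []                         # sequence numbers still held in the distribution DB
    productive_runs = idle_runs = peak_backlog = 0
    for minute in range(1, minutes + 1):
        queued.append(minute)  # one new transaction per minute; seqno == minute
        peak_backlog = max(peak_backlog, len(queued))
        if minute % fast_interval == 0:
            delivered["fast"] = minute
        if minute % slow_interval == 0:
            delivered["slow"] = minute
        if minute % cleanup_interval == 0:
            # Only transactions delivered by *every* agent can be removed.
            watermark = min(delivered.values())
            before = len(queued)
            queued = [seqno for seqno in queued if seqno > watermark]
            if len(queued) < before:
                productive_runs += 1
            else:
                idle_runs += 1
    return productive_runs, idle_runs, peak_backlog


if __name__ == "__main__":
    print(simulate())                 # mismatched schedules
    print(simulate(slow_interval=1))  # both agents every minute
```

Under these assumptions `simulate()` returns `(1, 143, 1440)`: only one of the 144 clean-up runs deletes anything, and the backlog grows to a full day of transactions. With `slow_interval=1` it returns `(144, 0, 10)`.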
The resolution to this is simple: set all of the distribution agent job schedules to be the same, or at least similar.
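As a rough way to decide whether schedules are "similar", a hypothetical helper like the one below could flag any agent whose interval is much longer than that of the fastest agent. The agent names, the interval values and the `tolerance` factor of 2 are arbitrary assumptions; in practice the intervals would be read from each agent's SQL Server Agent job schedule.

```python
from typing import Dict, List


def schedule_outliers(intervals_minutes: Dict[str, int], tolerance: float = 2.0) -> List[str]:
    """Return agents whose interval exceeds the fastest agent's interval by more than `tolerance` times."""
    if not intervals_minutes:
        return []
    fastest = min(intervals_minutes.values())
    # Agents well behind the fastest one will hold back clean-up for everyone.
    return sorted(name for name, interval in intervals_minutes.items()
                  if interval > fastest * tolerance)


if __name__ == "__main__":
    print(schedule_outliers({"AgentA": 1, "AgentB": 1440, "AgentC": 2}))
```

With the sample intervals this reports only `AgentB`, the once-a-day agent.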
Part 3 of the series will focus on replicating intensive stored procedure executions rather than the outcome of the execution.