How to estimate the size of the distribution database?

  • It is a database used in transactional replication. The user database is only about 6 GB, but the mdf file size of the distibution database is 34 GB, and it could not be shrunk.

    Does anyone know how the size of distribution database related to user database, type of the replication, ...? What size is reasonable?

    Any input will be greatly appreciated.

  • With Transaction Replication, the size of the distribution database is more closely related to the number of transactions against the publishing database than the size of the publishing database. This is because transaction replication normally runs each INSERT/UPDATE/DELETE against each subscriber.

    As an extreme example, you publishing might have just one record in it which is updated millions of times each day. Hence, the published database (and the subscriber) would be pretty small. However, there would be millions of records in the distribution database (one for each UPDATE done on the one record).

    So, to estimate the size of the distrubution database, you will need to get an understanding of the normal activity done against the published database. To some extent, there is a relationship between the size of the transaction log (of the published database) and the size of the data in the distribution database.

  • Thank you so much for your input.

    Here is one more question based on your input:

    "As an extreme example, you publishing might have just one record in it which is updated millions of times each day. Hence, the published database (and the subscriber) would be pretty small. However, there would be millions of records in the distribution database (one for each UPDATE done on the one record). "

    After the transaction has been replicated to the subscriber, I believe that the transaction in the distribution will be inactive and will be wiped out soon. It should not contribute to the size of the distribution database any more. Isn't it?

    An unreasonable size of a distribution database may be a signal of failure/or delay in replication. Is it a reasonable guess?

  • If you do have a very large distribution database, it could indicate something is wrong. Transactions sit in the distribution database until all subscribers have picked them up. If you have had a failure in a subscriber, you need to make sure you have removed that subscriber from the replication topology, or the records do not get cleaned up.

    Also make sure you distributor jobs for transaction cleanup are not failing, or that they are running. These are important to remove old transactions.

  • Steve's reply is close.

    Yes, once the each update has been applied to the subscriber, the record in the distribution database may be deleted. However, this is not done immediately. There is a cleanup agent that is responsible for that.

    Check you replication agents and confirm whether all of the agents are actually scheduled to run (except for the snapshot agent which probably only needs to be run on demand - when you need to deliver a new snapshot...e.g. if you make changes to the table that are published).

    I wouldn't schedule the cleanup agent should to run all the time - it should be scheduled to run at a quiet time so that it does not interfere with the logreader or distribution agents.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply