SQLServerCentral Article

Scaling Out the Distribution Database

,

SQL Server replication is a mature and reliable out-of-the-box means of copying

data from one database to another. Up to a point it is also a high performance solution.

There are many factors that influence the performance of replication and perhaps

the most overlooked one is the configuration of the server acting as the distributor.

By default the server acting as a distributor will have a single database called

"distribution". As the number of publications and publishers increase we

are faced with the fact that this single distribution database is attempting to

process all transactions.

Fortunately we can set up additional distribution databases on our distributor and

assign a separate distribution database to each publisher. Short of installing a

brand new distributor this acts as quite an effective way of scaling out distribution.

Basic topology for replication

There are a number of articles on SSC that deal with replication topology but for

brevity SQL Server follows a magazine publishing paradigm.

  • Publishers supply data to the distributor
  • The distributor provides data to subscribers.

A publication contains articles which can be any combination of the following

  • Tables
  • Views
  • Indexed views
  • User defined functions
  • Stored Procedures

You can have one or many publications from any database and the database at the

subscriber end can receive one or many subscriptions.

The diagram below illustrates this setup.

Basic replication topology

We can see that replication is quite flexible in what it allows us to do and, even

in this basic setup will work well for quite heavy loads.

Multiple distribution databases

The symptoms of a struggling distributor are not always apparent from the normal

Perfmon counters. The most obvious sign of a problem is in the replication monitor

where the rate at which outstanding transactions are dealt with seems to slow to

a crawl.

In a struggling distribution database the number of records in the MSRepl_transactions

table is likely to be very high and once it has gone beyond a certain size distribution

will slow to a crawl.

Fortunately we can have multiple distribution databases on the server acting as

the distributor as the diagram below shows.

Multiple distribution database topology

A publishing instance of SQL Server may only be assigned to a single distributor

and a single distribution database. Looking at the diagram above we could not have

Publish_01 using Distribution_A and Publish_02 using Distribution_B unless Publish_01

and Publish_02 were on separate SQL Server instances on the same machine.

Setting up a new distribution database

This is easy using either Management Studio or by scripts.

The following code snippet adds a new distribution database called "QA_Distribution".

USE master
GO
exec sp_adddistributiondb
    @database = N'QA_Distribution',
    @data_folder = N'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\Data',
    @log_folder = N'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\Data',
    @data_file_size = 5,
    @log_file_size = 5,
    @min_distretention = 0,
    @max_distretention = 24,
    @history_retention = 24,
    @security_mode = 1
GO

There are a few points to note about the parameters for the sp_adddistributiondb

stored procedure

ParameterComment

@data_folder

@Log_folder

If you have reached the point where you need separate distribution databases then

ideally you should aim to store the distribution database and log files on fast

disks

@data_file_size

@log_file_size

The figures are in MB. Ideally you want your file sizes to be big enough so you

are unlikely to need an autogrow.

@max_distretention

Books online tells us that if the subscriber does not receive transactions from

the distributor and there are actually transactions that are older than this figure

then the subscription will be marked as inactive.

Although this defaults to 72 hours you need to make a judgment call as to whether

your distributor can actually hold 72 hours worth of transactions! If your company has support cover on a 24/7 basis

and you can fix replication issues within 4 hours then drop this figure down to around 8 hours.

Why around 8 hours? Old DBA rule when estimating. Come up with a figure, double it then add a bit!

@history_retention

This is how long you wish to retain the replication history. Again, if you run a 24/7 service

then the default of 48 hours may be unnecessarily long.

 

Once we have created our new distribution database using the sp_adddistributiondb

stored procedure you need to run the following code snippet.

USE [QA_Distribution]
GO
IF NOT EXISTS (SELECT * FROM sysobjects WHERE name = 'UIProperties' AND type = 'U ')
    CREATE TABLE UIProperties(id INT)
IF EXISTS (SELECT * FROM ::fn_listextendedproperty('SnapshotFolder', 'user','dbo', 'table', 'UIProperties', null, null))
    EXEC sp_updateextendedproperty
        N'SnapshotFolder', 
        N'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\ReplData', 
        'user', 
        dbo, 
        'table',
        'UIProperties'
ELSE
    EXEC sp_addextendedproperty 
        N'SnapshotFolder', 
        N'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\ReplData', 
        'user', 
        dbo, 
        'table',
        'UIProperties'
GO

This snippet creates a user table called UIProperties and adds an extended property

called "SnapshotFolder" to the table containing the filepath for the snapshot

BCP files.

To do the same thing using management studio do the following

  • Connect to your distributor
  • Right-Click on replication and choose "Distributor Properties" as shown below.
    Distributor properties
  • On the Distributor Properties dialogue choose "New" as circled below in red.
    Add a new distribution database
  • Complete the resulting dialogue as shown below.
    New distribution database properties

It is worth emphasising that management studio does not allow you to set the data

and log file sizes in this dialogue so, as mentioned in the parameters for sp_adddistributiondb you

should set there sizes to ensure that they are not likely to require an autogrow.

It should also be noted that you can add data files to the primary filegroup for

the distribution database if required.

Configuring replication to use our new distribution database

If you are commissioning a new database server to use as a publisher then this is

quite straight forward. If you are changing existing publications/subscriptions

to use the new distribution database then there are few more steps to undertake

but they are straight forward, if a little inconvenient.

If you have any existing publications

This is the inconvenient bit. You have to script off the creation of ALL publications

and subscriptions and then drop said publications and subscriptions prior to going

any further.

This is necessary because changing the distribution database for a publisher affects

the entire publisher.

Configuring from a script

The code below is sufficient to assign a distribution database to a publisher.

exec sp_adddistpublisher
    @publisher = N'DEVELOPMENT',
    @distribution_db = N'QA_Distribution',
    @security_mode = 1, -- Use trusted security to allow the distributor to talk to the publisher.
    @working_directory = N'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\ReplData',
    @trusted = N'false', -- deprecated, for backwards compatibility only.
    @thirdparty_flag = 0, -- 0 = SQL Server, 1 = other (probably ORACLE)
    @publisher_type = N'MSSQLSERVER'

Once this has been done you can run you publication and subscription creation scripts

taking whatever remedial action is necessary to ensure that the data on your publisher

and subscriber are in sync.

Configuring from management studio.

Frankly this is a pain in the neck and is a good example where management studio

can be smartened up a bit. Do the following as before

  • Connect to your distributor
  • Right-Click on replication and choose "Distributor Properties"
  • In the left hand panel choose "Publishers" to see a dialogue box similar

    to the one shown below
    Changing the default distribution database

  • Uncheck the publisher you wish to reassign a distribution database for
  • Click OK to close the dialogue box. You will receive a warning message asking you

    if you want to disable this publisher and that all publications and subscriptions

    will be deleted. This is why you needed to script your publications and subscriptions

    before doing this! Although the warning will say that publications and subscriptions

    will be deleted I have found that management studio leaves behind orphaned subscriptions

    so I stress the need to drop publications and subscriptions manually.

  • Repeat the first 3 steps only this time your original publisher will not be shown

    so click the Add button shown in red.

  • Connect to your publisher and you will receive the "Publishers" dialogue

    box again only this time you can choose the distribution database from a drop down

    menu as shown below.

    Specifying a distribution database for a publisher

Once you click OK you will have configured a publisher to use a specific replication

database.

Again, you will now have to run your publication/subscription creation scripts and

take whatever remedial action is necessary to ensure that the data is in sync between

your publisher and subscriber.

Closing thoughts

Using separate distribution databases can have a dramatic beneficial effect on replication

performance. I would go as far as to recommend that a distribution DB per publisher

be adopted as standard practise.

By having multiple distribution databases we may actually get a benefit that is

perceived as being greater than reality simply because transactions from publisher

'x' no longer get jammed up in a distribution database shared with publisher

'y'.

Sizing of the distribution databases is important. Whatever else happens you want

to avoid the database auto-growing so size the database correctly in the first place.

If you make the distribution database too big then it can always be shrunk later.

Don't overlook the hardware for the distribution databases. The nature of what

these databases are asked to do is to read and write simple transactions at very

high speeds and the appropriate hardware should be chosen accordingly.

Rate

4.83 (18)

You rated this post out of 5. Change rating

Share

Share

Rate

4.83 (18)

You rated this post out of 5. Change rating