Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Scaling Out the Distribution Database

By David Poole,

SQL Server replication is a mature and reliable out-of-the-box means of copying data from one database to another. Up to a point it is also a high performance solution.

There are many factors that influence the performance of replication and perhaps the most overlooked one is the configuration of the server acting as the distributor.

By default the server acting as a distributor will have a single database called "distribution". As the number of publications and publishers increase we are faced with the fact that this single distribution database is attempting to process all transactions.

Fortunately we can set up additional distribution databases on our distributor and assign a separate distribution database to each publisher. Short of installing a brand new distributor this acts as quite an effective way of scaling out distribution.

Basic topology for replication

There are a number of articles on SSC that deal with replication topology but for brevity SQL Server follows a magazine publishing paradigm.

  • Publishers supply data to the distributor
  • The distributor provides data to subscribers.

A publication contains articles which can be any combination of the following

  • Tables
  • Views
  • Indexed views
  • User defined functions
  • Stored Procedures

You can have one or many publications from any database and the database at the subscriber end can receive one or many subscriptions.

The diagram below illustrates this setup.

Basic replication topology

We can see that replication is quite flexible in what it allows us to do and, even in this basic setup will work well for quite heavy loads.

Multiple distribution databases

The symptoms of a struggling distributor are not always apparent from the normal Perfmon counters. The most obvious sign of a problem is in the replication monitor where the rate at which outstanding transactions are dealt with seems to slow to a crawl.

In a struggling distribution database the number of records in the MSRepl_transactions table is likely to be very high and once it has gone beyond a certain size distribution will slow to a crawl.

Fortunately we can have multiple distribution databases on the server acting as the distributor as the diagram below shows.

Multiple distribution database topology

A publishing instance of SQL Server may only be assigned to a single distributor and a single distribution database. Looking at the diagram above we could not have Publish_01 using Distribution_A and Publish_02 using Distribution_B unless Publish_01 and Publish_02 were on separate SQL Server instances on the same machine.

Setting up a new distribution database

This is easy using either Management Studio or by scripts.

The following code snippet adds a new distribution database called "QA_Distribution".

USE master
GO
exec sp_adddistributiondb
    @database = N'QA_Distribution',
    @data_folder = N'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\Data',
    @log_folder = N'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\Data',
    @data_file_size = 5,
    @log_file_size = 5,
    @min_distretention = 0,
    @max_distretention = 24,
    @history_retention = 24,
    @security_mode = 1
GO

There are a few points to note about the parameters for the sp_adddistributiondb stored procedure

Parameter Comment

@data_folder

@Log_folder

If you have reached the point where you need separate distribution databases then ideally you should aim to store the distribution database and log files on fast disks

@data_file_size

@log_file_size

The figures are in MB. Ideally you want your file sizes to be big enough so you are unlikely to need an autogrow.

@max_distretention

Books online tells us that if the subscriber does not receive transactions from the distributor and there are actually transactions that are older than this figure then the subscription will be marked as inactive.

Although this defaults to 72 hours you need to make a judgment call as to whether your distributor can actually hold 72 hours worth of transactions! If your company has support cover on a 24/7 basis and you can fix replication issues within 4 hours then drop this figure down to around 8 hours.

Why around 8 hours? Old DBA rule when estimating. Come up with a figure, double it then add a bit!

@history_retention

This is how long you wish to retain the replication history. Again, if you run a 24/7 service then the default of 48 hours may be unnecessarily long.

 

Once we have created our new distribution database using the sp_adddistributiondb stored procedure you need to run the following code snippet.

USE [QA_Distribution]
GO
IF NOT EXISTS (SELECT * FROM sysobjects WHERE name = 'UIProperties' AND type = 'U ')
    CREATE TABLE UIProperties(id INT)

IF EXISTS (SELECT * FROM ::fn_listextendedproperty('SnapshotFolder', 'user','dbo', 'table', 'UIProperties', null, null))
    EXEC sp_updateextendedproperty
        N'SnapshotFolder', 
        N'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\ReplData', 
        'user', 
        dbo, 
        'table',
        'UIProperties'
ELSE
    EXEC sp_addextendedproperty 
        N'SnapshotFolder', 
        N'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\ReplData', 
        'user', 
        dbo, 
        'table',
        'UIProperties'
GO

This snippet creates a user table called UIProperties and adds an extended property called "SnapshotFolder" to the table containing the filepath for the snapshot BCP files.

To do the same thing using management studio do the following

  • Connect to your distributor
  • Right-Click on replication and choose "Distributor Properties" as shown below.
    Distributor properties

  • On the Distributor Properties dialogue choose "New" as circled below in red.
    Add a new distribution database

  • Complete the resulting dialogue as shown below.
    New distribution database properties

It is worth emphasising that management studio does not allow you to set the data and log file sizes in this dialogue so, as mentioned in the parameters for sp_adddistributiondb you should set there sizes to ensure that they are not likely to require an autogrow.

It should also be noted that you can add data files to the primary filegroup for the distribution database if required.

Configuring replication to use our new distribution database

If you are commissioning a new database server to use as a publisher then this is quite straight forward. If you are changing existing publications/subscriptions to use the new distribution database then there are few more steps to undertake but they are straight forward, if a little inconvenient.

If you have any existing publications

This is the inconvenient bit. You have to script off the creation of ALL publications and subscriptions and then drop said publications and subscriptions prior to going any further.

This is necessary because changing the distribution database for a publisher affects the entire publisher.

Configuring from a script

The code below is sufficient to assign a distribution database to a publisher.

exec sp_adddistpublisher
    @publisher = N'DEVELOPMENT',
    @distribution_db = N'QA_Distribution',
    @security_mode = 1, -- Use trusted security to allow the distributor to talk to the publisher.
    @working_directory = N'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\ReplData',
    @trusted = N'false', -- deprecated, for backwards compatibility only.
    @thirdparty_flag = 0, -- 0 = SQL Server, 1 = other (probably ORACLE)
    @publisher_type = N'MSSQLSERVER'

Once this has been done you can run you publication and subscription creation scripts taking whatever remedial action is necessary to ensure that the data on your publisher and subscriber are in sync.

Configuring from management studio.

Frankly this is a pain in the neck and is a good example where management studio can be smartened up a bit. Do the following as before

  • Connect to your distributor
  • Right-Click on replication and choose "Distributor Properties"
  • In the left hand panel choose "Publishers" to see a dialogue box similar to the one shown below
    Changing the default distribution database

  • Uncheck the publisher you wish to reassign a distribution database for
  • Click OK to close the dialogue box. You will receive a warning message asking you if you want to disable this publisher and that all publications and subscriptions will be deleted. This is why you needed to script your publications and subscriptions before doing this! Although the warning will say that publications and subscriptions will be deleted I have found that management studio leaves behind orphaned subscriptions so I stress the need to drop publications and subscriptions manually.
  • Repeat the first 3 steps only this time your original publisher will not be shown so click the Add button shown in red.
  • Connect to your publisher and you will receive the "Publishers" dialogue box again only this time you can choose the distribution database from a drop down menu as shown below. Specifying a distribution database for a publisher

Once you click OK you will have configured a publisher to use a specific replication database.

Again, you will now have to run your publication/subscription creation scripts and take whatever remedial action is necessary to ensure that the data is in sync between your publisher and subscriber.

Closing thoughts

Using separate distribution databases can have a dramatic beneficial effect on replication performance. I would go as far as to recommend that a distribution DB per publisher be adopted as standard practise.

By having multiple distribution databases we may actually get a benefit that is perceived as being greater than reality simply because transactions from publisher 'x' no longer get jammed up in a distribution database shared with publisher 'y'.

Sizing of the distribution databases is important. Whatever else happens you want to avoid the database auto-growing so size the database correctly in the first place. If you make the distribution database too big then it can always be shrunk later.

Don't overlook the hardware for the distribution databases. The nature of what these databases are asked to do is to read and write simple transactions at very high speeds and the appropriate hardware should be chosen accordingly.

Total article views: 5735 | Views in the last 30 days: 83
 
Related Articles
FORUM

Moving Distribution Database to different server.

Moving Distribution database to different server

FORUM

SQL server distributed database

Database Distribution

FORUM

Replicate SQL2000(Publisher,Distributor) to SQL2005(Subscriber)

Cannot start the distribution Agent

FORUM

Changing Distributor

Need help changing the distributor

FORUM

How to publish SQL Server 2005 database online ?

Publish SQL server database online

Tags
replication    
 
Contribute

Join the most active online SQL Server Community

SQL knowledge, delivered daily, free:

Email address:  

You make SSC a better place

As a member of SQLServerCentral, you get free access to loads of fresh content: thousands of articles and SQL scripts, a library of free eBooks, a weekly database news roundup, a great Q & A platform… And it’s our huge, buzzing community of SQL Server Professionals that makes it such a success.

Join us!

Steve Jones
Editor, SQLServerCentral.com

Already a member? Jump in:

Email address:   Password:   Remember me: Forgotten your password?
Steve Jones