Reducing the size of the distribution database & improving transactional replication performance: Part 3

Part 1 of this series of articles looked at the intricacies of SQL Server transactional replication, and made the observation that the clean-up mechanism treats publications as a collective. Part 2 looked at how differing distribution job schedules can cause unnecessary bloating in the distribution database.

The final (and much delayed) part of this series focuses on the choice between replicating stored procedure calls and replicating the data modified by stored procedure calls.

As an example consider the scenario where you have a stored procedure that deletes old data from a table periodically. Lets say that it runs daily and deletes roughly one million records per execution. Here you have a choice in how you replicate the changes made by the procedure:

You allow the deletes to be replicated individually.
You replicate the stored procedure call.

Both of these choices will end up with the same result; however, the impact of each can be significantly different.

Choice one is the default. By this I mean that you will typically add articles to replicate data from a table in site A to a table in site B. With this scenario any inserts, updates, and deletes will be replicated individually. By using this method, based on the example above you would be writing 1 million seperate entries into the distribution database. All of which would have to be synchronised individually with the replication subscriber over the network.

Choice two is adding an article to a publication for a specific procedure call. Then, when that procedure is called, instead of writing each delete to the distribution database individually you actually just write one entry into the distribution database for synchronisation. This one entry is the stored procedure call. Here, rather than each delete being synchronised individually, the stored procedure call is replicated. This means that the same call that was made on the publisher is also made on the subscriber. Therefore, there is only one item that is replicated rather than one million, and the same records will be deleted from the subscriber via its own stored procedure call. This has a performance benefit for the distribution database as the amount of data it is holding and synchronising can be significantly reduced. Network IO will also benefit significantly in this scenario.

Whilst the performance benefits for replication by selecting choice two in the above example are obvious, you probably won’t want to be replicating every single procedure call that you make. You’ll need to find those procedures that perform a large number of DML changes in an individual call and add them to the list of items to replicate. Now if you already have an article that replicates all changes to Table A you can keep this in place without any changes, even if you want to replicate a procedure that makes DML changes to Table A. By having the article in place for the stored procedure call, these changes just won’t get read by the article replicating Table A. It is simple, and if used in the correct scenarios it can be hugely beneficial to your replication environment and system as a whole.

Implementing

To replicate a stored procedure call you’ll need to add it as an article using sp_addarticle. You can create a specific publication for procedure calls or add it to an existing transactional publication. It is up to you. An example execution of sp_addarticle is shown in the example below.

EXEC sp_addarticle @publication = N'Your publication name'
 , @article = N'Your procedure name'
 , @source_owner = N'schema owner, e.g. dbo'
 , @source_object = N'Your procedure name'
 , @destination_table = N'Your procedure name'
 , @type = N'proc exec'
 , @description = null
 , @pre_creation_cmd = N'none'
 , @schema_option = 0x0000000008000001
 , @status = 16
 , @vertical_partition = N'false'
 , @filter = null
 , @sync_object = null;

Now one thing that is worth mentioning here is what you select for @type. Microsoft recommends you use serializable proc exec, however with this option set your procedure call will only replicate if it executes within an explicit transaction using the serializable isolation level. If your procedure call doesn’t meet this specific criteria you’ll need to use proc exec as specified in the example above.

To check whether your stored procedure call is replicating you can check out the distribution database after the call has executed. If your replication article is set up correctly for your procedure you should be able to amend the following query specifying your procedure name and see some entries for it returned. You then just need to wait for it to replicate (or give it a nudge by executing the distribution job), and check that the expected changes have also been made on the subscriber.

--Query to find repl commands for specified article
SELECT TOP 10 art.article
     , trn.entry_time
     , cmd.[command]
FROM   [distribution].[dbo].[MSrepl_commands] AS cmd
INNER JOIN [distribution].[dbo].[MSarticles] AS art
   ON  cmd.article_id = art.article_id
INNER JOIN [distribution].[dbo].[MSrepl_transactions] AS trn
   ON  cmd.xact_seqno = trn.xact_seqno
WHERE  art.article = 'Your procedure name';

For more information you can look at the following Microsoft articles:

The post Reducing the size of the distribution database & improving transactional replication performance: Part 3 appeared first on BI Design.

Reducing the size of the distribution database & improving transactional replication performance: Part 3

Implementing

Rate

Share

Share

Rate

Reducing the size of the distribution database & improving transactional replication performance: Part 3

Implementing

Rate

Share

Share

Rate

Related content

Book Review: Big Red - Voyage of a Trident Submarine

Database Mirroring FAQ: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup?

Inserting Markup into a String with SQL

Networking - Part 4

Speaking at Community Events - More Thoughts

Cookies on SQLServerCentral