SQLServerCentral Article

The Trouble with Transactional Replication and large articles.

,

Problem

Your production SQL Server transactional replication just failed on your largest article and the business impact is critical. How do you get replication restored quickly without the time consuming reinitializing snapshot? In this article we look at how you can quickly restore SQL Server transactional replication.

Solution

Reliable replication is a critical component of many SQL Server shops, so DBAs need a tool to help them quickly restore transactional replication.  Many DBAs have found replication failures to be complex and frustrating taking hours or days to resolve. There are several methods available to restore your system quickly, but my favorite technique is to rebuild the publication/subscription from a proven template that requires no time consuming snapshot. You can accomplish this at your shop using five easy steps.

Background

In this example the publisher is named Pubr. Publications are Pub1 and Pub2. Subscribers are Sub1 and Sub2 as shown in figure 1.  Pub1 has three articles. Pub2 has a hundred articles.  Most articles have modest row counts but one contains 300 million rows. Pub1 is still running but Pub2 with its 300 million row article has failed.

Pubr
    Pub1 ------> Sub1  Status=active
        Article 1
        Article 2
        Article 3
    Pub2 ------> Sub2  Status=down, replication errors queuing
        Article 1   300 million rows   
        Article 2    10 thousand rows
        Article 3   100 thousand rows                                                                                 figure 1
When there are a handful of queued invalid replication commands you can fix them one by one (I'll show how to do this in a later article) but when there are hundreds of invalid commands you’ll need another option. If you attempt to reinitialize large articles it can take hours maybe even days (as in my case). I have found bosses to have little patience during expensive replication outages.

In this case articles 2 and 3 are small enough to reinitialize but not the 300 million row article 1. In my shop it would take many hours to reinitialize article 1 and during that time the applications are down.  Figure 2 shows that shortly after the failure of article 1 only a very small percentage of the total rows are not replicated, so why reinitialize the entire 300 M row article? What we need is to modify the replication definition on article 1 to avoid reinitialize and then clear out the invalid commands.

Pubr
    Pub2                                     Rows Not Replicated
       Article 1  300 M Rows                 100 K Rows
       Article 2   10 K Rows                   0 K Rows
       Article 3  100 K Rows                   1 K Rows                                                        figure 2

Five Easy Steps 

There are five easy steps that you need to follow.

First

Script out your existing publication. The easiest method is via Management Studio. Expand Replication, then expand Local Publications and right click Pub1. Click Generate Script. At the bottom of the pop-up click the drop down and select Open in New Query Window.

In this case we are only interested in sp_addaraticle for ART1 and the sp_addsubsciption. Here are the scripts:

-- Adding the transactional articles use [ReplTest]
exec sp_addarticle 
  @publication = N'Pub2', 
  @article = N'Art1', 
  @source_owner = N'dbo', 
  @source_object = N'Art1', 
  @type = N'logbased', 
  @description = N'', 
  @creation_script = N'', 
  @pre_creation_cmd = N'drop', 
  @schema_option = 0x00000000080350DF, 
  @identityrangemanagementoption = N'manual', 
  @destination_table = N'Art1', 
  @destination_owner = N'dbo', 
  @status = 24, 
  @vertical_partition = N'false', 
  @ins_cmd = N'CALL [dbo].[sp_MSins_dboArt1]', 
  @del_cmd = N'CALL [dbo].[sp_MSdel_dboArt1]', 
  @upd_cmd = N'CALL [dbo].[sp_MSupd_dboArt1]'

Notice pre_creation_cmd = ‘drop’ which means truncate the subscriber table something we do not want on large tables .

exec sp_addsubscription 
  @publication = N'Pub2', 
  @subscriber = N'SQLREMOTE', 
  @destination_db = N'ReplTest', 
  @subscription_type = N'Push', 
  @sync_type = N'automatic',
  @article = N'all', 
  @update_mode = N'read only',  
  @subscriber_type = 0

Notice sync_type='automatic' which means schema and data must be transferred to the subscriber before replication starts, something we do not want on large tables.   Ensure this script is saved.

Second

Modify the add article script for Art1 for quick starting by setting the @pre_creation_cmd=’none’. This avoids the snapshot for this article and is the first key to bringing replication online quickly. Use the sp_addarticle pre_creation_cmd to control reinitialization at the article level for any article. Avoid using sp_changearticle as this only bypasses the snapshot but does not clear out the bad replication commands.

use [ReplTest]
exec sp_addarticle 
  @publication = N'Pub1', 
  @article = N'Art1', 
  @source_owner = N'dbo', 
  @source_object = N'Repl1', 
  @type = N'logbased', 
  @description = N'', 
  @creation_script = N'', 
  @pre_creation_cmd = N'none', 
  @schema_option = 0x000000000803509F, 
  @identityrangemanagementoption = N'manual', 
  @destination_table = N'Repl1', 
  @destination_owner = N'dbo', 
  @status = 24, 
  @vertical_partition = N'false', 
  @ins_cmd = N'CALL [sp_MSins_dboRepl1]', 
  @del_cmd = N'CALL [sp_MSdel_dboRepl1]', 
  @upd_cmd = N'SCALL [sp_MSupd_dboRepl1]'

Next modify the add subscription for quick starting by setting the @sync_type=n'replication support only'.  This allows replication to start immediately and is the second key to bringing replication online quickly.  Expect the 'subscription not active' message if this parm is 'automatic'.

exec sp_addsubscription @publication = N'Pub2',
  @subscriber = N'SQLREMOTE', 
  @destination_db = N'ReplTest',  
  @subscription_type = N'Push', 
  @sync_type = N'replication support only', 
  @article = N'all',  
  @update_mode = N'read only', 
  @subscriber_type = 0

Third

Now we need to drop the publication. The easiest method is via SSMS. Expand replication and then expand publication. Select Pub2, right click, and select delete.

This also cleans up the unprocessed commands for this publication that caused the initial problem and is the thrid key to bringing replication up quickly. This isn’t as risky as it seems since you have saved the entire create script in step 1 and recreating a publication via commands takes less than 1 minute.

Fourth

Synchronize the data. This step is recommended since we have removed the snapshot for some articles. This step can be delayed if your goal is a quick restoration of replication and you understand the impact of the missing data. If you are going to synchronize I have found that Redgate’s Data Compare is a great tool to synchronize the subscriber. Microsoft's BCP utility used with a where clause selecting only the missing rows is another option. In this case we only have to synchronize Art 1’s 100K missing rows. Data Compare typically completes this in minutes.

Fifth

We are almost done, we only needed to run the modified script from Step 2 in Management Studio, which will create a new publisher and subscriber.  Since pub 1 is running, comment out the first three exec as they aren't needed. 

-- Enabling quick restart of replication 
use master
/*
exec sp_replicationdboption @dbname = N'ReplTest',
  @optname = N'publish', 
  @value = N'true'
GO
exec [ReplTest].sys.sp_addlogreader_agent 
  @job_login = null, 
  @job_password = null, 
  @publisher_security_mode = 1
GO
exec [ReplTest].sys.sp_addqreader_agent
  @job_login = null, 
  @job_password = null, 
  @frompublisher = 1
GO  */
Ensure the sp_article @pre_creation_cmd=n'none' and the sp_addsubscription @sync_type=n'replication support only' are set.  The script should run in seconds, if it fails make certain @pre_creation and @sync_type are set correctly then repeat steps 3,4 and 5. I have used this process many times on different version of SQL Server with success.
Congratulations, your replication will be up immediately and your boss will be happy, well at least until the next DBA emergency. I recommend creating quick restore templates for all your publications/subscriptions as this has been an absolute application savior on several outages.

Acknowledgement:

 Reviewers are critical to writers, thanks to Dale Gundersen @ Reply.com for his advice.

Rate

4.69 (36)

You rated this post out of 5. Change rating

Share

Share

Rate

4.69 (36)

You rated this post out of 5. Change rating