Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Understanding Peer-to-Peer Transactional Replication, Part 1

By John F. Tamburo,

Peer-to-Peer Transactional Replication ("PPTR" for short) is the ugly duckling of SQL Server replication topologies.  However, it need not be so.  There are occasions where PPTR is the perfect solution.  In this article I try to explain the right occasions to use PPTR and what the DBA needs to do when implementing it.

When is PPTR a good solution?

The first question that most admins ask when confronted with PPTR is, "Why don't I just use Merge Replication?"  At first blush, it seems that Merge Replication is a good fit and PPTR is unnecessary. 

However, Merge Replication adds a GUID row identifier to every table that is included in the topology.  That identifier could break the application.  With very large tables, the 16-byte GUID could cause storage issues to arise.  PPTR has no such limitations.  

Therefore, if you are dealing with black-box code that cannot be amended, or if the changes needed to support merge replication cannot be easily implemented, you should consider PPTR.  You should also consider PPTR if you are dealing with storage limitations that make adding 16 bytes to every row of every table to be replicated impractical.

Considerations when implementing PPTR

In PPTR, every database in the topology acts as both publisher and subscriber.  Updates to the database are published to all other databases, and the database is subscribed to receive updates that occur on any other database.

This fact brings up the following considerations, in addition to the normal requirements of transactional replication (e.g., primary keys must be defined):

  • Timing:  Conflicts may arise if the same row is updated at two nodes at the same time.
  • Identity Management:  Identity columns, natural primary keys and other unique constraints need to be managed.  You need to ensure that a row with the same unique identifier is not inserted into more than one database.
  • No Snapshots:  You cannot snapshot any node in PPTR.  This is an important caveat, as explained below.

Timing and Conflict Detection

Timing issues are usually handled by disciplined application design.  For example, if row 123 is being modified in database A, a row is inserted into a replicated table that acts as an application level "lock" on the row.  That table is examined by any program using any database prior to allowing modifications.  This (or another solution) should be done because locks on replicated databases are not replicated to the other databases in the topology.

Sometimes, however, the application is not coded to do such a thing.  In that case, a conflict might occur.  PPTR comes with optional conflict detection, in which each database in the topology is assigned a number, with the lowest number giving the highest priority.  Conflicts are resolved by priority, with the rows that were rejected written into a system table, where developers and DBA can work to ensure that no critical business data are lost.  Each table in the topology is given its own conflict table.  For example, the table [dbo].[orders] would have a conflict table named [dbo].[conflict_dbo_orders].

Identity and Primary Key Management

I won't repeat the classic DBA mantra that every table should have a non-business-significant sequential integer (int or bigint) clustered primary key defined, with any "natural" primary keys defined as unique nonclustered indexes.  The performance benefits of this arrangement are well-known, as are the exceptions to this rule.  If you follow this discipline, you will have a primary key defined as int or bigint with the identity specification.

Now, you will have the same table on all of the PPTR nodes, and you need to set ranges for all of those identity columns.  Assigning identity ranges requires you to determine growth room, for if you run out of identity values in one range, you will overlap identities.  If conflict detection is not enabled, PPTR will crash.  If conflict detection is enabled, you will fill up the conflict tables with data that should be in the production tables.  Either circumstance will result in user discomfort.

I recommend that you set up the identity as a BIGINT and try to project a few years' worth of growth for the tables in each database.  Then add a fat pad to that amount, and use that to set ranges.  I also recommend creating a SQL job on each server that queries the tables to examine that range for coming close to running out.  If the range is within a defined percentage of running out, email or page DBA.  Some monitoring tools will allow you to put the query into the tool and generate an alert.  It doesn't matter how you get the alert notice, as long as it gets rapid attention.

Setting the ranges is not a problem.  You simply use DBCC CHECKIDENT with the RESEED option.  I have a script that I use to generate the reseed option.  In my environment, the identity column has the same name on all tables in the PPTR topology.  If your environment differs, you may need to alter the SQL to recognize the tables and the identity columns.

set nocount on;

declare @sn varchar(128), @seedmin bigint, @seedmax bigint, @schemaname varchar(128), @tablename varchar(128), @maxkey bigint

set @sn = @@SERVERNAME;
print '--Executing partitioning on: ' + @sn

select @seedmin = case @sn
	when 'Server0' then 500000001
	when 'Server1' then 20000000001
	when 'Server2' then 30000000001
	else 0
end

select @seedmax = case @sn
	when 'Server0' then 10000000001
	when 'Server1' then 20000000001
	when 'Server2' then 30000000000
	else 0
end

print '--Seed value = ' + convert(varchar(25),@seedmin) + '-' + convert(varchar(25),@seedmax) 
print '--Executed at: ' + convert(varchar(80),getdate());
print ' '
print 'declare @maxkey bigint'
print 'declare @newkey bigint'
print 'declare @seed bigint = ' + convert(varchar(25),@seedmin)

declare cur cursor for 

select
	T.TABLE_SCHEMA
	,T.TABLE_NAME 
from INFORMATION_SCHEMA.TABLES T
inner join INFORMATION_SCHEMA.COLUMNS C
	on T.TABLE_NAME = C.TABLE_NAME
where C.COLUMN_NAME = 'SequentialIdentity'
and T.TABLE_NAME not like 'conflict%'
and T.TABLE_TYPE = 'BASE TABLE'

open cur

fetch next from cur into @schemaname,@tablename

while @@FETCH_STATUS = 0
begin
	print ' '
	print ' '
	print 'select 
	@maxkey = isnull(max(SequentialIdentity),0)
	from [' + @schemaname + '].[' + @tablename + ']
	where SequentialIdentity >= ' + convert(varchar(25),@seedmin) + '
	and SequentialIdentity < ' + convert(varchar(25),@seedmax) + ';'
	print ' ' 
	print 'print '' '''
	print 'print ''--Table name = [' + @schemaname + '].[' + @tablename + ']'''
	print 'print ''--Maxkey='' + convert(varchar(25),@maxkey)'
	print 'set @newkey = @seed'
	print 'if @maxkey > @seed
	set @newkey = @seed + (@maxkey - @seed)'
	print 'print ''dbcc checkident(''''' + @schemaname + '.' + @tablename + ''''',RESEED,'' + convert(varchar(25),@newkey) + '');'''
	fetch next from cur into @schemaname,@tablename
end

close cur
deallocate cur

The script above generates a second script into the messages in SQL Server Management Studio.  That generated script is executed and in turn generates a script full of the DBCC CHECKIDENT commands that you execute in yet another SSMS window to set the range for the node in question. 

Thus, the above script is useful to regenerate range keys in the event that you must restore the topology, a fate that I hope that you never experience!  It may be possible to reduce effort and write a step that does not generate a script that generates a second-level script.  I haven't put the energy into that effort.

Clustered Index Considerations

If you cluster on a sequential identity (either as a PK or an unique index), you will have identity ranges for each database in the topology.  Therefore, you will have to consider using a fill factor of less than 100 on the clustered index.  Page splits along the high boundaries of the ranges (except the last range) are inevitable.  I generally use a fill factor of 90; you may need to go lower than that.  A lower fill factor may waste space within the range, but the performance boost will be worth the wasted space.

Whatever database gets the most INSERT traffic should be placed into the highest range.  This will reduce page splits, since rows inserted at the end of the clustered index will simply fill up a page and allocate the next one.

Snapshots = Never

PPTR cannot be populated by snapshot.  That's the story, all of it.  When you implement the PPTR topology, all articles must contain identical data rows.  How do you populate the PPTR nodes?  Backup and Restore is the easiest method.  The procedure from 10,000 feet:

  1. Quiesce the system
  2. Take a backup of the canonical database (where I work, this is the database that "wins" all conflicts and is numbered 1 in the PPTR topology wizard (see below)
  3. Restore that backup to all other planned databases
  4. Reset the ranges of all identities and key generation mechanisms.
  5. Create the publication
  6. Place all the nodes into the topology
  7. Release the databases for use.

Of course, Backup and restore assumes that all of the tables in the database are either static or in the PPTR topology.  Isolating the replicated tables into a separate database containing nothing but those tables is an excellent idea if your application can handle it.  I have also had excellent results using RedGate SQL Data Compare to synchronize individual tables when adding them to the PPTR topology after it has initially gone live.

Big caveat:  SQL Server generates snapshot agents and jobs for PPTR nodes.  Please, don't ever run that agent. However, you also cannot delete that agent. 

In the next article, we will set up and demonstrate how PPTR works.

John F. Tamburo is the Chief Database Administrator for Landauer, Inc., the world leader in radiation measurement and physics.

Total article views: 3277 | Views in the last 30 days: 15
 
Related Articles
FORUM

database diagram print on one page

database diagram print

FORUM

collation conflict in sqlserver 2005

collation conflict

FORUM

Exporting and Printing Database Diagrams on large paper stock

Seeking info on exporting and printing database diagrams

FORUM

ERROR: Collation conflict!!!

ERROR: Collation conflict!!!

ARTICLE

Centrally collect replication conflicts and send an alert

Automatically capture replication conflicts and generate an automated email notification.

Tags
peer-to-peer    
replication    
 
Contribute

Join the most active online SQL Server Community

SQL knowledge, delivered daily, free:

Email address:  

You make SSC a better place

As a member of SQLServerCentral, you get free access to loads of fresh content: thousands of articles and SQL scripts, a library of free eBooks, a weekly database news roundup, a great Q & A platform… And it’s our huge, buzzing community of SQL Server Professionals that makes it such a success.

Join us!

Steve Jones
Editor, SQLServerCentral.com

Already a member? Jump in:

Email address:   Password:   Remember me: Forgotten your password?
Steve Jones