• We use SQL Server peer-to-peer (P2P) replication to provide scale-out, high availability, and resilience. It does what it says on the tin.

    There are some warnings in the P2P documentation about quiescing the system before doing schema changes and some other operations, but this is only necessary if you are initiating these changes from multiple P2P hubs. You also need a strategy for managing identity column ranges.

    We designate one of our P2P hubs as the 'master write server' and send all our updates to this hub. This avoids any need to manage identity column ranges during normal use. All schema changes are also initiated on the master hub, and because of this we have never needed to quiesce the system. We use a DNS vanity name to identify the master server, so at failover no application changes are needed; we simply repoint the vanity name at the other hub.
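    A minimal sketch of how this looks from the application side, assuming a hypothetical vanity name `sqlmaster.example.com` (none of these names are from the original configuration):

    ```sql
    -- Applications connect through the DNS vanity name, e.g. a connection
    -- string such as:
    --   Server=sqlmaster.example.com;Database=AppDb;Integrated Security=SSPI;
    -- Repointing the CNAME at failover moves all writes to the other hub
    -- with no application changes. To confirm which physical hub the vanity
    -- name currently resolves to:
    SELECT @@SERVERNAME AS CurrentWriteHub;
    ```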

    The main 'gotcha' we found with P2P replication is managing identity column ranges at failover. The high-water mark for an identity column is only ever updated at the server that ran the INSERT statement. We have developed a process where the high-water marks for all identity columns are regularly copied into a control table that is available at all P2P nodes. At failover time we run a process that interrogates this table and runs a RESEED operation on every table with an identity column, so that the high-water mark on the new master is set to a higher value than we had recorded for the old master.
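    A process of this kind might be sketched in T-SQL along the following lines. The control table, the 1,000-value safety margin, and all object names are assumptions for illustration, not the original implementation:

    ```sql
    -- Control table, created on the master and replicated to all P2P nodes
    CREATE TABLE dbo.IdentityHighWater (
        TableName  sysname      NOT NULL,  -- stored as 'schema.table'
        ServerName sysname      NOT NULL,
        LastValue  bigint       NOT NULL,
        CapturedAt datetime2(0) NOT NULL DEFAULT SYSUTCDATETIME(),
        CONSTRAINT PK_IdentityHighWater PRIMARY KEY (TableName, ServerName)
    );
    GO

    -- Scheduled job on each node: record the local identity high-water marks
    MERGE dbo.IdentityHighWater AS t
    USING (
        SELECT s.name + '.' + o.name                                AS TableName,
               CAST(IDENT_CURRENT(s.name + '.' + o.name) AS bigint) AS LastValue
        FROM sys.identity_columns ic
        JOIN sys.objects o ON o.object_id = ic.object_id
        JOIN sys.schemas s ON s.schema_id = o.schema_id
        WHERE o.type = 'U'
    ) AS src
    ON  t.TableName  = src.TableName
    AND t.ServerName = @@SERVERNAME
    WHEN MATCHED THEN
        UPDATE SET LastValue = src.LastValue, CapturedAt = SYSUTCDATETIME()
    WHEN NOT MATCHED THEN
        INSERT (TableName, ServerName, LastValue)
        VALUES (src.TableName, @@SERVERNAME, src.LastValue);
    GO

    -- At failover, on the new master: reseed every identity column above the
    -- highest value recorded for any node, plus a safety margin
    DECLARE @tbl sysname, @seed bigint, @sql nvarchar(max);
    DECLARE c CURSOR LOCAL FAST_FORWARD FOR
        SELECT TableName, MAX(LastValue) + 1000   -- margin is an assumption
        FROM dbo.IdentityHighWater
        GROUP BY TableName;
    OPEN c;
    FETCH NEXT FROM c INTO @tbl, @seed;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @sql = N'DBCC CHECKIDENT (''' + @tbl + N''', RESEED, '
                 + CAST(@seed AS nvarchar(20)) + N');';
        EXEC sys.sp_executesql @sql;
        FETCH NEXT FROM c INTO @tbl, @seed;
    END
    CLOSE c; DEALLOCATE c;
    ```

    IDENT_CURRENT returns the last identity value generated for a table across all sessions, which is what makes the per-node capture step work.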

    Be aware that using P2P does not guarantee zero data loss.

    At a macro level, if the network between your sites fails then obviously the data gets out of synchronisation. When the network comes back P2P will resynchronise automatically, but if your master site goes down during the network outage then the second site cannot have all the data.

    At a micro level, SQL replication is a batch-based process, kicked off at 5-second intervals. You should expect your secondary site to be 5 to 10 seconds behind the primary site, plus any network delay. Depending on the type of failure you have, this is the window of data loss you are exposed to. BTW, seek Microsoft advice before reducing the interval between replication batches, as this can have unwanted side-effects.

    If you need zero data loss then you should use synchronous mirroring.
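    As a sketch, classic database mirroring in high-safety (synchronous) mode looks like this; the database and endpoint names are hypothetical, and the mirroring endpoints are assumed to exist already:

    ```sql
    -- On the principal: point the database at the mirror's endpoint and
    -- require synchronous commit (SAFETY FULL is the default)
    ALTER DATABASE AppDb SET PARTNER = 'TCP://mirror.example.com:5022';
    ALTER DATABASE AppDb SET PARTNER SAFETY FULL;  -- commits wait for the mirror
    ```

    Note that database mirroring is deprecated in recent SQL Server versions; a synchronous-commit Always On availability group is the current equivalent.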

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ (1-click install and best practice configuration of SQL Server 2019, 2017, 2016, 2014, 2012, 2008 R2, 2008 and 2005).

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara