Question re: replication & initializing from backup

  • So I had to rebuild a server that was subscriber using pull replication.  Currently I'm restoring the database on the subscriber form backup and it's a big one.  I know from previous recovery test that this will take about 5 hours.  Now I'm planning on applying DIFF and Log backups as needed when the Full backup restore is finished...yes I set it to NO RECOVERY. 😉  Now my transaction log (min) retention is 8 hours based on my distributor properties.  The full backup is from about 19 hours ago.

    Now my question is how do I add this subscriber on the publisher without it causing any data consistency errors.  I'm assuming when I run my sp_addsubscription command with the initialize from backup (using the same backup) that it will attempt to run all the transactions since the backup but many of them will already have been applied.  Unfortunately, I can't just restore the DB without applying the updated logs since the distributor will not have those logs older than 8 hours. 

    I'm wondering if I could use a different strategy here but I was afraid any other form of recovery would have taken much longer considering the size of the database in question.


    SELECT quote FROM brain WHERE original = 1
    0 rows returned

  • Y.B. - Wednesday, November 29, 2017 7:12 AM

    So I had to rebuild a server that was subscriber using pull replication.  Currently I'm restoring the database on the subscriber form backup and it's a big one.  I know from previous recovery test that this will take about 5 hours.  Now I'm planning on applying DIFF and Log backups as needed when the Full backup restore is finished...yes I set it to NO RECOVERY. 😉  Now my transaction log (min) retention is 8 hours based on my distributor properties.  The full backup is from about 19 hours ago.

    Now my question is how do I add this subscriber on the publisher without it causing any data consistency errors.  I'm assuming when I run my sp_addsubscription command with the initialize from backup (using the same backup) that it will attempt to run all the transactions since the backup but many of them will already have been applied.  Unfortunately, I can't just restore the DB without applying the updated logs since the distributor will not have those logs older than 8 hours. 

    I'm wondering if I could use a different strategy here but I was afraid any other form of recovery would have taken much longer considering the size of the database in question.

    How large is this database that you are restoring? The size of the database and the size of the backup?
    I'm not really following the sequence of what you are concerned about with initializing from a backup and then applying a differential but things you can do would be to not use a backup that is from 19 hours ago. Why do you need to use a backup from 19 hours ago? You can restore this sooner, take a differential and the differential should then be smaller and really shouldn't be taking 8 hours. You can also increase the retention time if needed.
    The commands that get replicated are based on the last lsn not the last full backup. It won't restore things out of order. It doesn't necessarily mean you won't get an error about missing transactions but they won't be applied. You would need to do another differential and apply that. The following is like a template for the process:
    SQL Server Replication: Synchronize From Backup

    Sue

  • The DB is ~350GB.

    You are right in that I don't need to use a 19 hour old backup but even if I were to do a full backup it takes about 40 minutes to complete plus another  5 hours to restore it to a remote server over the network.  The worse part was that after I got the database restored it looked like the transactions had already been flushed from the distribution db.  I had specifically changed the retention to keep at least 4 hours but it didn't take.  I'm not sure if it required a service restart or if it was something else. 

    I know initializing from backup is supposed to be the recommended method for large database but I must say I'm not a fan.  Considering there is a lot of manual work to clean up the tables and columns that are not needed.  I don't know what the performance hit on the publisher would be but I'd love to test and see what kind of time it would take to initialize a subscriber using a snapshot and automatic sync type. 

    I guess I can't complain too much, this is the first major issue I've had with replication in the 7 years since I've replaced my predecessor.


    SELECT quote FROM brain WHERE original = 1
    0 rows returned

  • Y.B. - Wednesday, November 29, 2017 3:14 PM

    The DB is ~350GB.

    You are right in that I don't need to use a 19 hour old backup but even if I were to do a full backup it takes about 40 minutes to complete plus another  5 hours to restore it to a remote server over the network.  The worse part was that after I got the database restored it looked like the transactions had already been flushed from the distribution db.  I had specifically changed the retention to keep at least 4 hours but it didn't take.  I'm not sure if it required a service restart or if it was something else. 

    I know initializing from backup is supposed to be the recommended method for large database but I must say I'm not a fan.  Considering there is a lot of manual work to clean up the tables and columns that are not needed.  I don't know what the performance hit on the publisher would be but I'd love to test and see what kind of time it would take to initialize a subscriber using a snapshot and automatic sync type. 

    I guess I can't complain too much, this is the first major issue I've had with replication in the 7 years since I've replaced my predecessor.

    Sound like you are restoring it across the network - copy it over to the server and restore it from there. And if your copies are horrendously slow, talk to your network folks. A 350 GB database shouldn't really cause you so many headaches. You could also try copying it using Robocopy and do it from the server. But you are lucky if this is the first major issue you've had  - I only wish. 🙂
    Your db size is usually in the area of an easier size with the reinit from backup. It can have some odd things come up but it's not too bad. Like everything else in replication, you kind of have to play with it, tweak it a bit to get it working how you want.
    Change the retention time before you start everything and when you add the publication, try setting @immediate_sync to true. That generally helps if you are hitting issues with the subscriber missing data.
    If you wanted to try to initialize from the snapshot it normally shouldn't be too bad, I've done that size before that way. From what you have said, the thing I would watch is the speed of things across the network. So I would engage whatever IT folks can help you look at that and check if the pipe you are using to copy is congested. And it may help to get their feedback on the network share you are using for the same reasons.

    Sue

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply