Replication - distribution server crashed

  • What are the options for setting up replication again without configuring it from scratch when the distribution server has crashed? Is there any way to rebuild the distribution database if the data is in sync on both the publisher and the subscriber?

  • Are there any distribution db backups? If so there is an MS article on how to restore it.

    Will the data have changed on the published databases? If not, you can create a new distributor and rebuild replication. Add the subscribers with a @sync_type of 'replication support only' which will only recreate the subscriber procs but not touch the data or require a snapshot.
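
    As a rough sketch of that last step (not a tested script), a push subscription could be re-added along these lines once the new distributor and the publication exist. The database, publication and server names are placeholders, and it assumes the subscriber already has the schema and data and that nothing changes on the publisher while this runs.

    ```sql
    -- Run at the Publisher, in the publication database.
    -- MyPublishedDb, MyPublication, MySubscriberServer and MySubscriberDb
    -- are placeholder names, not values from this thread.
    USE MyPublishedDb;
    GO

    -- 'replication support only' assumes the Subscriber already holds the
    -- schema and data, so no snapshot is applied.
    EXEC sp_addsubscription
        @publication       = N'MyPublication',
        @subscriber        = N'MySubscriberServer',
        @destination_db    = N'MySubscriberDb',
        @subscription_type = N'push',
        @sync_type         = N'replication support only';

    -- A push subscription also needs a Distribution Agent job.
    -- Windows authentication to the Subscriber is assumed here.
    EXEC sp_addpushsubscription_agent
        @publication              = N'MyPublication',
        @subscriber               = N'MySubscriberServer',
        @subscriber_db            = N'MySubscriberDb',
        @subscriber_security_mode = 1;
    GO
    ```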

  • MysteryJimbo (11/25/2013)


    Are there any distribution db backups? If so there is an MS article on how to restore it.

    Will the data have changed on the published databases? If not, you can create a new distributor and rebuild replication. Add the subscribers with a @sync_type of 'replication support only' which will only recreate the subscriber procs but not touch the data or require a snapshot.

    We have the latest backup of the distribution database. But the data on the publisher changes every minute, and because the distributor will not be available, it will be hard for us to clear the orphaned publications and subscriptions, since a publisher can have only one distribution server. So first we have to clear the publications so that they no longer use the distributor, and then remove the publisher from replication.

    Can you please provide the link for that MS article? Thanks.

  • If you plan on restoring the distribution db on the same distribution server, you shouldn't need to clean up replication on the publisher/subscriber. You would only need to do that if it is a new distributor.

    This is the article I was referring to. The strategy is different for merge as data can change at either end.

    http://technet.microsoft.com/en-us/library/ms152560(v=sql.105).aspx
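
    One option that backup strategy relies on is 'sync with backup'. A minimal sketch of turning it on for the distribution database is below; it assumes the database uses the default name distribution. With the option on, the publisher's log is not truncated until the corresponding transactions have been backed up at the distributor, so the trade-off is a larger publisher log between distribution db backups.

    ```sql
    -- Run at the Distributor.
    -- Assumes the distribution database uses the default name 'distribution'.
    EXEC sp_replicationdboption
        @dbname  = N'distribution',
        @optname = N'sync with backup',
        @value   = N'true';

    -- Check the setting: 1 means the option is on, 0 means it is off.
    SELECT DATABASEPROPERTYEX(N'distribution', 'IsSyncWithBackup') AS IsSyncWithBackup;
    ```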

  • deep_kkumar (11/25/2013)

    We have the latest backup of the distribution database. But the data on the publisher changes every minute

    If you cannot restore the distribution db, you will have to reinitialise from a snapshot. Your publisher transaction logs will continue to grow until you recover the distributor, as the log cannot be cleared until all replicated commands have been committed there. This is good, as it means that once recovered, the log reader will know everything that still needs to replicate.

    deep_kkumar (11/25/2013)

    So first we have to clear the publications so that they no longer use the distributor, and then remove the publisher from replication.

    If you do this, the above doesn't apply. As soon as replication is removed, the next transaction log backup will go through and clear out all of the commands. You will need to reinitialise the subscriber in some way.
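
    Whether the publisher's log is still being held for replication can be checked with something like the following sketch. MyPublishedDb is a placeholder name; a log_reuse_wait_desc of REPLICATION means log truncation is waiting on the log reader.

    ```sql
    -- Run at the Publisher. MyPublishedDb is a placeholder name.
    -- log_reuse_wait_desc = 'REPLICATION' means the log cannot be truncated
    -- because replicated transactions have not yet been read by the Log Reader.
    SELECT name, recovery_model_desc, log_reuse_wait_desc
    FROM sys.databases
    WHERE name = N'MyPublishedDb';

    -- Log size and percentage used for every database on the instance.
    DBCC SQLPERF(LOGSPACE);
    ```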

  • MysteryJimbo (11/25/2013)


    This is good, as it means once recovered, the log reader will know everything that still needs to replicate.

    I should have been clearer. You still run the risk that undistributed transactions in the distribution db will be lost if they weren't captured in the latest backup. The number of these (if any) depends on how busy the system was at the time of the corruption.

  • MysteryJimbo (11/25/2013)


    MysteryJimbo (11/25/2013)


    This is good, as it means once recovered, the log reader will know everything that still needs to replicate.

    I should have been clearer. You still run the risk that undistributed transactions in the distribution db will be lost if they weren't captured in the latest backup. The number of these (if any) depends on how busy the system was at the time of the corruption.

    Yeah, that's a big problem. I guess it's better to set it up again from scratch by generating the snapshot. Let us assume the distribution server is down and we have to delete the orphaned publications, so that when we set replication up again it does not throw an error saying that it cannot connect to the distribution database because the distribution server is down. This makes the situation more complex, as a publisher can have only one distribution database. We have to delete the orphaned publications and subscriptions while ignoring the distribution database.
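
    A sketch of what that cleanup might look like at the publisher, using the @ignore_distributor option so that the procedures do not try to connect to the dead distributor. The names are placeholders and this is untested against our setup. As noted earlier in the thread, once replication is removed the log is released, so the subscriber has to be resynchronised afterwards.

    ```sql
    -- Run at the Publisher while the Distributor is unreachable.
    -- MyPublishedDb and MyPublication are placeholder names.
    USE MyPublishedDb;
    GO

    -- Drop all subscriptions to the publication without contacting the Distributor.
    EXEC sp_dropsubscription
        @publication        = N'MyPublication',
        @article            = N'all',
        @subscriber         = N'all',
        @ignore_distributor = 1;

    -- Drop the publication itself.
    EXEC sp_droppublication
        @publication        = N'MyPublication',
        @ignore_distributor = 1;

    -- Turn off publishing for the database.
    EXEC sp_replicationdboption
        @dbname             = N'MyPublishedDb',
        @optname            = N'publish',
        @value              = N'false',
        @ignore_distributor = 1;
    GO

    -- Finally, remove the Publisher's link to the old Distributor.
    USE master;
    GO
    EXEC sp_dropdistributor @no_checks = 1, @ignore_distributor = 1;
    GO
    ```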

    A different scenario: let us assume that before doing maintenance work on the distribution server we take fresh backups of the distribution database and the msdb database of the distribution server. Can we restore these backups onto a new server (with the same name as the previous distribution server) and kick-start the jobs? Would this take care of all the undistributed transactions in the distribution db and the pending transactions on the publisher? Can this be done? Any suggestions?

  • In theory this should work as a two-stage process.

    Restore the SQL Server instance on the host by recovering master and msdb.

    Restore the distributor using the distribution db backup.

  • Jimbo,

    I tested it by restoring the master, msdb and distribution databases. But I am facing problems with the Log Reader Agent, which throws the following errors:

    The process could not execute 'sp_repldone/sp_replcounters' on 'MyPublisherServer'

    The specified LSN (%value) for repldone log scan occurs before the current start of replication in the log

    Our production server is a huge OLTP system, so I cannot run sp_replrestart and miss the transactions. Is there any workaround for this problem?

  • It looks like the published database has had its transaction log restored past what was logged in the distribution db. There is a gap in LSNs preventing the Log Reader from starting: the log has been cleared of that LSN, so the publisher is now in the "future" relative to the distribution db.
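
    To see the gap for yourself, something like this sketch may help. The database names are placeholders. The two outputs use different formats for the LSN, so treat it as a rough comparison rather than an exact check.

    ```sql
    -- At the Publisher (MyPublishedDb is a placeholder name).
    -- For a published database, the output includes a
    -- "Replicated Transaction Information" section with the oldest
    -- distributed and oldest non-distributed LSN.
    USE MyPublishedDb;
    DBCC OPENTRAN;

    -- At the Distributor (default database name 'distribution' assumed).
    -- Shows the most recent transactions the restored distribution db knows about.
    -- With several published databases, filter on publisher_database_id.
    USE distribution;
    SELECT TOP (5) publisher_database_id, xact_seqno, entry_time
    FROM dbo.MSrepl_transactions
    ORDER BY xact_seqno DESC;
    ```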

    It's been a significant amount of time since this issue occurred, so I suspect you may have to take it on the chin now and consider alternatives. Initially, though, you need to get replication up and running on production, and that involves losing those transactions.

    You will need a plan for synchronising the data on the subscriber in some way.

    Perhaps consider setting up log shipping from the published db and having a small outage to set up a no-sync subscription. You could log ship to a different db name and rename it when you bring it online if you need to keep the subscriber db available (even though it's out of sync). This would eliminate the need to generate a snapshot.

  • Jimbo,

    So in this case, even if we have backups of the distribution server, we will not be able to use them when the server crashes. Is there any high availability solution for the distribution server that we can use to eliminate this kind of situation (mirroring, clustering, log shipping, etc.)?

    Thanks,

  • The difficulty you face is that the LSNs for the replicated transactions in the distribution database are out of sync with what is coming from the publisher. The "could not execute sp_repldone" error is indicative of that. So if you restore the distribution database, you are going to see this. The only way to avoid it would be to restore the distribution database and the published databases to exactly the same point in time. Not an easy feat.

    You can restore the distribution database and then force an sp_repldone, which will invalidate all transactions in the published database's transaction log that are marked for replication. Basically, the log reader will no longer pick those up. That also means you have to manually sync all the tables in the publications, as they will no longer be in sync with the subscriber. It will allow replication to start working again, though. So if the tables are reasonably sized and reinitializing is not an option, this will work.
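
    A minimal sketch of forcing that, assuming the Log Reader Agent for the database is stopped first and using a placeholder database name. Everything still pending in the log is marked as distributed, so those changes will never reach the subscriber through replication.

    ```sql
    -- Run at the Publisher, in the published database, with the Log Reader
    -- Agent stopped. MyPublishedDb is a placeholder name.
    USE MyPublishedDb;
    GO

    -- @reset = 1 marks all replicated transactions in the log as distributed.
    EXEC sp_repldone
        @xactid     = NULL,
        @xact_seqno = NULL,
        @numtrans   = 0,
        @time       = 0,
        @reset      = 1;

    -- Flush the article cache so the Log Reader Agent can connect again.
    EXEC sp_replflush;
    GO
    ```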

    Hope this helps.

    David

    @SQLTentmaker

    “He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot

  • Thanks David. I was able to resolve that issue.

    But the problem here is that if we have a distributor backup from around 4 A.M. and the server crashes at 3 A.M. the next morning, we will lose a lot of transactions. Would you suggest taking backups more often? In a high-volume OLTP environment this will create more issues. And with table sizes of 30 GB, it's very difficult to run the commands and sync manually.

    I am working on setting up synchronous mirroring for the distribution server. For this I have to change the recovery model of the distribution database. Any suggestions?

  • Yes, agreed. That is a bit troublesome to sync, though not impossible. Having a synchronous mirror for the distribution server would be helpful. I definitely recommend having an HA solution in place for the distribution server, and that includes the proper RAID configuration at the storage layer so that you do not suffer a catastrophic failure that can't be recovered from locally. DR is another situation, and what you spend on that will depend on your requirements. I think most companies would be willing to accept the downtime required to reinitialize a publication, even one with a 30 GB table, to save some of that cost from a DR perspective.

    Another note here: in most cases the distribution server is small enough that it will work well on a VM, so you could move it there and take advantage of the VM environment's HA solution if you have that in place.
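
    For the mirroring test mentioned above, the recovery model change would look roughly like the sketch below. The backup paths are placeholders, and the default database name distribution is assumed. Once the database is in FULL recovery it needs regular log backups or its log will keep growing. Also note that Microsoft's replication documentation lists mirroring of the distribution database as not supported, so treat this as something for a test environment only.

    ```sql
    -- Run at the Distributor. Database mirroring requires the FULL recovery model.
    -- The X:\Backup paths are placeholders.
    ALTER DATABASE [distribution] SET RECOVERY FULL;

    -- Take a full backup and a log backup; both are restored WITH NORECOVERY
    -- on the mirror server before the mirroring session is started.
    BACKUP DATABASE [distribution]
        TO DISK = N'X:\Backup\distribution_full.bak';

    BACKUP LOG [distribution]
        TO DISK = N'X:\Backup\distribution_log.trn';
    ```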

    David


  • David,

    I was testing the synchronous mirror method. Automatic failover works fine as soon as the principal goes down, but the tricky part is how to make the mirror act as the distributor. I tried changing the DNS name, but we would have to drop the server and add it back. I restored the master and msdb databases from the principal distributor server to the mirror server. I am kind of lost here. Any help would be appreciated.

    Thanks,
