Replication critical issue

  • Right, firstly our Log Reader Agent fell over because it ran out of locks (what the hell?) while processing 9,000,000 individual delete commands (courtesy of the poor data transfer software we use).

    So I upped the lock limit to 50,000, and the Log Reader stayed up and collected the 9,000,000 commands.

    However, it now has 5,000,000 inserts to process, and it falls over again for lack of lock resources. So the Log Reader is not running, but the Distribution Agent should still be able to push through the previous 9,000,000, right? You don't need the Log Reader to be running to distribute commands already in the distribution database?

    Then the Distribution Agent fell over because the record it was trying to delete was not present (long story, but this is expected, so I turned on Skip Errors). So now it is distributing 9,000,000 deletes, and ALL of them will have to be skip-errored.

    This has now been running for six hours and the command count has not gone up. Looking at the various tables and stored procedures, it would appear that it has yet to replicate a single transaction. BUT sp_who2 shows it as running.

    Is it just not updating the stats because it has so much to do? Or is it doing nothing because the Log Reader is not running and it needs it for reference? (Surely not?)

    Thanks
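
    [Editor's note] The lock-limit change described above maps to the `locks` server option via `sp_configure`; a minimal sketch, assuming the 50,000 figure from the post (a value of 0 restores SQL Server's default dynamic lock management, which is usually preferable):

    ```sql
    -- 'locks' is an advanced option, so expose advanced options first.
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;

    -- Cap the number of available locks (0 = dynamic, the default).
    EXEC sp_configure 'locks', 50000;
    RECONFIGURE WITH OVERRIDE;
    ```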

  • If I use sp_repldone to push it past all the error commands, will this mean I have to drop and recreate the publisher or subscriber?! Or will it just carry on from the new position?
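
    [Editor's note] For anyone landing here later: running sp_repldone with @reset = 1 marks everything in the log as already replicated, which discards transactions not yet delivered to the distributor, so the subscription generally does need reinitialising (or a manual resync) afterwards. A sketch of the "skip everything" form, run in the published database:

    ```sql
    -- Flag all pending replicated transactions in the log as processed.
    -- WARNING: transactions not yet handed to the distributor are lost,
    -- so plan to reinitialise or manually resync the subscribers.
    EXEC sp_repldone
         @xactid     = NULL,
         @xact_segno = NULL,   -- 'segno' is the documented parameter name
         @numtrans   = 0,
         @time       = 0,
         @reset      = 1;
    ```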

  • As I thought, the Log Reader doesn't need to be running once it has passed the transactions/commands on to the distributor. The Distribution Agent is still running and shows no sign of progress, but does act as if it is active rather than frozen: no locks exist, and you can see the last batch moving on under sp_who2. All very odd. sp_repldone is appealing to me, depending on the outcome of the question above.
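
    [Editor's note] The stalled-but-active behaviour can be cross-checked directly in the distribution database rather than via sp_who2; a sketch using the standard replication system tables:

    ```sql
    USE distribution;

    -- Commands still queued for delivery to subscribers.
    SELECT COUNT(*) AS pending_commands
    FROM dbo.MSrepl_commands;

    -- Recent Distribution Agent history: delivery counts and rates per run.
    SELECT TOP (10) agent_id, [time], delivered_commands, delivery_rate, comments
    FROM dbo.MSdistribution_history
    ORDER BY [time] DESC;
    ```

    If pending_commands holds steady and delivered_commands never increases, the agent is genuinely stalled rather than merely not updating its stats.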

  • Just for interest: my resolution to this issue.

    I had to drop the replication subscriber and publisher. I used BCP to get all three servers I am using aligned (it took less than an hour once transferred and restored everywhere, whilst also carrying out other tasks).

    I then put this massive table into its own publication, set up with no sync. It is now back in line.

    It has made me question whether to bother replicating such big tables at all if I can get away with a once-a-day update using BCP that is much quicker! Or at the least, I feel I probably need some routines in place to drop subscribers and resync with BCP if everything goes out of sync.

    😎
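
    [Editor's note] The once-a-day BCP resync mentioned above could be scripted roughly as follows; server, database, and table names are placeholders, and native format (-n) keeps the round trip fast:

    ```shell
    # Export in native format from the publisher, then bulk-load into a subscriber.
    bcp MyDb.dbo.BigTable out BigTable.dat -n -S SourcePublisher -T
    bcp MyDb.dbo.BigTable in  BigTable.dat -n -S Subscriber1 -T -b 50000 -h "TABLOCK"
    ```

    The "no sync" publication setup corresponds to creating the subscription without an initial snapshot (e.g. @sync_type = 'none' on sp_addsubscription), which is exactly what makes this BCP-first approach workable.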

