Scary Merge Replication issue

  • Has anyone seen something like this before?

    I'm creating a replication infrastructure almost identical to our successful 2005 implementation, using scripts generated from the '05 implementation. Topology: one master, 6 republishers, 100 end-point CE subscribers. Two publications (one effectively download-only, one bidirectional).

    After applying the scripts and re-initializing the subscribers, I sit back and watch. After four hours, 30% of the devices have synced, and it's looking good.

    Then, without warning, I get the following error: Object referenced by the given @article or @artid '1A616B5D-0C11-4CD1-A020-8A8D68874A19' could not be found. (Source: MSSQLServer, Error number: 20669).

    As I watch, the republishers start losing articles from the publications. In the end (after perhaps half an hour), all six republishers are left with both publications, but with no articles in them.

    It's happened twice now. Any ideas? I'm going to check the default trace, then pore over the scripts for any anomalies...

    TIA,

    Grubb

  • In reply to Grubb (11/3/2011):

    Have you seen this? http://social.msdn.microsoft.com/Forums/en-US/sqlreplication/thread/0b72403b-06de-4990-992d-a3dbd3a759ce

    It sounds like the same issue.

  • Thanks for replying, NJ.

    That might be it. They said a "full sync" fixed it... I'm going to assume they mean a re-initialization. One guy hacked the sysmergearticles table to realign it, but a hack is not an option. Also, I'm just trying to complete a pilot... the data's already been lost. We have to show that this isn't going to happen in production.

    In the default trace on a repub, I see the article's table being Deleted, Created, and Altered as part of the sync with the Master. I see this only once in the last 18 hours, around the time it stopped working. I'm going to look in the Master's default trace to see whether I can find what triggered it.

    Have you checked which article that ID maps to in sysmergearticles in the published database? You may find that invalid metadata is the cause.

    What other investigations have you done?
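    A minimal Python sketch of that check, assuming the article metadata has already been pulled out of the published database: it extracts the GUID from the error 20669 text and looks it up among the merge article rows. For merge publications those rows live in dbo.sysmergearticles (dbo.sysarticles holds transactional and snapshot articles). The sample rows and their GUIDs are hypothetical stand-ins for the result of LOOKUP_SQL, which you would run through whatever DB-API driver you use.

```python
import re
import uuid
from typing import Optional

# Error text as reported in the original post (error 20669).
ERROR_TEXT = (
    "Object referenced by the given @article or @artid "
    "'1A616B5D-0C11-4CD1-A020-8A8D68874A19' could not be found."
)

# Query to run in the published database. Merge articles are stored in
# dbo.sysmergearticles; transactional articles are in dbo.sysarticles.
LOOKUP_SQL = "SELECT name, artid, pubid FROM dbo.sysmergearticles WHERE artid = ?"

# Matches a GUID in 8-4-4-4-12 hex form.
GUID_RE = re.compile(r"[0-9A-Fa-f]{8}-(?:[0-9A-Fa-f]{4}-){3}[0-9A-Fa-f]{12}")


def extract_artid(message: str) -> Optional[uuid.UUID]:
    """Return the first GUID found in an error message, or None."""
    match = GUID_RE.search(message)
    return uuid.UUID(match.group(0)) if match else None


def find_article(rows: list[dict], artid: uuid.UUID) -> Optional[dict]:
    """Return the metadata row whose artid matches, or None if it is missing."""
    for row in rows:
        # str() copes with drivers that return either str or uuid.UUID.
        if uuid.UUID(str(row["artid"])) == artid:
            return row
    return None


if __name__ == "__main__":
    # Hypothetical rows standing in for the sysmergearticles result set.
    rows = [
        {"name": "Orders", "artid": "0F3C2E5A-1111-4A2B-9C3D-000000000001", "pubid": "pub-1"},
        {"name": "Customers", "artid": "0F3C2E5A-1111-4A2B-9C3D-000000000002", "pubid": "pub-1"},
    ]
    artid = extract_artid(ERROR_TEXT)
    row = find_article(rows, artid)
    if row is None:
        print(f"artid {artid}: no matching row in sysmergearticles")
    else:
        print(f"artid {artid} belongs to article {row['name']}")
```

    If no row matches, that is consistent with the metadata having drifted away from what the merge agent still references, which is the invalid-metadata scenario suggested above.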

    Well, I dug through the logs on the repubs and the Master, as well as the default traces.

    From the Master's log I could see exactly when the repubs started getting the error. In the repub traces I could see when the tables were dropped, recreated, and altered, but that didn't tell me what initiated the changes.

    I have opened a ticket with Microsoft, and they've given me some triggers to put on certain system tables so we can try to catch it in the act.

    -Grubb
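    The timeline work described above (matching when the repubs started logging 20669 against when the article tables were dropped, recreated, and altered) can be sketched as a simple time-window filter. This assumes the default trace rows have already been read out, for example with sys.fn_trace_gettable on the path from sys.traces where is_default = 1, into records like the hypothetical TraceEvent below; the window size and sample values are assumptions. EventClass 46, 47 and 164 are Object:Created, Object:Deleted and Object:Altered. Carrying columns such as LoginName and ApplicationName along may hint at which session issued the DDL, though that is not guaranteed to identify the root cause.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# SQL Trace EventClass ids for DDL object events.
DDL_EVENTS = {46: "Object:Created", 47: "Object:Deleted", 164: "Object:Altered"}


@dataclass
class TraceEvent:
    """Hypothetical record for one default trace row."""
    start_time: datetime
    event_class: int
    object_name: str
    login_name: str
    application_name: str


def ddl_near(events, error_time, window=timedelta(minutes=30)):
    """Return DDL events within +/- window of error_time, oldest first."""
    lo, hi = error_time - window, error_time + window
    hits = [
        e for e in events
        if e.event_class in DDL_EVENTS and lo <= e.start_time <= hi
    ]
    return sorted(hits, key=lambda e: e.start_time)


if __name__ == "__main__":
    first_error = datetime(2000, 1, 1, 14, 0)  # hypothetical time of first 20669
    events = [
        TraceEvent(first_error - timedelta(minutes=6), 47, "SomeTable", "login-A", "app-A"),
        TraceEvent(first_error - timedelta(minutes=5), 46, "SomeTable", "login-A", "app-A"),
        TraceEvent(first_error - timedelta(minutes=5), 164, "SomeTable", "login-A", "app-A"),
        TraceEvent(first_error + timedelta(hours=3), 164, "OtherTable", "login-B", "app-B"),
    ]
    for e in ddl_near(events, first_error):
        print(e.start_time, DDL_EVENTS[e.event_class], e.object_name,
              e.login_name, e.application_name)
```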
