Transactional replication fails with no meaningful error message

  • Hi,

    I am trying to set up replication between two clustered instances that belong to the same three-node cluster. The distributor is a separate instance from the publisher and the subscriber.

    When I re-initialise a publication, I get this error message:

    The replication agent encountered a failure. See the previous job step history message or Replication Monitor for more information. The step failed.

    Neither job history nor SQL Server logs provide any other meaningful messages.

    When I run the Snapshot Agent from the command prompt, it works fine:

    "C:\Program Files\Microsoft SQL Server\100\COM\SNAPSHOT.EXE" -Publisher [sqlcluster1\publisher] -PublisherDB [MyDB] -Distributor [sqlcluster2\distributor] -Publication [MyPublication] -DistributorSecurityMode 1

    The Log Reader and Distribution agents fail with the same generic message.

    The SQL Server Agent service account is a local admin on each cluster node and a member of the sysadmin role on each instance.

    I am running SQL Server 2008 R2 SP2.

    Any ideas?

    Thanks.

  • Did you check Replication Monitor? It seems like you did, but you didn't specifically mention it.

    That, or:

    select * from distribution.dbo.msrepl_errors order by time desc

    Also, regarding permissions: is the service account different for each instance? The snapshot folder should be a full UNC path (\\server\share\) that is accessible to the subscriber's service account.
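    If msrepl_errors only shows old entries, the Snapshot Agent's own run history in the distribution database may still show what happened on the failing runs. The query below is a sketch that assumes the distribution database has the default name "distribution" (as in the query above) and uses the publication name MyPublication from the original post; run it on the distributor.

```sql
-- Sketch: most recent Snapshot Agent history rows for one publication.
-- Assumes the distribution database is named "distribution" and the
-- publication is MyPublication (taken from the original post).
SELECT TOP (20)
       a.name       AS agent_name,
       h.runstatus,              -- 1=Start, 2=Succeed, 3=In progress, 4=Idle, 5=Retry, 6=Fail
       h.[time],                 -- time the history row was logged
       h.comments,               -- agent's descriptive message for this step
       h.error_id                -- nonzero: matches MSrepl_errors.id
FROM distribution.dbo.MSsnapshot_history AS h
JOIN distribution.dbo.MSsnapshot_agents  AS a
     ON a.id = h.agent_id
WHERE a.publication = N'MyPublication'
ORDER BY h.[time] DESC;
```

    The same pattern works for the other agents through MSlogreader_history/MSlogreader_agents and MSdistribution_history/MSdistribution_agents. If the agent process never got far enough to write a row, nothing new will appear here, which is itself a useful clue.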

  • Yep, I did check Replication Monitor. That query only returns errors from two days ago, which have since been fixed.

    Amazingly, the Snapshot Agent works fine when run from the command prompt under the same service account.

    All instances use the same service account. The snapshot folder looks like this:

    \\SQLCluster8\SQLRepl

  • Roust_m (6/20/2013)


    The replication agent encountered a failure. See the previous job step history message or Replication Monitor for more information.

    Locate the SQL Server Agent job belonging to the Snapshot Agent in Management Studio (not in Replication Monitor) and look at that job's history. There you will usually find the "previous step" the error mentions, with some (more or less) useful error messages.
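    The same job history can be read with T-SQL instead of Management Studio. This is a sketch that filters on the standard replication job categories; run it against msdb on the instance whose SQL Server Agent runs the jobs (the distributor for the Snapshot and Log Reader agents and for push Distribution agents).

```sql
-- Sketch: recent step-level history for replication agent jobs.
SELECT TOP (50)
       j.name       AS job_name,
       c.name       AS category,
       h.step_id,                -- 0 = overall job outcome row
       h.step_name,
       h.run_status,             -- 0=Failed, 1=Succeeded, 2=Retry, 3=Canceled
       h.run_date,               -- yyyymmdd as int
       h.run_time,               -- hhmmss as int
       h.[message]               -- step output / error text
FROM msdb.dbo.sysjobhistory AS h
JOIN msdb.dbo.sysjobs       AS j ON j.job_id = h.job_id
JOIN msdb.dbo.syscategories AS c ON c.category_id = j.category_id
WHERE c.name IN (N'REPL-Snapshot', N'REPL-LogReader', N'REPL-Distribution')
ORDER BY h.instance_id DESC;  -- newest history rows first
```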

  • WolfgangE (6/22/2013)


    Locate the SQL Server Agent job belonging to the Snapshot Agent in Management Studio (not in Replication Monitor) and look at that job's history. There you will usually find the "previous step" the error mentions, with some (more or less) useful error messages.

    I did this already; there was no useful message there.

  • This turned out to be the problem:

    http://connect.microsoft.com/SQLServer/feedback/details/273892/replication-agents-hang-when-a-host-has-a-large-number-of-agents-running-on-it

    We had just migrated to the new cluster and did not change the replication architecture. Everything worked on the old one.

  • Interesting, so it's a load issue with the distributor? Did you change to a pull subscription, or update the jobs to use CmdExec?

  • Try to capture the error through profile.

    ---------------------------------------------------
    "There are only 10 types of people in the world:
    Those who understand binary, and those who don't."

  • Andrew G (6/26/2013)


    Interesting, so it's a load issue with the distributor? Did you change to a pull subscription, or update the jobs to use CmdExec?

    I switched the jobs to CmdExec. Very weird, as our replication is not that big.
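    One way to make that switch is with msdb's sp_update_jobstep, pointing the agent's "Run agent." step at the executable and reusing the SNAPSHOT.EXE command line from the original post. This is a sketch under assumptions: the job name below is hypothetical (look up the real one in msdb.dbo.sysjobs), and step 2 being the "Run agent." step should be confirmed in msdb.dbo.sysjobsteps before running it.

```sql
-- Sketch: run the Snapshot Agent through the CmdExec subsystem instead of
-- the Snapshot subsystem. Job name is HYPOTHETICAL; step_id 2 is ASSUMED
-- to be the "Run agent." step; verify both in msdb first.
USE msdb;
GO
EXEC dbo.sp_update_jobstep
    @job_name  = N'SQLCLUSTER1\PUBLISHER-MyDB-MyPublication-1',  -- hypothetical
    @step_id   = 2,
    @subsystem = N'CmdExec',
    -- Same command line that worked from the command prompt in the first post
    @command   = N'"C:\Program Files\Microsoft SQL Server\100\COM\SNAPSHOT.EXE" -Publisher [sqlcluster1\publisher] -PublisherDB [MyDB] -Distributor [sqlcluster2\distributor] -Publication [MyPublication] -DistributorSecurityMode 1';
GO
```

    The Log Reader (LOGREAD.EXE) and Distribution (DISTRIB.EXE) agent jobs can be changed the same way, each with its own agent parameters.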

  • free_mascot (6/26/2013)


    Try to capture the error through profile.

    Which profile?
