Major performance problem after replicating database

  • After setting up one of our databases up for replication and then generating a snapshot. We are randomly getting timeout errors on simple update queries. The timeout time is 2 minutes so there really isn't anything major happening with the query. the actual statement is below, only the table and column name were changed.

    "UPDATE TableName SET iLockedBy = 1 WHERE lPrimaryID = 128098"

    If I setup replication without generating a snapshot or delete the replication that statement runs without problem or hesitation.

    I ran a trace and thats the only statement running for the application. Everything in between looks to be the normal events when running a trace. I may not have the right trace settings so I maybe missing some things. the snapshot event seems to be causing the performance problem.

    Has anyone ran into something like this?

  • Are these "update" statememts affecting a lot of rows ?

    You should look at how many locks are being generated, probably on the distribution database.


    * Noel

  • Ill look at the locks but no. That statement is only effecting one row which is a parent record.

  • Are your snapshots using NATIVE mode or CONCURRENT mode. It could be that there is a table lock ( when in native mode) that does not gets released until all rows are BCPed out. In adition if the destination drive for the snapshot does not have sufficient IO Bandwdith your "write" time-outs could increase. In adition you can monitor CPU usage on the server some times the snapshot agent is very agressive.

    If possible try to use DATABASE SNAPSHOT as the mode (this is only available on EE)

    Just my $0.02


    * Noel

  • We resolved the problem, by accident.

    We had one single publication and the table causing the timeouts had about 70 joined filters on it for the child tables, in total there are about 300 published articles. We split the replication into multiple publications because child tables were trying to insert before the parent table record was added for Web sync's. the Local sync's worked fine.

    Since the change the performance problem is gone and I think it might have had something to do with the joined filters. I have tried to reproduce the problem using multiple publications and it thankfully wont show its head. If I set it back up with a single publication and the joined filters it shows back up.

    New trace files kept showing that the SP:CacheMiss for the single publication but not with the multiple publications. When using the single publication I even restarted the SQL instance and then rebooted to make sure the all memory was released. The server is a dual dual-core Xeon with 8GB RAM and AWE w/Lock pages policy was enabled. The server is dedicated to SQL and no other process were running that would have eaten up that much memory.

    We still have some problems to iron out with this replication project but the performance problem is no longer one of them.

    Thanks for the assistance.

  • Bob Shaw (7/2/2008)


    We resolved the problem, by accident.

    We had one single publication and the table causing the timeouts had about 70 joined filters on it for the child tables, in total there are about 300 published articles. We split the replication into multiple publications because child tables were trying to insert before the parent table record was added for Web sync's. the Local sync's worked fine.

    Since the change the performance problem is gone and I think it might have had something to do with the joined filters. I have tried to reproduce the problem using multiple publications and it thankfully wont show its head. If I set it back up with a single publication and the joined filters it shows back up.

    New trace files kept showing that the SP:CacheMiss for the single publication but not with the multiple publications. When using the single publication I even restarted the SQL instance and then rebooted to make sure the all memory was released. The server is a dual dual-core Xeon with 8GB RAM and AWE w/Lock pages policy was enabled. The server is dedicated to SQL and no other process were running that would have eaten up that much memory.

    We still have some problems to iron out with this replication project but the performance problem is no longer one of them.

    Thanks for the assistance.

    Thanks for posting back. This is one of those things I don't do very often (join filters) and I am glad I have avoided them so far. You probably saved me hours of troubleshooting such issues. Thanks again.


    * Noel

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply