Replicating a large replicated database

  • We're starting to hit a wall with our in-house skill set, and hopefully someone in the community can provide some input, or point to some additional online resources to give us some insight. First some background.

    Our source systems are written on a proprietary platform, and cannot be accessed on an ad-hoc basis via standard tools. The vendor essentially replicates the proprietary source data to a SQL Server database so that the data is available to query. The problem is that poorly written queries can effect the replication service, and degrade performance on the source system, and so we made the decision to replicate the data that we need to another server.

    At the beginning, transactional replication worked just fine. Now, however, the business requirements have become more complex. We need to take a subset of the replicated data, and replicate it again to another server. We filter the data via indexed views, and then replicate the indexed views to the destination server as tables.

    SourceDB --> Repl_DB1 --> Repl_DB2

    Here is where the real trouble starts...the article properties on Repl_DB1 are set to "Drop Existing Object" when the destination object name is in use. Because the destination object is now referenced via a view, we can neither drop nor truncate the destination object. Deleting the data is also not an option, because serveral of these tables contain hundreds of millions of rows, likely overwhelming our transaction log.

    So far, it's looking like we need to perform the following steps whenever we have to regenerate a snapshot:

    1. Drop publication from Repl_DB1 to Repl_DB2

    2. Drop all views on Repl_DB1 that reference replcated tables.

    3. Regenerate snapshot of Repl_DB1 using the truncate destination object option.

    4. Recreate views on Repl_DB1

    5. Recreate publication from Repl_DB1 to Repl_DB2 using the truncate destination.

    This is an option that we would like to avoid, due to the size of the databases. Potentially, given the number of publications we have, it could be days before the data on REPL_DB2 is back to the state it was before the snapshot was recreated on REPL_DB1.

    What advice/experience can anyone offer?

  • The vendor is doing this step ? SourceDB --> Repl_DB1

    Using SQL replication, or another process ??

  • The vendor is populating the SourceDB via their own routines, since the underlying data is not relational. Their process basically creates a SQL Server copy and broadcasts it to SourceDB. For our purposes, the SourceDB is the starting point, and we are using Transactional replication from SourceDB to Repl_DB1, and then again from Repl_DB1 to Repl_DB2

Viewing 3 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply