Replication Scenario - Need Advice

  • Hi, I’m a developer and I’ve never used replication before. I don’t have a specific technical question; instead I’m looking for some “replication wisdom”: advice, being pointed in the right direction.

    I’m building a system with multiple databases; we have a separate database for each of our company’s customers. We would like a few of the tables’ data to be consistent across all the databases. For example, one of the tables is Events. An Event is something like “Billy Joel in Concert - Madison Square Garden - July 21st”. If one customer enters this event, we would like it to appear in all the other customers’ databases. Additionally, we want to avoid duplicates; a second customer should be prevented from entering another event record as “Billy Joel LIVE in Concert - Madison Square Garden - July 21st”.

    Also, as soon as a user enters a new event, that user needs to be able to work with it quickly, without waiting longer than 10 seconds or so. New events will be added frequently and the table will hold thousands of records. There may be around 25 databases.
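    [Editor's note: one way to picture the duplicate problem is that two differently worded titles for the same show need to collapse to the same comparison key. Below is a minimal, hypothetical sketch in Python; the stop-word list and function names are invented for illustration and are not anything SQL Server provides.]

    ```python
    import re

    # Words that often vary between two entries of the same event.
    # This list is an assumption for the sketch, not a real standard.
    STOP_WORDS = {"live", "in", "concert", "the", "a", "an"}

    def normalized_key(title, venue, date):
        # Lower-case the title, strip punctuation, drop filler words,
        # and sort the remaining tokens so word order doesn't matter.
        tokens = re.findall(r"[a-z0-9]+", title.lower())
        core = sorted(t for t in tokens if t not in STOP_WORDS)
        return (" ".join(core), venue.lower().strip(), date)

    k1 = normalized_key("Billy Joel in Concert", "Madison Square Garden", "2007-07-21")
    k2 = normalized_key("Billy Joel LIVE in Concert!", "Madison Square Garden", "2007-07-21")
    assert k1 == k2  # near-duplicates map to the same key
    ```

    A unique constraint on a column holding such a key would then reject the second insert, however the check is only as good as the normalization rules.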


    The first solution I thought of was to have a central database (A) to which all new events are added. Events would then be replicated continuously to all the other databases.

    Suppose user Fred logs into database B. He would add a new event to database A (where it would be validated as not a duplicate). Then replication would add the new event to database B, and Fred could work with it.

    I’m concerned that with this approach, if A is not available or the replication fails, then Fred is dead in the water. I’m also not sure whether replication is intended for this type of near real-time synchronization.

    The second solution is for Fred to enter the new event into database B, where he could immediately work with it. Replication would then run to merge the new event in B up to A, and bring down any new events from A. I’m concerned this approach might be more complicated in how it avoids duplicates and maintains unique IDs.
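    [Editor's note: on the unique-ID worry, the two usual options are to give each database its own non-overlapping ID range, or to use GUIDs so no coordination is needed at all. Here is a rough Python illustration; the range size and function names are invented for the sketch. SQL Server has its own analogues: merge replication can manage identity ranges automatically, and a `uniqueidentifier` column with `NEWID()` covers the GUID route.]

    ```python
    import uuid

    # Option 1: carve out a non-overlapping identity range per database,
    # e.g. database 2 owns 2,000,000-2,999,999. Each database increments
    # its own local counter within its range.
    RANGE_SIZE = 1_000_000

    def next_id(db_number, counter):
        assert 0 <= counter < RANGE_SIZE  # range exhausted otherwise
        return db_number * RANGE_SIZE + counter

    # Option 2: a GUID lets any database generate an ID with no coordination,
    # at the cost of bigger, non-sequential keys.
    event_id = uuid.uuid4()

    assert next_id(2, 0) != next_id(3, 0)  # ranges never collide across databases
    ```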


    I apologize for this long post. Thanks in advance for any help/advice.

    Edit: not sure if it matters, but we're running SQL 2005

  • Hi Mike,

    This sounds like an interesting problem. Unfortunately it's going to be difficult to offer you a definitive answer, because so much depends on which model of replication you use. And as you'll find out over the coming days/weeks, replication needs careful thought and planning.

    Here are my thoughts though.

    The second option immediately strikes me as the more complicated approach (as you already suspected). The biggest problem I see with the second approach is resolving conflicts.

    Say, for example, Fred adds a new event into database B. As it so happens, database A is offline (or the network link is down). This means that Fred's new event doesn't get replicated to A, which in turn means it doesn't get replicated to any of the other databases.

    Now, imagine that Tom, working on database C, tries to add (as luck would have it) the same event that Fred added two minutes ago. Because Fred's event was never replicated, no one else has any knowledge of it, so Tom is able to add the same event.

    In the meantime (say, over the course of an hour), Tom goes ahead and adds several records related to this event, as does Fred on his database, B.

    After a while, database A comes back online and Tom and Fred start replicating their data to A, at which point there's a conflict. Which event do you take, Tom's or Fred's? Whichever one you take, it could mean losing all of the data related to the other person's event.
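    [Editor's note: to make the scenario above concrete, here is a toy simulation in Python, with plain dictionaries standing in for the databases. It is purely illustrative, not how SQL Server replication actually detects or stores conflicts.]

    ```python
    central = {}  # database A: normalized event key -> database that first published it

    def merge_up(db_name, local_events):
        """Merge a subscriber's locally created events up to central database A.

        Returns the list of event keys that conflict with events another
        database already published."""
        conflicts = []
        for key in local_events:
            if key in central and central[key] != db_name:
                conflicts.append(key)  # someone else already published this event
            else:
                central[key] = db_name
        return conflicts

    # While A was offline, Fred (on B) and Tom (on C) each created "the same" event.
    fred_events = {("billy joel", "msg", "2007-07-21")}
    tom_events = {("billy joel", "msg", "2007-07-21")}

    assert merge_up("B", fred_events) == []  # B happens to merge first, no conflict
    assert merge_up("C", tom_events) == [("billy joel", "msg", "2007-07-21")]  # C conflicts
    ```

    Whatever rule decides the loser (first-writer-wins here), the hour of child records hanging off the losing event still has to be re-pointed or discarded by hand.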

    Now, let's look at the first option. This one is much simpler, but as you pointed out, if database A is offline then Fred is dead in the water. In fact, it's worse than that: not only Fred, but anyone who wants to add an event is dead in the water.

    This could be viewed as a good thing in the sense that at least it's going to ensure transactional consistency in a way that the second option doesn't. The drawback is that you are relying on this one database to handle the creation of new events, and if it's down, no one can create new events, period.

    Personally, I would opt for the first choice because it maintains transactional consistency. It will also be easier to manage than the second option, and easier to set up, because you don't need to think about what to do in the case of a conflict. You will need to appreciate that this approach presents a single point of failure for the operation of the entire system: if the machine or the network link goes down, then no one can create an event. So you need to build in plenty of redundancy to allow for this eventuality. You also need to understand what happens if the worst comes to the worst and it does go offline (including the redundant system). Can the business live with an outage that prevents new events from being created, for example? If the answer is no, then you need to think up another strategy.

    Transactional replication can handle near real-time synchronization (with a latency in the order of seconds). This is of course dependent on various factors, including the speed of the network link, the spec of the machines involved, etc.

    The key thing to realise with replication is that it often involves making a compromise. If you want maximum transactional consistency, then you're stuck whenever your publisher goes offline. If you want subscribers to be able to work autonomously, then you have to put up with transactional inconsistencies (which must be resolved).

    One thing I will say is that replication can be a big area, and I wouldn't go into it lightly without fully investigating and planning the whole thing out. I'm not trying to scare you, but I have seen clients set up replication (because it was easy to do) without considering the management overhead, and they ended up getting things drastically wrong.

    You'll also need to consider whether you want a separate server to handle the distribution side of things. This depends on how many transactions you're going to be replicating, the volumes of data, and how beefy your primary server is. Just be aware that replication does place a performance overhead on your system, so you need to think about that too.

    Also bear in mind that managing/supporting replication will require a DBA if you don't already have one. I'm not saying that the DBA will work on managing replication full time, but things will go wrong, and without proper monitoring and support it can get out of hand.

    The reality is that I don't know enough about your current design to give you more specific answers, but hopefully I've given you some things to think about and consider.

    Hope that helps rather than confuses the matter.

  • Karl,

    Thank you so much for generously giving your time and expertise to my problem!  Your post helps me very much.  It definitely gets me started with lots of good info and things to consider.  I have a lot of learning/planning to do.  (Unfortunately we don't have a DBA.  Sounds like the first approach would be easier for a replication newbie to implement, but it's the approach that is more dependent on keeping the replication up and running all the time.)

    -Mike
