Single versus separate databases

  • This is a question that crops up a good bit where I work. We have several applications that we support. Some are developed internally, some externally, but most are dependent on some key tables that store data on personnel, facilities, etc. Most also only work with a smaller subset of the larger data set. As such, I can see a tremendous value in storing master tables with the full set of information in them, so that any information updated is instantly available to all applications that use that data. However, that can mean that a lot of completely unrelated information is being stored in the same database, which I can see becoming a headache for maintaining security, documentation, etc. I also know that most commercial apps prefer to have their own "space".

    I'm just curious to see what thoughts and/or guidelines other people have to determine whether to store the tables for an application in the main database versus creating separate databases. If you do separate the data, what do you use to replicate the information from the main database to the other tables? Do you use SQL Server replication, or do you do your own ETL? In our case, full replication seems like overkill since I don't expect the combined resources to overtax even a single database. (I'd guess order of magnitude would only be in the hundreds of thousands of rows.)

    I posted this in the SQL 7,2000 board just because that is what we are using now, but I'd also be interested in whether your recommendations change as we move ahead to 2005.

    Andrew

  • We develop an app (Medical Records / Theatre Costing) that needs to know a LOT of information that's created by the hospital's billing system. We maintain our own database and copy of the patient list, theatres, doctors, procedures, etc. for several reasons:

    1. We need to integrate with several different billing packages (we are semi "off the shelf" now, I suppose). Some are MS SQL but many are FoxPro based. Real-time access is out of the question.

    2. We store some extra information or they store some extra information (depending on the system and the case in point)

    3. I'd like to be able to ship updates to our software that don't interfere with the other systems. Likewise, the vendors of the other systems want to do the same.

    4. Since we have a pretty reasonable database structure it's a "bonus" for our client that they get a nice single repository that they can report from for simple patient demographics, number of procedures, etc.

    5. Our software IS the medical record. It MUST endure for many years even if the billing system changes or a billing system is no longer in use. This is the answer that makes most people go "ooohhh. Yeah, fair enough - you keep your own copy". I hate trying to justify why our software doesn't just read into the tables of other software. The explanation that they'd be an orphan site doesn't seem to faze them, unfortunately...

    As for how we do the sync: I have a number of stored procs in our database - one for each system we integrate with. The stored proc accepts the name of the billing system's DB. I use linked servers to access MS SQL 2000 DBs as well as FoxPro, etc. Because the database name is dynamic, the proc uses dynamic SQL and temporary tables to pull the data from the other system. It's then a case of (there's a rough sketch after the list below):

    1. Update data that's changed

    2. Insert new data

    Note that you do NOT update data that has not changed: even though the values are identical, you're still writing the rows, which is both a waste of time/CPU and will grow your transaction log needlessly.
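
    Here's a minimal sketch of that pattern, assuming a SQL Server source reachable by database name or linked server. The proc, table, and column names (usp_SyncPatients, dbo.Patient, #src) are made up for illustration - this isn't the poster's actual code, just the dynamic-SQL / temp-table / update-then-insert shape described above.

        -- Sketch only: object and column names are hypothetical.
        CREATE PROCEDURE dbo.usp_SyncPatients
            @SourceDb sysname              -- billing system's database name, known only at run time
        AS
        BEGIN
            SET NOCOUNT ON;

            -- Stage the source rows in a temp table via dynamic SQL,
            -- because the database name cannot be hard-coded.
            CREATE TABLE #src (PatientID int PRIMARY KEY, Surname varchar(50), GivenName varchar(50), DOB datetime);

            DECLARE @sql nvarchar(4000);
            SET @sql = N'INSERT INTO #src (PatientID, Surname, GivenName, DOB)
                         SELECT PatientID, Surname, GivenName, DOB
                         FROM ' + QUOTENAME(@SourceDb) + N'.dbo.Patient;';
            EXEC sp_executesql @sql;

            -- 1. Update only rows whose values actually differ
            --    (NULL-able columns would need ISNULL/COALESCE in the comparison).
            UPDATE p
            SET    p.Surname = s.Surname, p.GivenName = s.GivenName, p.DOB = s.DOB
            FROM   dbo.Patient AS p
            JOIN   #src        AS s ON s.PatientID = p.PatientID
            WHERE  p.Surname <> s.Surname OR p.GivenName <> s.GivenName OR p.DOB <> s.DOB;

            -- 2. Insert rows that don't exist in our copy yet.
            INSERT INTO dbo.Patient (PatientID, Surname, GivenName, DOB)
            SELECT s.PatientID, s.Surname, s.GivenName, s.DOB
            FROM   #src AS s
            WHERE  NOT EXISTS (SELECT 1 FROM dbo.Patient AS p WHERE p.PatientID = s.PatientID);
        END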

    We do this, depending on the site, once every few minutes. At some sites it's as often as every minute. It syncs 20-50k patients, probably twice as many procedures again, etc.

    The code is broken up into very small transactions to avoid significant locking issues; if you don't break it up into small transactions you WILL encounter locking problems (see the batching sketch below). If it's a process run once each night then you needn't worry so much, but we had to have the above ETL running live all through the day so that users can switch between systems and see newly entered data within a reasonable amount of time.
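
    As a rough illustration of what "small transactions" can look like (hypothetical names again, matching the sketch above): cap each pass at a couple of thousand rows so no single transaction holds locks for long. On SQL 2000 the cap can be set with SET ROWCOUNT; on 2005 UPDATE TOP (n) does the same job.

        -- Batch the update step so each transaction touches at most 2000 rows.
        SET ROWCOUNT 2000;                -- SQL 2000-style cap; use UPDATE TOP (2000) on 2005+

        DECLARE @rows int;
        SET @rows = 1;

        WHILE @rows > 0
        BEGIN
            BEGIN TRAN;

            UPDATE p
            SET    p.Surname = s.Surname, p.GivenName = s.GivenName, p.DOB = s.DOB
            FROM   dbo.Patient AS p
            JOIN   #src        AS s ON s.PatientID = p.PatientID
            WHERE  p.Surname <> s.Surname OR p.GivenName <> s.GivenName OR p.DOB <> s.DOB;

            SET @rows = @@ROWCOUNT;       -- rows just fixed no longer match the WHERE, so the loop ends
            COMMIT TRAN;
        END

        SET ROWCOUNT 0;                   -- always reset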

    In healthcare there's a standard called HL7 that's meant to automate this mess - no one follows it the same way 🙂 Thus it was easier, faster and cost less $ to do it in SQL rather than via HL7.

    Hope that gives you some food for thought!

  • If you are an all-SQL Server shop, then with SQL 2005 you would want to look into Service Broker, which allows for asynchronous transactions (not in the DB sense). Another thing I have been involved with in the past is message queuing: the source system writes to a staging table, a Windows service picks up each modification and writes a message to a queue, and another service reads the queue and writes to the destination database. This allows for asynchronous updates and, in the event of an outage, queues the events until the reader service/destination is available.
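
    For what it's worth, here is a bare-bones Service Broker sketch of that idea (SQL 2005 only; all names are made up, both databases need the broker enabled, and a real setup would add routing between databases, an activation procedure, and error handling):

        -- One-time setup: message type, contract, queues, and services.
        CREATE MESSAGE TYPE PersonnelChange VALIDATION = WELL_FORMED_XML;
        CREATE CONTRACT PersonnelContract (PersonnelChange SENT BY INITIATOR);
        CREATE QUEUE PersonnelSendQueue;
        CREATE QUEUE PersonnelReceiveQueue;
        CREATE SERVICE PersonnelSendService    ON QUEUE PersonnelSendQueue    (PersonnelContract);
        CREATE SERVICE PersonnelReceiveService ON QUEUE PersonnelReceiveQueue (PersonnelContract);
        GO

        -- Source side: send a change message (e.g. from a trigger or the saving proc).
        DECLARE @h uniqueidentifier;
        BEGIN DIALOG CONVERSATION @h
            FROM SERVICE PersonnelSendService
            TO SERVICE 'PersonnelReceiveService'
            ON CONTRACT PersonnelContract
            WITH ENCRYPTION = OFF;
        SEND ON CONVERSATION @h
            MESSAGE TYPE PersonnelChange (N'<person id="42" action="update"/>');
        GO

        -- Destination side: drain the queue asynchronously and apply the change.
        DECLARE @msg xml;
        RECEIVE TOP (1) @msg = CAST(message_body AS xml)
        FROM PersonnelReceiveQueue;
        -- ...apply @msg to the destination tables here...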

  • I don't know that real-time updates are that critical. I work for a university, and in our case we already receive nightly feeds from the official student system. It's just that we receive several of these feeds - one for each of the small web-based components that my office manages. Each of these stores a different segment of the overall student population, and there are obviously different details for each as well. We also pull in staff data from AD.

    We would ideally like to tie all of these together into a common web site so that students or faculty can sign into a single location and have links to the various applications as needed. This is where having a central parent database would make user management and basic security much simpler. (We pass off user authentication to other systems, so we aren't storing passwords.) However, each application has its own security structure and rules and its own segment of the user population. In that case, it seems to make sense for each application to have its own database for storing this information. Some of these are third-party commercial applications that insist on their own database. We are also trying to distribute these applications across different physical clusters/machines because a lot of them have seasonal peak demands.

    I guess what we're trying to decide is how much information should be stored in a central database, how much should be stored in separate databases for each application, and how best to handle updates from the nightly feeds. I like the idea of updating the main database and then distributing changes as needed to the child databases, but given the feeds we currently receive it may be necessary to go the other way. In either event, are there simpler ways to set this up than SQL Server replication?

    Andrew

  • IMO, replication is going to be the easiest thing to set up. With replication you set up the publication(s) and subscribers and SQL Server does the rest. Any other solution that has been mentioned will require writing a lot of code to do the same thing that replication does.
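
    For reference, this is roughly what that looks like if you script it instead of clicking through the wizard (transactional replication; the database, publication, table, and server names below are only examples, and a full setup would also add the snapshot and distribution agents, e.g. via sp_addpublication_snapshot / sp_addpushsubscription_agent):

        -- Mark the source database as published.
        EXEC sp_replicationdboption @dbname = N'MasterData', @optname = N'publish', @value = N'true';

        -- Create a continuous (transactional) publication and add an article to it.
        EXEC sp_addpublication @publication = N'PersonnelPub', @repl_freq = N'continuous', @status = N'active';
        EXEC sp_addarticle     @publication = N'PersonnelPub', @article = N'Personnel',
                               @source_owner = N'dbo', @source_object = N'Personnel';

        -- One subscription per child/application database.
        EXEC sp_addsubscription @publication = N'PersonnelPub', @subscriber = N'APPSERVER1',
                                @destination_db = N'HousingApp', @subscription_type = N'Push';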

  • Jack Corbett (3/4/2008)


    IMO, replication is going to be the easiest thing to set up. With replication you set up the publication(s) and subscribers and SQL Server does the rest. Any other solution that has been mentioned will require writing a lot of code to do the same thing that replication does.

    In many enterprises that I've worked in over the past years, when we needed to share information across applications (different DBs) we've normally created a central repository of data. That has helped us maintain the information in ONE spot (VERY important).

    Remember that when you have dispersed information (even with replication) there is a risk in maintaining it (think availability).

    Because SH1T happens: replication adds overhead, and it CAN fail.

    Cheers,

    Gonzalo
