Initial data dumps

  • We have a fairly large production database serving a few hundred customers. When we add a new customer, we need to import data from the customer's system into several tables. Currently we import straight into the production tables that are in use by existing customers. The import process is asynchronous, which doesn't guarantee the correct order of loading tables, so we are running into referential integrity errors. Another problem is frequent timeouts for existing customers while a new customer is being added.

    What would be a good solution?

  • Eliyahu (5/21/2013)


    The import process is asynchronous which doesn't guarantee correct order of loading tables, so we are running into reference integrity errors.

    Do you mean it uses SSIS or something similar? I'd tend to load a staging area first (empty copies of the tables, in the same or a different database). Then load the 'staged' data into the production database within a transaction (if possible) and in a defined order with a series of INSERT statements, probably from a stored procedure.

    [Edit] better explanation!

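    A minimal sketch of that pattern — the table names (`staging.Customers`, `dbo.Customers`, `dbo.Orders`) are hypothetical stand-ins for whatever the real parent/child tables are:

    ```sql
    -- Staging copies have already been loaded, in any order, from the
    -- customer's export. Now push into production in dependency order,
    -- inside one transaction, so references always resolve.
    BEGIN TRANSACTION;

        -- Parents first...
        INSERT INTO dbo.Customers (CustomerID, Name)
        SELECT CustomerID, Name
        FROM staging.Customers;

        -- ...then children that reference them.
        INSERT INTO dbo.Orders (OrderID, CustomerID, OrderDate)
        SELECT OrderID, CustomerID, OrderDate
        FROM staging.Orders;

    COMMIT TRANSACTION;
    ```

    Because the staging tables carry no foreign keys, they can be filled asynchronously; the ordering only has to be enforced in this final, set-based step.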

  • Thanks.

    We don't use SSIS; everything is done by a Windows service issuing plain INSERTs. The service picks up the records to insert from MS queues, one queue per table, which is why the whole process is asynchronous.

    I am also thinking of a set of staging tables, perhaps in a different database. One concern is keeping the staging schema synchronized with production. Is it a good idea to re-create the staging tables on the fly just before the dump?
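    One common way to clone a table's structure on the fly is `SELECT ... INTO` with a false predicate — a sketch with hypothetical names (`staging.Customers`, `dbo.Customers`); note this copies column names and types only, not constraints, indexes, or defaults:

    ```sql
    -- Recreate an empty staging copy of a production table on the fly.
    IF OBJECT_ID('staging.Customers') IS NOT NULL
        DROP TABLE staging.Customers;

    SELECT *
    INTO staging.Customers
    FROM dbo.Customers
    WHERE 1 = 0;   -- false predicate: copies the structure, no rows
    ```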

  • I don't usually say this, but your problem is a bit too large for a forum post. The amount of detail and discussion required to get into the nitty-gritty of the best ways to handle this is huge, and that work is usually best done on site with an expert.

    However, there are some basics that can be approached independently.

    First: You have asynchronous data loads going into a key-dependent system without order controls. You need to fix this. There are a few ways to deal with it, but in general: how do you know when the customer system (for lack of a better term) has finished loading your queue? The entire purpose of a queue is to allow for eventual data delivery.

    Can the customer system send you entire hierarchies that you can then break down, determining whether the header data needs to be re-inserted? For this particular case that would be a better approach, so that all of your detail records can fill in their necessary references on the way in.

    Second: You're locking users out of tables during large data loads.

    This defeats the purpose of your queue system, which is one of the major reasons to bring queues into play in the first place. Either you're backlogging the data into the tables and holding huge locks open, or you have too many activations against the queue. Determine why these timeouts are occurring and adjust accordingly. It may have something to do with the foreign key relationships and/or the internal scans the activation procs perform to check the data.

    Third: Your Windows service is using direct inserts. Gyeah.

    Call a proc for each activation. What you do from there depends on the format of the message (usually XML). It's possible this is tripping on the ordering problem above: the service sends an insert down without waiting for a 'success' before sending the next one, or it is multi-threaded and sends multiple inserts down simultaneously.
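    A hedged sketch of the proc-per-message idea — the proc name, message shape, and columns are invented for illustration, assuming an XML payload:

    ```sql
    -- Hypothetical activation proc: receive one queue message (XML) and
    -- shred it into a single set-based INSERT, instead of the Windows
    -- service issuing raw row-by-row INSERTs itself.
    CREATE PROCEDURE dbo.usp_LoadOrderMessage
        @Message XML
    AS
    BEGIN
        SET NOCOUNT ON;

        INSERT INTO dbo.Orders (OrderID, CustomerID, OrderDate)
        SELECT
            r.value('(OrderID/text())[1]',    'INT'),
            r.value('(CustomerID/text())[1]', 'INT'),
            r.value('(OrderDate/text())[1]',  'DATETIME')
        FROM @Message.nodes('/Orders/Order') AS m(r);
    END;
    ```

    Centralizing the insert in a proc also gives you one place to enforce ordering and validation before the row touches a production table.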

    Staging tables won't help you directly here. The entire point of the queue system is to avoid locking up production in the first place; take a massive staging load and shove it into all the base tables and you'll still end up with lockups. However, it might be worth getting the customer data into staging and then working from there. It depends on whether you're trying to reuse an existing queue-update system that keeps the other customers current, or whether this is an independent queue.

    There are so many more questions to ask... I hope that helps a little, though.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Thanks, Craig, this definitely helps.

    The customer systems actually send export files; then, on our end, a process parses the files and writes messages to the queues. On the other end of the queues, the Windows service turns every queue message into an INSERT.
