Dealing With (Relatively) Large Data

  • Sanity Check:

    I have a large table (85 million records) on one server and database, with a twin on another server and database. About twice a week the source table gets updated by about a million records (updates and new records). I suppose the problem with keeping them in sync is resources. Truncating the destination table and using a Data Flow task to copy over the entire table takes a long time and times out, or overflows the window allotted to the task. The servers are linked, but using a distributed query in the source SQL command (logic: WHERE source RecordID NOT IN (SELECT RecordID FROM Destination)) also takes a long time and times out. One thought was copying over the whole thing and telling the destination to ignore errors; however, the constraint violation on the destination when the RecordID already exists is not ignored and fails the package.

    Questions:

    What is the best way to go about this with large tables?

    Would 'Redirect Rows', say to a text file, rather than 'Ignore Error' allow the system to go ahead and ignore the 'bad' records and just insert the new ones?

    Thanks for your help.

  • The way I would do it would be to use a staging table on your destination server to stage the 1 million updated rows, then do the update via a T-SQL UPDATE statement (MERGE if you're on 2008). That would do the update in a set-based manner and should be a lot quicker.

    You will also be able to include the check for existing records in the update so you can avoid errors, assuming your errors were related to PK violations.
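    A minimal sketch of that staged approach, assuming hypothetical names (dbo.BigTable as the destination table, dbo.BigTable_Staging holding the changed rows, RecordID as the key, and Col1/Col2 standing in for the real columns):

        -- SQL Server 2008+: MERGE updates matched rows and inserts new ones
        -- in one set-based statement.
        MERGE dbo.BigTable AS tgt
        USING dbo.BigTable_Staging AS src
            ON tgt.RecordID = src.RecordID
        WHEN MATCHED THEN
            UPDATE SET tgt.Col1 = src.Col1,
                       tgt.Col2 = src.Col2
        WHEN NOT MATCHED BY TARGET THEN
            INSERT (RecordID, Col1, Col2)
            VALUES (src.RecordID, src.Col1, src.Col2);

        -- Pre-2008 equivalent: a joined UPDATE, then an INSERT guarded by
        -- NOT EXISTS so existing RecordIDs don't raise PK violations.
        UPDATE tgt
        SET    tgt.Col1 = src.Col1,
               tgt.Col2 = src.Col2
        FROM   dbo.BigTable AS tgt
        JOIN   dbo.BigTable_Staging AS src
               ON src.RecordID = tgt.RecordID;

        INSERT dbo.BigTable (RecordID, Col1, Col2)
        SELECT src.RecordID, src.Col1, src.Col2
        FROM   dbo.BigTable_Staging AS src
        WHERE  NOT EXISTS (SELECT 1
                           FROM   dbo.BigTable AS tgt
                           WHERE  tgt.RecordID = src.RecordID);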

  • Thanks for your reply.

    The problem has been identifying the 1 million records to copy over in the first place. We don't currently have a way of tracking changes in the source table other than comparing it against the destination table. This is not an ideal situation; I should have started the thread with that statement.

  • You're going to need a delta mechanism: either an audit table with the PK and a datetime stamp, or a datetime stamp column added to the main table. Or build out two tables with the PK and a checksum and compare those (or use an MD5 hash or whatnot, whatever collision-avoidance level you need). There's not really going to be a pretty way out of this...
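    A rough sketch of those options, with hypothetical names (dbo.BigTable as the source table, RecordID as its PK; the four-part linked-server name is made up):

        -- Option 1: an audit table populated by a trigger on the source,
        -- capturing the PK and a timestamp for every insert/update.
        CREATE TABLE dbo.BigTable_Audit
        (
            RecordID  int      NOT NULL,
            ChangedAt datetime NOT NULL DEFAULT GETDATE()
        );
        GO
        CREATE TRIGGER trg_BigTable_Delta
        ON dbo.BigTable
        AFTER INSERT, UPDATE
        AS
        BEGIN
            SET NOCOUNT ON;
            INSERT dbo.BigTable_Audit (RecordID)
            SELECT RecordID FROM inserted;
        END;
        GO

        -- Option 2: build a PK + checksum table on each side and compare.
        -- CHECKSUM can collide; HASHBYTES('MD5', ...) over the columns is
        -- stronger collision avoidance at more CPU cost.
        SELECT RecordID, CHECKSUM(*) AS RowChecksum
        INTO   dbo.BigTable_Checksums   -- run on both source and destination
        FROM   dbo.BigTable;

        -- Rows that are new or changed on the source:
        SELECT s.RecordID
        FROM   SourceServer.SourceDB.dbo.BigTable_Checksums AS s
        LEFT   JOIN dbo.BigTable_Checksums AS d
               ON d.RecordID = s.RecordID
        WHERE  d.RecordID IS NULL
           OR  d.RowChecksum <> s.RowChecksum;

    Either way, the delta list is what would feed the staging table from the earlier reply, and it gets cleared once a sync completes.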


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA
