July 20, 2004 at 6:59 pm
Hello all,
I've written a data loader for my company using a DTS package. Basically, the process is this:
1. The file is downloaded in CSV form.
2. The DTS package loads the data into table CCS.
3. Several stored procedures are called for validation purposes.
Very simple, right? Indeed it was when the process was designed and coded. However, now that the system is in place, there has been a change: the downloaded file will also contain historical data, so there will be a tremendous number of duplicates. Not only that, there will also be updated copies of old records in which a single field has changed. The field in question is a datetime field; if it changes, it goes from NULL to a valid date value. When that happens, I need the DTS package to recognize that the row is an updated copy of the original record and simply change that one field in the existing record. If it is not an updated record, it should be discarded, unless it is a totally new record, in which case it should be inserted.
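To be explicit, here is the decision I need made for every incoming row, written out in rough T-SQL. All of these names are invented for illustration (CCS_Import for wherever the downloaded file lands, RecordKey for whatever uniquely identifies a record, StatusDate for the datetime field that changes):

-- Classify each incoming row against what is already in CCS
SELECT  i.RecordKey,
        CASE
            WHEN c.RecordKey IS NULL
                THEN 'new record - insert it'
            WHEN c.StatusDate IS NULL AND i.StatusDate IS NOT NULL
                THEN 'updated copy - change the field in the existing record'
            ELSE 'plain duplicate - discard it'
        END AS RequiredAction
FROM    CCS_Import i
        LEFT JOIN CCS c
            ON c.RecordKey = i.RecordKey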
Did that make any sense at all? If not, I apologize; I'm a bit frustrated. Do I need to re-engineer the whole process, or is there a way to make my DTS package smart enough to differentiate between the old, new, duplicate, and updated data?
Thanks very much in advance for all your help.
July 20, 2004 at 8:04 pm
Sounds nasty! The first point to make is that if you do not have a shared unique identifier (or combination of identifiers) that allows direct matching between the imported records and the records already in the system, there is no way you will be able to update existing system records from the imported data.
Assuming you do have this, I would recommend creating an intermediate (staging) table into which all of the data is imported, and then running several processes against that table to complete the import, along the following lines (there is a rough T-SQL sketch after the list):
1) Delete any existing records from the intermediate table
2) Validate the imported data as required
3) Update matching records in the system where the datetime field differs
4) Insert non-matching records into the system
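Using the invented names from your post (CCS_Import as the staging table, RecordKey as the matching key, StatusDate as the datetime field), steps 1, 3 and 4 might look something like the sketch below; treat it as an outline to adapt, not finished code:

-- Step 1: clear out the staging table before the DTS load runs
DELETE FROM CCS_Import

-- Step 2: the DTS package now loads the CSV into CCS_Import instead
-- of straight into CCS, and your validation procedures run against it

-- Step 3: set-based update of existing records whose datetime field
-- has gone from NULL to a valid date in the import
UPDATE c
SET    c.StatusDate = i.StatusDate
FROM   CCS c
       INNER JOIN CCS_Import i
           ON i.RecordKey = c.RecordKey
WHERE  c.StatusDate IS NULL
  AND  i.StatusDate IS NOT NULL

-- Step 4: insert records that do not exist in the system at all;
-- anything the two statements above did not touch is a duplicate
-- and is simply left behind in the staging table
INSERT INTO CCS (RecordKey, StatusDate)  -- plus your other columns
SELECT i.RecordKey, i.StatusDate
FROM   CCS_Import i
WHERE  NOT EXISTS (SELECT 1 FROM CCS c WHERE c.RecordKey = i.RecordKey)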
This breaks the process up a bit, rather than trying to do everything as part of the import itself. It also allows you to perform set-based inserts and updates, rather than working record by record, which will improve performance.
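For what it's worth, steps 3 and 4 can live in a single stored procedure that the package calls from an Execute SQL task once the load into the staging table has finished, so your existing DTS structure stays largely intact.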
Regards
Phil