RE: load five sources into five destinations

SSChampion

Points: 10969

December 23, 2008 at 6:59 am

I like the idea of moving the files and we currently use the file watcher in one of our ETL's, works great. If you can't move the files, here's a couple options, both require either a timestamp or a unique key.

If the key is available you can re-read the file each time and only add records where the key does not exist in the destination. Do this by staging the flat file to a temp table then use a merge join set up as a left join.

Join the new data(left side of join) to existing data (right side of join) just include 1 piece of key data from the right side to check for null. Use a conditional split and only output data where the piece of right key data is null.

If the data is timestamped then do as above and stage the entire flat file. Check the destination data for the greatest time stamp value and select all records that have a timestamp > than the max of the destination. This may be prone to errors depending on the accuracy of the timestamp and frequency which records are added to the flat file. A way around this would be to select records from the staged data using a range, the low end would be > max of the destination and the high would be the max of the staged less 1 of the smallest time unit. So if are dealing with seconds take the max stage less 1 second, that way you know all entries with that time value are in the stage table. hope that made sense.