OK so the reason behind this is a follows:
1. I need to create several type 2 dimensions from OLTP tables which have been put together as track historical data changes as this data is held in the source system. Each table has a primary key, date updated, date row start and date row end.
2. The first requirement is that I need to build up the history in these new dimensions incrementally from the beginning of time to the present day.
3. I will also have to create and load transactional fact tables and join the to these dimensions (at the right point in time) to obtain the correct surrogate key.
4. Once I reach the present day the ETL process will then run incrementally for each day thereafter.
5. The changed records are staged then verified in the merge which maintains the dim so extracting the same records into staging won't pollute the dimensions, the only drawback is the inefficiency.
So ideally I would look to write one import routine which would carry out the one off historical load and then continue daily. This brings me back to the issue I raised originally as I only wish to pick up changed records.
There is an identity on the source tables but this cannot be trusted as it may not be in the correct order. There is also a date updated field which can be used going forward but it's use was inconsistent previously.
So the options I have thought about are as follows:
1. The historical data load is a one off process so I may have to bite the bullet and stage the unchanged data time and time again and let the merge deal with the actual changes.
2. For historical proces only: Maintain a table of the identity values loaded for each table and use this in a not in statement to prevent picking up the same records again and again.
3. For ongoing load processes I can use the date updated column to establish only the true changed records.
I would be interested in to hear your opinions