I am looking after / building a solution that loads data from flat files using BULK INSERT.
The file contains a unique reference (PolicyID) which appears in every file, but the system architect has requested that each file load be uniquely identifiable. To do this we have added ValidFrom and ValidTo columns to the table, and the BULK INSERT now uses a format file to skip these columns as they are not in the source file.
The whole process (sketched roughly in T-SQL below) is:

1. Update the current records, which have a dummy end date (31st Dec 2099), so that ValidTo is today's date.
2. Bulk insert the file, with the ValidFrom and ValidTo dates coming in as NULL.
3. Update the new records with ValidFrom = today's date and ValidTo = the dummy date.
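This is roughly what the three steps look like; the table name, file path and format file path below are placeholders rather than the real objects:

    -- Step 1: close off the current records (dummy end date -> today)
    UPDATE dbo.Policy
    SET    ValidTo = CAST(GETDATE() AS date)
    WHERE  ValidTo = '20991231';

    -- Step 2: load the new file; the format file skips ValidFrom/ValidTo,
    -- so those columns arrive as NULL
    BULK INSERT dbo.Policy
    FROM 'C:\Loads\PolicyFile.txt'
    WITH (FORMATFILE = 'C:\Loads\PolicyFile.fmt', TABLOCK);

    -- Step 3: stamp the newly loaded rows
    UPDATE dbo.Policy
    SET    ValidFrom = CAST(GETDATE() AS date),
           ValidTo   = '20991231'
    WHERE  ValidFrom IS NULL;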
I have a unique, non-clustered index (it is not the primary key) on PolicyID (ASC), ValidFrom (ASC), ValidTo (ASC).
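So the definition is roughly this (index and table names are illustrative):

    CREATE UNIQUE NONCLUSTERED INDEX IX_Policy_PolicyID_ValidFrom_ValidTo
    ON dbo.Policy (PolicyID ASC, ValidFrom ASC, ValidTo ASC);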
The data load itself seems reasonably quick, but I think the index maintenance is taking a huge performance hit. The file is about 750K records. The first update modifies records sitting at the end of the index pages; the load then inserts rows with NULL values, which I am assuming go to the front of the index, and the final update then rewrites them so they move to the end of the index. Is this a reasonable assumption? If so, how do I measure it whilst the script is running, and is it worth disabling the index during the load and then rebuilding it afterwards? The data will be loaded weekly, and I need to replicate the process across about 20 files of differing sizes.
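For the measurement side, this is the sort of thing I had in mind, assuming the standard DMVs are the right tool, plus the disable/rebuild option I am considering:

    -- While the load script is running, see what the session is waiting on
    SELECT session_id, command, wait_type, wait_time,
           cpu_time, total_elapsed_time, logical_reads, writes
    FROM   sys.dm_exec_requests
    WHERE  session_id > 50;

    -- Option under consideration: disable the index for the load,
    -- rebuild afterwards (a disabled unique index will not enforce
    -- uniqueness until it is rebuilt)
    ALTER INDEX IX_Policy_PolicyID_ValidFrom_ValidTo ON dbo.Policy DISABLE;
    -- ... run the update / BULK INSERT / update steps here ...
    ALTER INDEX IX_Policy_PolicyID_ValidFrom_ValidTo ON dbo.Policy REBUILD;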
I will normally only be reporting on records from the most recent load, looking for a specific PolicyID, so should I change the order or direction of the index fields? PolicyID plus either ValidFrom or ValidTo should be enough to give me a unique key, so is it worth dropping one of those fields from the index?
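For example, the kind of alternative I am wondering about (purely illustrative, I have not tested it) would be:

    -- Narrower unique index; ValidTo descending so the "current" rows
    -- (dummy end date 2099-12-31) sort first within each PolicyID
    CREATE UNIQUE NONCLUSTERED INDEX IX_Policy_PolicyID_ValidTo
    ON dbo.Policy (PolicyID ASC, ValidTo DESC);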