etl2016 - Sunday, November 19, 2017 3:31 PM
If you're going to used TYPE 2 SCDs, the DON'T use a "current-ness identifier column". You should have two columns to identify the start and end dates of when the row was valid. The start date should be when the row (identified by some key) is first inserted. The end date should be the starting date of when the row next appears in the spreadsheet OR it should be CONVERT(DATETIME,''9999') for the "current" (latest) row so you don't have to muck around with double-checks in the criteria you'll eventually need to write against the table.
Obviously, when you end up with a new row for a given key, you'll need either for the loading process to find and update the end dates of existing rows or a trigger to do the same. Despite the coming objections of some of my peers, I recommend the trigger method so that someone else can't screw things up for you. Done correctly, it will be as fast or faster than a separate chunk of code in your import process.
As for data quality, I recommend NEVER loading data from a spreadsheet directly into the final table. Always use a staging table to pre-validate the data.
--Jeff Moden
Change is inevitable... Change for the better is not.