I am looking at automating a repetitive task (using SSIS) and would be interested to know if anybody has alternatives or pros/cons on the design I have so far.
External parties send us data in Excel sheets, based on 20+ templates, due to the bespoke nature of the system, a new template could be requested at any time. A LOT of time is spent manually importing the data into SQL, then cleansing and checking, only to realise there is a problem, the import request is rejected and the process starts again 🙁 . The rub is that once this system is setup, there will be limited developer support to create new packages when new templates are introduced and therefore is a real risk it will become less useful as time goes by - the type of checks are always the same (validate Length, validate against List of Values, validate against another column in another Sheet) .
I get around the differing templates by importing the data into a staging table with columns [F1] to [F26], all at nvarchar(255), the excel template structure is modelled in TSQL so we know which columns need to be validated E.g Template 1 uses [F1] to [F7].
My plan is to store column specific cleanse commands and check queries in the config database itself and substitute the column names at runtime, allowing the user to maintain all template info and checks without having to create specific ETL packages for each scenario. E.g If template = X and column = X this needs to have LTRIM(F1), RTRIM(F1), REPLACE(F1, ",", "") and "Select Cols from RawData where [F1] not in (select Col from ListOfValuesCheckTable)". This would really be SSIS executing SQL commands in batch rather than using native SSIS transformations.
Any suggestions/pitfalls/alternatives for implementing such a system?
Thanks in advance.