I'm a newbie here with SSIS. I've worked extensively with multiple versions of SQL Server for about 20 years, but am working on a project that's using SSIS extensively.
The project involves polling a specific folder once per day to determine if clients have uploaded files into this folder. If so, I've got coding in place thus far in SSIS that loops through each of these files in the folder. Next on my list is the required coding to process each of these incoming Excel files and ultimately get the structure of the files adjusted properly to where the meet the required schema for import into an underlying SQL Server database.
Before I get into any of the data insertions from the Excel spreadsheets to the underlying SQL Server database, I'm researching exactly how to apply various "data validation" criteria to each Excel file to verify that the client has indeed sent me a "valid" Excel file that can legitimately be worked with in making any necessary transformations to the file to get it in the proper structure for import into a SQL Server database. I'm having problems determining exactly how/where this data validation will need to take place. I'm not even sure if "data validation" is even the right thing to call what I'm trying to do.
I'm trying to figure out how to look at a few important columns within each Excel file and determine if the data within those columns is of a valid/expected format to even warrant dealing with the file, or if the file needs to be "failed" so we can send by to the customer for adjustment and resubmission to us.
For example, if SSIS finds that the "SSN" (social security #) field unexpectedly has "Hello" in that column for one of the records instead of the expected 9-digit format for an individual's social security #, that's a big enough deal to warrant failing the file to not be dealt with in the import process.
I'd like to have multiple data validation events occur simultaneously in addition to the SSN field example shown above. I might want to validate that all SSN entries contain 9 digits, as well as making sure that all entries in the "State" column contain a 2-letter state code, and also might want to verify that the format of the entries in the "ZipCode" field follow the standard 5-digit zip code format. If I find only 4-digits in a row's "ZipCode" field or I find "Chevrolet" in the "ZipCode" field for a record, or I find an issue with the "SSN" or "State" fields as described above, SSIS needs to somehow locate those issues to where I know to stop the file import process and flag things to send the file back to the client to correct those issues.
I've researched Derived Columns and Script Tasks and other items and just can't seem to land upon a clear example / understanding of how to implement the initial data checks / validation efforts such as what I've described above.
If anyone can clearly explain how to accomplish what I've described above or can point me in the direction of any online articles / tutorials that directly address how to accomplish this sort of thing, I'd be greatly appreciative.
Thanks in advance!