I have used SSIS a bit in the past (2005), but I am embarking on a new BI project and want to know the best practice for using a single vs. multiple data flows. My scenario: I have 16 databases that are almost identical. The rules needed to clean and transform the data will be 90% the same, but each database has a few outliers that require special steps just for that database's data. Which approach is better?
1. Create a single data flow with 16 data sources, each followed by a couple of database-specific steps before feeding into a Union All transformation, so the shared 90% of the validation runs once. This keeps the logic implemented in only one place, even though the flow is more complicated because it has to handle all the different cases.
2. Create 16 data flows, one per database, and duplicate the shared logic in each so every flow can handle its own database-specific issues. Each data flow is smaller, but the duplicated logic is spread across the package.