In my last job, the first project I had to do was to build 20 dimension tables from text files created by the mainframe system. Different mainframe programs produced the files, so the text files did not all show up in the directory at the same time.
I used the DTS "Bulk Insert Task" to load the data into SQL Server tables. Each text file had its own format file, and the DTS package would check whether the text file existed in the directory. If the file did not exist, the package would loop, checking for the file every few minutes, until a cutoff time passed; then it would stop and report the file as missing. If the file existed and its creation date was the current date, I loaded it into a staging table. If the load succeeded and the staging table had data, I loaded it into the dimension table. The text file was then moved to the backup directory.
There were two ways I could have done this: the first was to write one big DTS package to load all twenty files; the other was to write 20 different DTS packages. If I wrote one big package and one of the files had a problem, I would have to restart the package and reload all twenty files. If I wrote twenty different packages, it would be a DBA nightmare to migrate them from one environment to another.
Actually, it would be my nightmare; because of SOX, I did not have permission to see the DTS packages and the jobs in production. So when the DBA moved the DTS packages and jobs to production, I could not verify that they still worked. Although the DBA manager was very confident that his three DBAs were very capable and would not make mistakes, I was a little skeptical. Everyone makes mistakes; I had already found plenty of mistakes they had made before, so I took a different approach.
I created a table with the following fields: SQL Server name, database name, text file directory, text file name, format file directory, format file name, number of columns in the text file, table name, and an active/inactive indicator. The indicator would be used to turn off the load if a dimension was no longer needed.
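A control table along these lines captures everything the loader needs. The table and column names below are my reconstruction for illustration, not the original schema:

```sql
-- Hypothetical reconstruction of the control table; the actual
-- names in the original system may have differed.
CREATE TABLE dbo.DimLoadControl (
    ServerName      varchar(128) NOT NULL,  -- target SQL Server
    DatabaseName    varchar(128) NOT NULL,  -- target database
    TextFileDir     varchar(255) NOT NULL,  -- where the mainframe drops the file
    TextFileName    varchar(255) NOT NULL,
    FormatFileDir   varchar(255) NOT NULL,  -- bcp-style format file location
    FormatFileName  varchar(255) NOT NULL,
    ColumnCount     int          NOT NULL,  -- number of columns in the text file
    TableName       varchar(128) NOT NULL,  -- target dimension table
    ActiveFlag      char(1)      NOT NULL DEFAULT 'Y'  -- 'N' turns the load off
);
```

One row per text file: adding a new dimension load later is just an INSERT into this table.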
I created a procedure to load the data into the tables. In the procedure I read the control table and looped through it, using "dtsrun" to execute the DTS package with the text file, format file, server name, database name, and table name as parameters. Since the package used Windows authentication, I did not need to put a user ID and password in the command.
The DTS package had the following global variables: SQL Server name, database name, text file, format file, and SQL Server table. The "dtsrun" command passed all of these as parameters.
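Sketched in T-SQL, the driver loop looked something like the following. The procedure body, cursor, and package name "LoadDimension" are placeholders of mine, assuming the control table above; dtsrun's /E switch uses Windows authentication and /A passes a global variable (type id 8 = string):

```sql
-- Sketch of the driver loop, assuming a control table dbo.DimLoadControl
-- and a DTS package named "LoadDimension"; names are illustrative only.
DECLARE @srv varchar(128), @db varchar(128), @file varchar(255),
        @fmt varchar(255), @tbl varchar(128), @cmd varchar(1000)

DECLARE ctl CURSOR FOR
    SELECT ServerName, DatabaseName,
           TextFileDir + TextFileName,
           FormatFileDir + FormatFileName,
           TableName
    FROM dbo.DimLoadControl
    WHERE ActiveFlag = 'Y'           -- skip loads that were turned off

OPEN ctl
FETCH NEXT FROM ctl INTO @srv, @db, @file, @fmt, @tbl
WHILE @@FETCH_STATUS = 0
BEGIN
    -- /E = Windows authentication, so no user id/password is needed.
    SET @cmd = 'dtsrun /S ' + @srv + ' /E /N LoadDimension'
             + ' /A ServerName:8=' + @srv
             + ' /A DatabaseName:8=' + @db
             + ' /A TextFile:8=' + @file
             + ' /A FormatFile:8=' + @fmt
             + ' /A TableName:8=' + @tbl
    EXEC master..xp_cmdshell @cmd    -- run the package from T-SQL
    FETCH NEXT FROM ctl INTO @srv, @db, @file, @fmt, @tbl
END
CLOSE ctl
DEALLOCATE ctl
```

Because the global variables come from the command line, the same package serves all twenty files.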
I used the Dynamic Properties task to set the SQL Server connection. Since the SQL Server name and database were global variables, I used the task to change the connection's data source and catalog to the server and database names from the global variables. That way, when the DBAs moved the package from one environment to another, they did not need to open it to change the connection; the connection server and database were stored in the table.
Then I wrote an ActiveX script to update the "Bulk Insert Task" with the right text file name, format file name, and SQL Server table name from the global variables. The script checked whether the file existed in the directory and whether its creation date was the current date; it then loaded the data into the staging table. If the staging table was empty, an email was sent to the programmer; otherwise the data was loaded into the real table. The last step moved the text file to the backup directory.
The procedure ran fine; however, I found that the files were loading in sequence, and some of the files were big and took a long time. That was exactly what I had wanted to avoid; otherwise I would have just created one big DTS package. Also, if one file had a problem, its DTS package would fail but the procedure would continue on to the next file, and it was hard to rerun just the file that had the problem.
I changed the procedure so that instead of calling "dtsrun" directly, it created a temporary job to process each file. The job had one step: execute the DTS package. If the job finished successfully, it deleted itself. This way, all the files could be processed at the same time. If any one of the files failed, its job remained in the job queue; I could just fix the problem and start the job again without rerunning all twenty files. Also, later on, when I had to add more files, all I needed to do was add an entry to the table; I did not need to change any procedure or DTS package.
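In msdb terms, creating a self-deleting job per file looks roughly like this (job name and dtsrun command are placeholders). The key is @delete_level = 1, which tells SQL Server Agent to delete the job only when it succeeds, so a failed load stays in the job queue for rerun:

```sql
-- Sketch of the per-file temporary job; names and the dtsrun command
-- are illustrative. @delete_level = 1 deletes the job on success only.
DECLARE @jobname sysname
SET @jobname = 'Load_Customer_Dim'   -- one job per text file

EXEC msdb.dbo.sp_add_job
     @job_name = @jobname,
     @delete_level = 1               -- 1 = delete job when it succeeds

EXEC msdb.dbo.sp_add_jobstep
     @job_name = @jobname,
     @step_name = 'Run DTS package',
     @subsystem = 'CmdExec',
     @command = 'dtsrun /S MYSERVER /E /N LoadDimension /A TextFile:8=...'

EXEC msdb.dbo.sp_add_jobserver
     @job_name = @jobname            -- target the local server

EXEC msdb.dbo.sp_start_job
     @job_name = @jobname            -- returns immediately; jobs run in parallel
```

Because sp_start_job returns as soon as the job is queued, the driver loop can start all twenty jobs back to back and let them run concurrently.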
Some may think I made the whole process too complicated. Actually, I later expanded the table entries so the process could also load data from an Oracle database into SQL Server tables. I agree the procedure was a little complicated for some experienced SQL Server developers, and the junior DBAs had a hard time understanding all of it. However, when I tackle a problem, I like to look at the big picture and design a long-term solution.
It may be easy to make 20 DTS packages; however, when the DBAs move from one environment to another, they have to open each one to change the connection. Also, as the data warehouse grows, it adds more and more dimension tables, which means more and more DTS packages. When the DBAs move a lot of packages between environments, it is easy to make mistakes; for example, they may forget to change a connection property, or they might miss moving a package altogether. (Of course, the DBAs in that company NEVER make mistakes!!!!!)
One simple mistake, like a missing dimension table or missing data in a dimension table, can cause the data warehouse a lot of problems, especially now that data quality is considered one of the main concerns in building a data warehouse.