I need to copy a number of tables from an ERP system's Progress database to a SQL Server database that will be used for reporting. I have a working package, but I would like recommendations on how to improve its performance.
The Progress database has tables grouped by function, for example AR (Accounts Receivable) and AP (Accounts Payable). I have strung a number of Sequence Containers together, each containing the transfer steps for one functional group. They are linked together and set to continue on success. Some tables are completely replaced. Other large tables whose data can be grouped by year, like order history, are usually partially replaced but may be fully replaced. For example, I normally delete the last 2 years of order data and reload it, but once a month I do a complete wipe and refresh.
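For the year-grouped tables, the daily partial refresh can be sketched like this (the table and column names here are hypothetical, not from my actual package):

```sql
-- Daily run: clear the most recent two years before reloading them.
-- dbo.OrderHistory and OrderDate are placeholder names.
DELETE FROM dbo.OrderHistory
WHERE OrderDate >= DATEADD(YEAR, -2, CAST(GETDATE() AS date));

-- Monthly run: full wipe and refresh instead.
-- TRUNCATE TABLE dbo.OrderHistory;
```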
Each Sequence Container contains the following steps:
1. Drop Indexes - An Execute SQL task that drops any non-primary indexes from the tables being transferred.
2. Truncate Tables - An Execute SQL task that truncates any tables that are being fully replaced.
3. Delete Tables - An Execute SQL task that deletes data from tables that will be partially loaded, like the order data.
4. A series of Data Flow tasks, one per table. Each task uses an ADO NET Source and an ADO NET Destination: the source is a SQL query that pulls only the columns I need, and the destination is a SQL Server table. For large tables, I use a For Loop Container to load the data in chunks, for example order data by year.
5. Recreate Indexes - An Execute SQL task that recreates the non-primary indexes for the tables that were transferred.
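To make the container steps concrete, here is a rough sketch of the Execute SQL work for one table; the index, table, and column names are made up for illustration:

```sql
-- Step 1: drop non-primary indexes so the load isn't slowed by index maintenance.
IF EXISTS (SELECT 1 FROM sys.indexes
           WHERE name = N'IX_OrderHistory_CustomerId'
             AND object_id = OBJECT_ID(N'dbo.OrderHistory'))
    DROP INDEX IX_OrderHistory_CustomerId ON dbo.OrderHistory;

-- Step 2: truncate fully replaced tables.
TRUNCATE TABLE dbo.Customer;

-- Step 3: delete the partial-load window (daily run only).
DELETE FROM dbo.OrderHistory
WHERE OrderDate >= DATEADD(YEAR, -2, CAST(GETDATE() AS date));

-- Step 4 runs in the Data Flow tasks. For the chunked loads, the For Loop
-- supplies a year value to a parameterized source query, something like:
--   SELECT OrderNum, CustomerId, OrderDate, Amount
--   FROM   OrderHistory
--   WHERE  YEAR(OrderDate) = ?

-- Step 5: recreate the non-primary indexes after the load.
CREATE NONCLUSTERED INDEX IX_OrderHistory_CustomerId
    ON dbo.OrderHistory (CustomerId);
```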
Nothing overly complicated here. I do have a couple of tables that are driven by other tables, like the order detail table; that mainly affects how the ADO NET Source query is generated. The Truncate/Delete tasks may also be enabled or disabled depending on whether it is the monthly run or the daily run.
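For the driven tables, the generated source query restricts the child rows to the parent rows being reloaded. A sketch, using hypothetical header/detail names (the actual Progress schema names differ):

```sql
-- Pull only the detail rows whose parent orders fall in the reload window.
-- The date parameter is supplied by the package at run time.
SELECT d.OrderNum, d.LineNum, d.ItemNum, d.Qty, d.Price
FROM   OrderDetail d
WHERE  d.OrderNum IN (SELECT h.OrderNum
                      FROM   OrderHeader h
                      WHERE  h.OrderDate >= ?)
```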
Like I said, this works fine; it just takes over an hour and a half to run. I did try a couple of tests with one of the tables. The first was to write the source results to a flat file and then load it with a Bulk Insert task. I also tried changing the ADO NET Destination to a SQL Server Destination, which required adding a Data Conversion transformation because of the way the Progress ODBC driver returns string and date data. Both attempts took about as long as, or slightly longer than, my original method.
Should I break all this up into separate packages? Should I restructure the flow so multiple tables are loaded concurrently? Anything else I should be looking at?