• Well, just tonight I've wrapped up my work on this issue and thought I'd post my findings for anyone who might find them useful.

    When the performance gap between the lab and the production server first turned up, I focused on the most obvious metric that wasn't up to par, i.e. the Database I/O rate, which was originally 1.5+ MB/sec in the lab but only 0.3 MB/sec on the production server. However, when I brought the production database onto another lab server with the same configuration as the original lab test environment and it still produced only 0.3 MB/sec of throughput, I started digging deeper.

    As I mentioned in another post, I discovered a significant difference in the query plans generated by the two servers. Because I have built a little monitoring tool for my clustered indexes, I knew the fragmentation rates on the two servers were roughly identical, and I routinely rebuilt the indexes to keep the statistics current. After studying the details, I traced the difference to the size of the data tables: while we had created a lab environment that stressed the number of transactions per hour, it did not approximate the number of detail records in the production environment. The process I was working with takes text-based data and normalizes it into the appropriate primary keys, creating new keys/records where no matching record exists.
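
    For anyone curious, this isn't the exact monitoring tool, just a simplified sketch of the kind of fragmentation check it does (the table name is a placeholder), based on sys.dm_db_index_physical_stats:

        -- Flag clustered indexes whose fragmentation has crept up
        SELECT  s.name AS schema_name,
                t.name AS table_name,
                i.name AS index_name,
                ps.avg_fragmentation_in_percent
        FROM    sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ps
        JOIN    sys.indexes AS i ON i.object_id = ps.object_id AND i.index_id = ps.index_id
        JOIN    sys.tables  AS t ON t.object_id = ps.object_id
        JOIN    sys.schemas AS s ON s.schema_id = t.schema_id
        WHERE   ps.index_id = 1                        -- clustered indexes only
        AND     ps.avg_fragmentation_in_percent > 30;  -- an arbitrary rebuild threshold

        -- Rebuild anything flagged, which also refreshes its statistics, e.g.:
        -- ALTER INDEX ALL ON dbo.SomeDetailTable REBUILD;   -- placeholder table name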

    I created a clean database and re-processed my production data, and noted that while the initial inserts ran as quickly as on the lab server, it did not take long at all for processing performance to degrade.

    After studying the data for a while, I concluded that direct table access via the indexes was causing the core problem. My processing procedures are in a Master/Slave arrangement where the master procedure picks the work to be done and then calls the slave procedure to actually handle the detail work. I simply added a new set of steps to the master procedure to create 5 temp tables representing the known records/details for the five object classes in the project. Now, when the detail procedure executes, it merely performs a join between the transaction record and the 5 temp tables to determine which objects are unknown, and when necessary it calls another procedure to create the missing object. If all items are known, the appropriate keys are supplied and the procedure inserts the new record into permanent storage. In my original design, the production database was running execution times of 500ms+, but with the new structure most executions run in 1-2ms, and the long runs, where there are new records to create, take 30-50ms. A VAST improvement, don't you think? 😀
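
    To make that concrete, the names below are invented, but the shape of the change looks roughly like this (a temp table created in the master procedure is visible to the slave procedure it calls):

        -- Master procedure: snapshot the known keys for one object class;
        -- this is repeated for each of the five object classes.
        SELECT  ObjectCode,          -- the text value arriving in the feed
                ObjectKey            -- the surrogate primary key
        INTO    #KnownWidgets        -- placeholder object class
        FROM    dbo.Widgets;         -- placeholder permanent table

        -- Slave/detail procedure: resolve keys with a join against the temp
        -- tables instead of per-row index lookups on the permanent tables.
        SELECT  t.TransactionID,
                w.ObjectKey AS WidgetKey
        FROM    #WorkBatch AS t                  -- rows the master picked for this pass
        LEFT JOIN #KnownWidgets AS w
               ON w.ObjectCode = t.WidgetCode;

        -- Rows where WidgetKey IS NULL are the unknown objects: call the creation
        -- procedure for those, then insert the transaction using the resolved keys.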

    On the lab server I had been able to push my throughput to 180,000 transactions per hour, but with the new procedure I've been able to boost that to 360,000 transactions per hour. This should keep me well above the actual transaction rates of the production environment, and that is still not 100% utilization. I'm using two jobs, each handling 30,000 items in less than 3 minutes, which I stagger so that every 5 minutes one of them is processing records. This gives me the throughput with little to no contention for resources. Oh... and on the Database I/O figure in Activity Monitor: with the new procedures the production database is now showing 2-3 MB/sec of I/O throughput, so easily a 5x to 10x improvement on the I/O as well.
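
    In case it helps anyone, the staggering itself is nothing fancy. Assuming SQL Agent jobs (and with invented job/schedule names), two 10-minute schedules offset by 5 minutes would be set up roughly like this:

        -- Job A: every 10 minutes, starting on the hour
        EXEC msdb.dbo.sp_add_schedule
             @schedule_name        = N'Every10Min_Offset0',
             @freq_type            = 4,       -- daily
             @freq_interval        = 1,
             @freq_subday_type     = 4,       -- units of minutes
             @freq_subday_interval = 10,
             @active_start_time    = 000000;  -- 00:00:00
        EXEC msdb.dbo.sp_attach_schedule
             @job_name      = N'Process_Batch_A',
             @schedule_name = N'Every10Min_Offset0';

        -- Job B: same 10-minute cycle, offset by 5 minutes
        EXEC msdb.dbo.sp_add_schedule
             @schedule_name        = N'Every10Min_Offset5',
             @freq_type            = 4,
             @freq_interval        = 1,
             @freq_subday_type     = 4,
             @freq_subday_interval = 10,
             @active_start_time    = 000500;  -- 00:05:00
        EXEC msdb.dbo.sp_attach_schedule
             @job_name      = N'Process_Batch_B',
             @schedule_name = N'Every10Min_Offset5';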

    So, I would like to offer a final big thank you to all who took the time to respond to my post. I appreciate your willingness to share your professional insights and ideas as I worked my way through this challenge.