• Ok... so here is some new information:

    I moved the TempDB to a separate eSATA drive on the server. My initial thought was that things seemed a little better, but 5 hours later, if anything, the system is running slower than it was. Jobs that were running in 2 1/2 hours are now taking 2 3/4 hours.

    I have some more specific statistics to share.

    For SQLIO results (8 KB blocks, 5-minute run), the Production server is running 881.66 IOPS at 6.88 MB/s. The Lab server runs 678.28 IOPS at 5.29 MB/s.
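    As a quick sanity check on those SQLIO figures, here is a small sketch that recomputes throughput from IOPS and block size. It assumes 8 KB means 8192 bytes and that a megabyte is 1024 x 1024 bytes; the function name is just for illustration.

```python
# Sanity check: throughput = IOPS x block size.
# Assumptions: 8 KB blocks = 8192 bytes, 1 MB = 1024 * 1024 bytes.
BLOCK_BYTES = 8 * 1024
BYTES_PER_MB = 1024 * 1024


def throughput_mb_per_sec(iops, block_bytes=BLOCK_BYTES):
    """Convert an IOPS figure into MB/s for a fixed block size."""
    return iops * block_bytes / BYTES_PER_MB


production_mb = throughput_mb_per_sec(881.66)
lab_mb = throughput_mb_per_sec(678.28)
iops_ratio = 881.66 / 678.28

print(f"Production: {production_mb:.2f} MB/s")
print(f"Lab:        {lab_mb:.2f} MB/s")
print(f"Production/Lab IOPS ratio: {iops_ratio:.2f}")
```

    Both results land within rounding of the reported 6.88 and 5.29, so the throughput numbers are megabytes per second, and on raw 8 KB I/O the Production disks come out roughly 30% faster than the Lab disks.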

    In Performance Monitor, I'm seeing the following:

    Avg. Disk sec    Read     Write    Transfer
    Primary          0.017    0.028    0.202
    EventData        0.017    0.028    0.028
    Log              0.000    0.003    0.003
    TempDB           0.000    0.004    0.004
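    To make the counters above easier to compare, here is a rough sketch that converts them from seconds to milliseconds and flags anything above a cutoff. The 20 ms cutoff is purely an illustrative assumption on my part, not an official limit, and the variable names are made up for the example.

```python
# Perfmon "Avg. Disk sec/..." counters are reported in seconds.
latencies_sec = {
    "Primary":   {"read": 0.017, "write": 0.028, "transfer": 0.202},
    "EventData": {"read": 0.017, "write": 0.028, "transfer": 0.028},
    "Log":       {"read": 0.000, "write": 0.003, "transfer": 0.003},
    "TempDB":    {"read": 0.000, "write": 0.004, "transfer": 0.004},
}

# Assumption: 20 ms is only an illustrative cutoff for "worth a closer look".
THRESHOLD_MS = 20.0

flagged = []
for drive, counters in latencies_sec.items():
    for counter, seconds in counters.items():
        millis = seconds * 1000
        if millis > THRESHOLD_MS:
            flagged.append((drive, counter, millis))

for drive, counter, millis in flagged:
    print(f"{drive:<10} {counter:<9} {millis:6.1f} ms")
```

    With that cutoff, only the Primary and EventData drives are flagged, and the Primary transfer value stands out well above everything else in the table.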

    I just observed an event that has given me some pause. Maybe this will change the nature of the conversation.

    What I've been focusing on is a procedure that fires off and moves data from a raw storage table to a normalized data warehouse table. This requires taking data from remote clients, looking up key values, doing data conversion, and other actions such as detecting new data and creating keys. On the Lab server I can run 4 instances of this job, offset by 5 minutes each, and together they will clear 180,000 records an hour. The Production server, not so much...

    However, I happened to be watching the server at the top of the hour, when an external agent fires off a sync command to the remote clients and they begin dumping their records to the raw storage table. In this sync process, eight remote clients at a time dump their records in batches of 100 records each. What caught my attention is that during this phase, the Database IO rate shot up to over 100 MB/sec, and the processor, which had been pegged at 50% utilization, was suddenly peaking at 100%. After the sync process completed, the IO went back down to 0.3 and the processor dropped back to 50%.
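    For reference, here is the Lab throughput broken down into per-instance and per-second rates, plus the size of one sync round. This is just arithmetic on the figures above; it assumes the four instances share the load evenly, which I haven't verified.

```python
# Figures taken from the description above.
RECORDS_PER_HOUR = 180_000   # Lab server, all instances combined
INSTANCES = 4                # concurrent job instances on the Lab server
SYNC_CLIENTS = 8             # remote clients dumping at the same time
SYNC_BATCH_SIZE = 100        # records per client batch

# Assumption: the instances split the work evenly.
per_instance_per_hour = RECORDS_PER_HOUR / INSTANCES
total_per_second = RECORDS_PER_HOUR / 3600
per_instance_per_second = total_per_second / INSTANCES
records_per_sync_round = SYNC_CLIENTS * SYNC_BATCH_SIZE

print(f"Per instance: {per_instance_per_hour:.0f} records/hour")
print(f"Combined:     {total_per_second:.1f} records/sec")
print(f"Per instance: {per_instance_per_second:.1f} records/sec")
print(f"One sync round: {records_per_sync_round} records")
```

    So the Lab baseline is about 50 records a second in total, and a single round of the sync drops 800 records into the raw storage table.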

    Could there be some odd result because this process is fired from a Job? Of course, I'm just thinking out loud now, working my way through the system a piece at a time.

    Oh, and to address an earlier question about the structure: the Production server was just deployed with fresh scripts from the latest working Lab deployment that we have, so I'm confident that the procedure logic is identical to what is working extremely well on the Lab server.