• Lynn Pettis (4/19/2010)


    All of this:

    The first INSERT statement moves about 800,000 rows from these tables. The second INSERT statement creates about 200,000 rows in the fact table by querying the fact table itself.

    Is written to the log file before it is written to the database (not completely true, as some data may get written to the database before it is all written to the transaction log). The reason for this is so the transaction can be rolled back if a failure (such as a server crash) occurs before it completes.

    If the transaction log has grown to 9 GB during this process, then that is how large it needs to be. If you are concerned about the size of the transaction log, can this stored procedure be modified to complete the load in smaller batches, say of 100,000 rows each instead of 800,000 and 200,000 respectively?

    This makes sense. I've had really large INSERT/SELECT statements fail 3/4 of the way through, and the destination was still empty. I figured it was using tempdb, but my tempdb is 5 GB and typically never uses more than 3 GB.

    I suppose I could build a loop into the process; I would probably have to add an identity key to the intermediate table. Not a huge problem, and it shouldn't add too much time/effort to the creation of the intermediate table.
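    Something like this rough sketch is what I'm picturing -- IntermediateTable, FactTable, the column list, and the RowID identity column are all just placeholder names, not the real objects:

    DECLARE @BatchSize INT, @MinId INT, @MaxId INT;
    SET @BatchSize = 100000;

    -- Find the range of identity values to walk through.
    SELECT @MinId = MIN(RowID), @MaxId = MAX(RowID)
    FROM IntermediateTable;

    WHILE @MinId <= @MaxId
    BEGIN
        -- Each batch commits on its own, so there is never one giant
        -- open transaction holding all 800,000+ rows in the log at once.
        INSERT INTO FactTable (Col1, Col2, Col3)
        SELECT Col1, Col2, Col3
        FROM IntermediateTable
        WHERE RowID >= @MinId
          AND RowID <  @MinId + @BatchSize;

        SET @MinId = @MinId + @BatchSize;
    END

    In SIMPLE recovery the log space from each committed batch should become reusable at the next checkpoint; in FULL recovery I assume I'd also need log backups running between batches for the space to actually get reused.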

    What confuses me is that the creation of the intermediate table doesn't cause the same bloat. The intermediate table is created using two different procedures, each adding data from one system. My numbers are a little off: the intermediate table is actually slightly over a million rows, and 800,000 of those come from one of the two systems. Those 800,000 rows are moved in a single INSERT/SELECT (very similar to the procedure I'm having a problem with). I have confirmed that the transaction log was at 2 GB after this intermediate step ran. It grew from 2 GB to 9 GB during the final step of processing, in about 5 minutes' time. That seemed strange, so I didn't think the data volume alone was causing the problem...
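    For what it's worth, this is roughly the kind of check I can run between steps to pin down exactly when the growth happens -- DBCC SQLPERF(LOGSPACE) is the standard command; the #LogSpaceCheck temp table is just a throwaway name for holding its output:

    -- Capture log size / percent used for every database so the
    -- before-and-after numbers for each step can be compared.
    CREATE TABLE #LogSpaceCheck
    (
        DatabaseName    SYSNAME,
        LogSizeMB       FLOAT,
        LogSpaceUsedPct FLOAT,
        Status          INT
    );

    INSERT INTO #LogSpaceCheck
    EXEC ('DBCC SQLPERF(LOGSPACE)');

    SELECT DatabaseName, LogSizeMB, LogSpaceUsedPct
    FROM #LogSpaceCheck
    WHERE DatabaseName = DB_NAME();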