We were experiencing this issue this past weekend. We have a large volume of SSIS jobs that run thoughout the day (24x7). During the nightly 1:00am run of [SSIS Server Maintenance Job], the transaction log was filling up our disk drive and impacting other more critical processes. The issue is the TRY/CATCH containing the delete loop. That's one huge transaction that was causing the problem. Our retention window is set to 1 day and we were still having issues.
I addressed this in a similar manner by making the following changes:
- Cloned [internal].[cleanup_server_retention_window] to my own version of the stored procedure.
- Added a variable for the max number of the rows to delete. I set this to 5000 so we delete 5000 rows and get out with reduced transaction log growth.
- Added a variable to accumulate the number of rows deleted during the loop iterations.
- When the max number of rows to delete is exceeded, I BREAK out of the WHILE loop. There are actually two WHILE loops in the IF-THEN-ELSE where I check if the max number of rows is exceeded.
- Lowered the @delete_batch_size to 100 from 1000 since we're dealing with cascading deletes.
- Added a PRINT statement at the end of the TRY/CATCH so it's present in the job history log.
- The job was originally running once a night at 1:00am. I changed the schedule to run every minute between 1:00am-2:00am. This results in the transaction log being hit in smaller chunks with breaks in between job runs.
- Increased the size of our disk drive from 50GB to 300GB for more breathing room.
- Set the size of the SSIS data and log files to 40GB each to help address any file growth delays that were occurring.
- The database is already in SIMPLE recovery mode, so no change needed there.