How to commit the transaction log during a large ETL process

  • Hello SSC!

    First, I would like to thank everyone who has responded to my posts. This website has helped me tremendously over the years!

    My question...

    I have a large ETL process that I inherited. I have been tasked with optimizing this process, which uses SSIS and plenty of stored procedures. The source tables are very large, over 500 million rows.

    Is there a way to commit the transaction log after a proc executes, so the log doesn't fill up and kill the job? I know the keyword "COMMIT" supposedly helps, but I have had instances where it doesn't do what's expected or I did not use it correctly. I am a developer who has dabbled in the Admin arts, but certainly I am no expert. So, I ask the SSC experts. Looking forward to your responses.

    Any assistance would be greatly appreciated!

    Dave

    There are no problems, only solutions. --John Lennon

  • Is there a way to commit the transaction log after a proc executes, so the log doesn't fill up and kill the job?

    Smaller transactions: as long as a transaction is running, its portion of the log cannot be truncated because it may be needed for a rollback.

    Regular log backups: these back up and free the inactive portion of the log file so it can be reused.

    Different recovery models: full (point-in-time recovery), bulk-logged (suited to ETL loads, sketched below), simple (no point-in-time recovery).

    https://www.mssqltips.com/sqlservertutorial/5/sql-server-bulk-logged-recovery-model
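
    A rough T-SQL sketch of the bulk-logged approach (the database name and backup path are only placeholders, not from this thread):

        -- Switch to bulk-logged for the duration of the load (placeholder database name)
        ALTER DATABASE MyEtlDb SET RECOVERY BULK_LOGGED;

        -- ... run the minimally logged load here (e.g. BULK INSERT or SELECT ... INTO) ...

        -- Switch back to full and take a log backup to keep the log backup chain usable
        ALTER DATABASE MyEtlDb SET RECOVERY FULL;
        BACKUP LOG MyEtlDb TO DISK = N'X:\Backups\MyEtlDb_log.trn';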

  • Thank you Jo,

    This is very helpful. The recovery model is set to Simple, so I don't think that is the issue. I will take executing in batches into consideration; the only issue is that I have a lot of procs to consider.

    I am really looking for a way to commit everything in the transaction log after a proc completes successfully. Not sure if that is possible or recommended.

    There are no problems, only solutions. --John Lennon

  • In the OLE DB Destination in SSIS, you control the batch and commit sizes.  If you leave those at their defaults, all of the rows processed through the pipeline will be committed in a single transaction.

    You can set a reasonable commit size, and after that many rows have been inserted a commit will be issued.  Not only does this manage the space in the transaction log, it can also dramatically improve the performance of the data load.

    Also check the batch size - adjust as needed.
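
    For comparison, when the same kind of load is done from a flat file in T-SQL rather than a data flow, BULK INSERT offers a similar knob; this is only a sketch with placeholder table, file and delimiter values:

        -- Commit every 100,000 rows instead of one giant transaction
        BULK INSERT dbo.StageTable
        FROM 'X:\Feeds\big_extract.csv'
        WITH (
            BATCHSIZE = 100000,   -- each batch is committed as its own transaction
            TABLOCK,              -- helps qualify for minimal logging under simple/bulk-logged recovery
            FIELDTERMINATOR = ',',
            ROWTERMINATOR = '\n'
        );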

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Lord Slaagh wrote:

    I am really looking for a way to commit everything in the transaction log after a proc completes successfully. Not sure if that is possible or recommended.

    The simple recovery model already lets the log space from completed transactions be reused automatically (at checkpoint), so there is no need to do anything after a procedure finishes unless you have opened an explicit transaction before executing the procedure.

    Do you have a problem with log space? If so, I would identify which step is blowing up the log and think about how to do the same thing in smaller chunks. You mentioned stored procedures, so I assume the ETL is using Execute SQL Tasks rather than data flows. Is this correct? Data flows automatically commit data in batches, but if you have SQL code performing large inserts you may need to make the inserts smaller. It's possible you could take advantage of bulk-logged inserts, but that's unlikely to be the whole solution on its own.
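
    A minimal sketch of chunking a large insert so each batch commits on its own (table, column and key names are placeholders):

        DECLARE @BatchSize int = 100000;

        WHILE 1 = 1
        BEGIN
            -- Copy the next batch of rows that are not yet in the target
            INSERT INTO dbo.TargetTable (Id, Col1, Col2)
            SELECT TOP (@BatchSize) s.Id, s.Col1, s.Col2
            FROM dbo.SourceTable AS s
            WHERE NOT EXISTS (SELECT 1 FROM dbo.TargetTable AS t WHERE t.Id = s.Id)
            ORDER BY s.Id;

            IF @@ROWCOUNT = 0 BREAK;   -- nothing left to copy

            CHECKPOINT;                -- under simple recovery this lets the committed log space be reused
        END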

  • Hi Jeffrey,

    Responding here for other users with the same issue....

    Changing the commit size and rows per batch worked like a charm! Plus, I dropped the indexes on the table before loading and recreated them after the table was loaded. I forgot that loading a table with its indexes in place causes issues; indexes speed up SELECTs but slow down INSERTs.

    Thanks again, Jeffrey, and the folks @ SSC!

    Dave

    There are no problems, only solutions. --John Lennon

  • Thank you for the update - glad this worked for you.

    FYI - my approach is to disable the non-clustered indexes before the load and rebuild all indexes after the data is loaded.  I only do that if I can see a clear improvement in the overall process - but in general, rebuilding all indexes after a load works well.  If the loads run more than once a day I probably would not disable/rebuild every time - it really depends on the workflow.
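
    A minimal sketch of that pattern (index and table names are placeholders):

        -- Disable non-clustered indexes before the load; the clustered index must stay enabled,
        -- otherwise the table becomes unreadable
        ALTER INDEX IX_TargetTable_Col1 ON dbo.TargetTable DISABLE;
        ALTER INDEX IX_TargetTable_Col2 ON dbo.TargetTable DISABLE;

        -- ... run the load ...

        -- Rebuilding re-enables the disabled indexes
        ALTER INDEX ALL ON dbo.TargetTable REBUILD;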

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs
