Row limit for Updates before transaction log fills up

  • Hello SSC,

    I hope everyone is staying safe and healthy.

    I am using SQL 2017 and I have over 17 million rows that I have to UPDATE. Obviously I have to do this in batches, but what should the batch limit be? In some of my other scripts I have UPDATED 1 million at a time. Can I increase this?

    The obvious issue is filling the transaction log, which I want to avoid.

    Any help would be appreciated and thank you all in advance.

    Happy Holidays,

    Dave

     

    Everyone has a plan until they get punched in the mouth. --Mike Tyson

  • There's no magic number as how much log space each transaction uses will depend on how much log space each record needs to run the update.  And how much drive space you have for your log as well as how often the log backups are running.

    • This reply was modified 1 month, 1 week ago by  ZZartin.
  • The other bit of "fun" with that, on top of what ZZartin siad, is that the TLOG may be in use by other queries that could be inserting, updating, or deleting data.

    So for example, 5 million may be good when the system is idle, 3 million may be good when the system has average use, and 1 million may be good when you have heavy usage on the system.  I made those numbers up mind you as I don't know your system or what you are changing in it, so it is impossible to say with any certainty.

    You may be writing 1 GB of data to the TLOG file with 1 million rows (for example) and if your TLOG file is only 1.1 GB in size, 2 million rows would overfill it.  So you'd need to do the 1 million rows, take a backup, do the next 1 million rows, take a backup, and repeat until all are done.

    If you REALLY want to know how much it will use, I would say grab a test system,  and update all of the rows at once.  Or try 2 million rows instead of all, or whatever number makes sense in your scenario.  This will tell you (roughly) how much log space is required by your query.  Then on your live system, you could look at the tlog usage right before your log backups for several cycles to do an estimate on how much is in use by other queries (I say estimate as today you may use 1 GB and tomorrow you may use 10 GB), and then you can estimate how many rows you can do at once without filling the TLOG between backups.

    Again, this is all estimates as we don't know your system or its utilization. BUT you should be able to do a test on a test system to get an idea of how much log space will be used by your query and then you can adjust accordingly.

    Also, breaking it up into batches COULD still fill the TLOG.  If you don't backup the tlog in between those batches, the tlog will continue to get filled until the next backup.  So doing 17 1 million row updates MAY end up using just as much (or more) tlog space as 1 17 million row update.

    My understanding is that the point of breaking it up into batches is NOT to stop the tlog from getting full - it is to reduce the blocking.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Lord Slaagh wrote:

    Hello SSC,

    I hope everyone is staying safe and healthy.

    I am using SQL 2017 and I have over 17 million rows that I have to UPDATE. Obviously I have to do this in batches, but what should the batch limit be? In some of my other scripts I have UPDATED 1 million at a time. Can I increase this?

    The obvious issue is filling the transaction log, which I want to avoid.

    Any help would be appreciated and thank you all in advance.

    Happy Holidays,

    Dave

    I tell people that if you have to DELETE more than about 20% of a table, that you need to use the "Swap'n'Drop" method of copying the the data you want to keep to a new table and then doing the rename thing, etc.  The cool part there is that you can do so in a Minimally Logged fashion and it's going to seriously reduce log file size, duration, and overall impact to the system.

    Updates are twice as bad as Deletes and yet no one thinks of treating them same way as I just described where the UPDATE is done as the data is being transferred in a minimally logged fashion to a new table.

    Think about it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • This was removed by the editor as SPAM

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply