Optimization for Single Table with Thousands of Concurrent Updates

  • I have a table with 5 million records, and each record requires a "Value" column to be updated based on outside sources. Essentially there will be 5 million different updates, and I need these updates to occur as quickly as possible. I have written an engine that will concurrently execute a configurable number of updates through a configurable number of connections, per processor. So my math is 8 processors * 10 connections * 100 concurrent updates = 8,000 concurrent updates. This keeps my CPU at around 90% and my memory at about 15%, but it is only processing about 100 updates/second, which works out to over 13 hours of processing (5,000,000 rows / 100 per second is roughly 50,000 seconds). I need this to happen in minutes, not hours. I have optimized the update statements individually and concurrently by tuning the scripts based on the execution plans, both single and concurrent. My issue is that it is taking too long to process the 8,000 concurrent transactions. I would think that SQL Server could handle 8,000 concurrent updates without a problem, so there must be some type of tuning that I am missing. I am running on an AWS EC2 R5.12xlarge instance with data, logs, and TempDB all on different EBS volumes. Any advice for getting this to process faster is greatly appreciated.

     

    Additional Info:  The architecture is as follows:

    Table:  Base

    BaseID     CalculationStatement                                              Value

    1          'update base set value = {some formula} where BaseID = 1'

    2          'update base set value = {some formula} where BaseID = 2'

    ...

    5000000    'update base set value = {some formula} where BaseID = 5000000'

     

    Each formula in the update statements is coming from an outside source and each update is slightly different.  Essentially, I want to concurrently execute the update statements to fill in the "Value" column.

     

  • Are you seeing blocking/locking?

    What transaction isolation level does the database use? The connections?

    It's likely that batching the updates with far fewer connections will be much faster than issuing 5 million single-row updates.
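
    For example, here is a rough sketch of what batched updates might look like (the Base table and Value column come from the original post; the batch size, the loop structure, and the {some formula} placeholder are assumptions). One connection updating contiguous key ranges of 50,000 rows per statement:

    declare @batch int = 50000, @start int = 1, @max int;
    select @max = max(BaseID) from Base;

    while @start <= @max
    begin
        update b
        set b.Value = {some formula}   -- placeholder, as in the original post
        from Base b
        where b.BaseID between @start and @start + @batch - 1;

        set @start = @start + @batch;
    end

    Each iteration touches one contiguous key range in a single statement, so there are far fewer round trips, far fewer log flushes, and far less lock contention than with 5 million single-row statements.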

  • And, of course, look at actual execution plan to see if you have index needed to support the where clause(s) for the updates. Without that, it may be having to scan the table for every update.

  • Can't you group/pre-process the updates so that you have a lower number of transactions, each updating a higher number of records, in such a way that the records updated by each transaction reside in the same or adjacent pages?

    Other things that will affect concurrency are any other indexes that may need to be updated, as well as any foreign keys (you didn't specify whether the "value" field is the same in all records, nor whether it is an FK column), as well as a possible lack of supporting indexes for the update itself.

    And updating 5 million rows in 2 or 3 transactions should only take a few minutes.

  • Based on your update, you can still do multiples in one go. We don't have enough detail on how you get those formulas, how much they differ, or whether the formulas reference other tables (or are built from other tables), but the following approach will likely work better than your current option.

    update tbl
    set value = case
                    when baseid = 1 then baseid_1_formula
                    when baseid = 2 then baseid_2_formula
                end
    from basetable tbl
    where tbl.baseid in (1, 2)

    And while you say the formulas are different, are they completely different for EACH of your 5 million rows? That is, how many truly distinct formulas do you have?

    Even if there are a good number of them, it is likely they can be partially consolidated into one statement, with the slight differences applied using CASE expressions as in the example above.
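
    For instance, if only a few hundred distinct formulas exist, one hypothetical approach would be one set-based update per formula group (the FormulaID grouping column and the Amount column here are assumptions for illustration, not part of the posted schema):

    update b
    set b.Value = b.Amount * 1.05      -- hypothetical shared formula
    from Base b
    where b.FormulaID = 42;            -- hypothetical grouping column

    A few hundred such statements would replace millions of single-row ones.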

     

  • Just guessing here, but if the data was ordered first, so that you're doing your batches in the same order the data is already stored in, you'll likely see some improvement. If it has to keep bouncing around to find the data that has to be updated, that could certainly slow things down. With the data ordered, you load a page to update a row, and the next row's update is on the same page, and the next, etc. You should see quite an improvement over loading a page, then going to get another, then another, for three rows.

     

    ----------------------------------------------------
    The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore Roosevelt
    The Scary DBA
    Author of: SQL Server 2022 Query Performance Tuning, 6th Edition and SQL Server Execution Plans, 3rd Edition
    Product Evangelist for Red Gate Software

  • Is the clustered index on the table on BaseID?

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who give you the shirt off *someone else's* back.

  • You could insert all the data that needs to be updated into a temporary table with a clustered index on BaseId, and, as Scott says, make sure the clustered index on the base table is on BaseId. Then just do a single update statement:

    update b
    set b.value = t.Value
    from base b
    inner join #temp t
    on t.BaseId = b.BaseId
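
    For completeness, the temp table could be built something like this (a sketch only; the Value column here holds the pre-computed result obtained from the outside source, and its data type is an assumption):

    create table #temp (BaseId int not null, Value decimal(18, 4) not null);

    -- bulk load the pre-computed values (e.g. via BULK INSERT or SqlBulkCopy),
    -- then index the table to support the join:
    create clustered index cx_temp on #temp (BaseId);

    With both sides clustered on BaseId, the optimizer can satisfy the join with a single ordered pass instead of millions of individual seeks.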

     
