A good way to do an update statement in 1000 row chunks

  • I have a temp table and a main business table I am working with. The main business table has one column I am updating. I cannot add any additional columns to the table or update any other columns.

    The temp table contains a key and a value and the main table is to be updated with this value. I can do anything I want to this table.

    So for example:

    Temp Table

    ID Name

    1 Bob

    2 Charley

    3 Duggins

    4 Duggins

    Business Table

    ID Name many other columns

    1 Null

    2 Null

    3 Null

    4 Null

    (notice that the values aren't unique).

    My goal here would be to update the name field in the business table.

    Due to how heavily hit the business table is, I've been asked to run the update statement in 1000 row chunks to minimize the possibilities of table locking.

    My problem is coming with a good way to track my progress. I can do the update statement on a 1000 rows with no problem, and I can create a column in my temp table to track what's been updated.

    My problem is, how can make absolutely sure that I properly check off the specific 1000 rows I just updated in the main table...since I believe there is no formal order to how update statements work.

  • huston.dunlap (5/9/2010)


    I have a temp table and a main business table I am working with. The main business table has one column I am updating. I cannot add any additional columns to the table or update any other columns.

    The temp table contains a key and a value and the main table is to be updated with this value. I can do anything I want to this table.

    So for example:

    Temp Table

    ID Name

    1 Bob

    2 Charley

    3 Duggins

    4 Duggins

    Business Table

    ID Name many other columns

    1 Null

    2 Null

    3 Null

    4 Null

    (notice that the values aren't unique).

    My goal here would be to update the name field in the business table.

    Due to how heavily hit the business table is, I've been asked to run the update statement in 1000 row chunks to minimize the possibilities of table locking.

    My problem is coming with a good way to track my progress. I can do the update statement on a 1000 rows with no problem, and I can create a column in my temp table to track what's been updated.

    My problem is, how can make absolutely sure that I properly check off the specific 1000 rows I just updated in the main table...since I believe there is no formal order to how update statements work.

    Gosh... a properly written UPDATE takes about 3 seconds on a million rows. They sure are making you jump through hoops.

    My recommendation would be to make the inserts in the same order as what the target table is. Since it's a temp table, you can have an identity column on it which means you can use simple integer division to mark rows in groups of 1000. Then, you'll need to use a WHILE LOOP to control the whole thing. It would step through numbers which represented the groups of 1000 and do set based updates of 1000 rows for each iteration of the While Loop.

    Now... since the table is so heavily hit upon, you should probably give the target table a little breathing room by using a WAITFOR DELAY '00:00:01' which will, of course, wait for 1 second between iterations so other processes have a chance to keep up.

    The counter in the WHILE LOOP will keep track of which groups you'll update.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (5/9/2010)

    Gosh... a properly written UPDATE takes about 3 seconds on a million rows. They sure are making you jump through hoops.

    --Hehe, don't I know it!

    My recommendation would be to make the inserts in the same order as what the target table is. Since it's a temp table, you can have an identity column on it which means you can use simple integer division to mark rows in groups of 1000. Then, you'll need to use a WHILE LOOP to control the whole thing. It would step through numbers which represented the groups of 1000 and do set based updates of 1000 rows for each iteration of the While Loop.

    Let me make sure I understand what you are saying.

    So basically a create a column in the temp table that has an integer for every 1000 rows, so 1,2,3 etc. Then I loop through the Update where I do something like

    UPDATE TABLE

    SET column = value

    WHERE ROWID = n

    and then n will be the 1,2,3 number. Is that the gist?

  • I think you're over thinking this a bit - if I am understanding the relationship correctly. Since the ID's in both tables match, you really don't need to track which rows are updated in the temp table, just need to track which rows need to be updated in the destination.

    WHILE 1 = 1

    BEGIN;

    UPDATE TOP(1000) dest

    SET dest.Name = source.Name

    FROM business_table dest

    INNER JOIN temp_table source ON source.ID = dest.ID

    WHERE dest.name IS NULL;

    IF @@rowcount < 1 BREAK;

    WAITFOR DELAY 00:00:01;

    END;

    I don't see any reason to worry about which rows you have updated and/or tracking that in the temp table.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Ah, I can understand the confusion.

    In the actual scenario, sometimes the field is null in the business table, and sometimes its a previous value that needs to be updated...so I can't use null entries as a tracking point. Sorry I didn't clarify that before.

  • Jeffrey Williams-493691 (5/9/2010)


    I don't see any reason to worry about which rows you have updated and/or tracking that in the temp table.

    Keeps the locking localized to just a couple of pages.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (5/9/2010)


    Jeffrey Williams-493691 (5/9/2010)


    I don't see any reason to worry about which rows you have updated and/or tracking that in the temp table.

    Keeps the locking localized to just a couple of pages.

    But, that is going to be done with the TOP(1000) and only touching the rows where the value is NULL. I still don't see why another column is needed in the temp table to track which rows have been updated.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Jeffrey Williams-493691 (5/9/2010)


    Jeff Moden (5/9/2010)


    Jeffrey Williams-493691 (5/9/2010)


    I don't see any reason to worry about which rows you have updated and/or tracking that in the temp table.

    Keeps the locking localized to just a couple of pages.

    But, that is going to be done with the TOP(1000) and only touching the rows where the value is NULL. I still don't see why another column is needed in the temp table to track which rows have been updated.

    You're right, of course. I do tend to over pre-validate data sometimes. :pinch:

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply