• Rob-350472 (2/18/2013)


    I'm wondering about the implications of two different update methods, say you're changing a bunch of titles from Mr to Mrs (or something similar), for, say, 300 records.

    The batch approach is certainly more efficient for the update operation; that side of things has been well-covered in previous comments.

    A couple of things in your opening post look odd to me though:

    1. Both batch and row-by-row queries feature hard-coded numeric values, which are presumably keys for the table concerned. It is much more common to use a dynamic selection query, or to use keys from a table (usually a temporary one). Do the queries concerned really look like this?

    2. Neither query checks that the condition that caused the rows to be selected for change still applies. Your example is a change from 'Mr' to 'Mrs'; in that context, the update queries ought to check that the source record still contains 'Mr'. Otherwise you risk a lost update: someone else changes the data in the meantime, and their change is silently overwritten. A sketch follows this list.
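
    For illustration, a guarded single-row update might look like this (the key column pk and the value 12345 are hypothetical):

    UPDATE dbo.Contact
    SET Title = 'Mrs'
    WHERE pk = 12345 -- hypothetical hard-coded key
    AND Title = 'Mr'; -- guard: does nothing if the row changed in the meantime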

    As far as choosing between the two approaches is concerned, it all depends on your objectives. Extremely busy OLTP systems generally opt for a very defensive approach. In practice, this often means row-by-row changes with a small delay between each. Combined with other defensive measures like SET DEADLOCK_PRIORITY LOW, and setting LOCK_TIMEOUT or using the NOWAIT table hint, this helps to ensure changes are applied at a rate the server can tolerate, with the update activity taking a back seat to regular database activity. This is often much more important than the efficiency of the updates.
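
    As an aside, the NOWAIT table hint mentioned above is scoped to a single table reference rather than the whole session; a minimal sketch, reusing the Contact table from the examples below:

    UPDATE dbo.Contact WITH (NOWAIT) -- raises error 1222 immediately if blocked, instead of waiting
    SET Title = 'Mrs'
    WHERE Title = 'Mr';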

    On to specifics. If there is a useful index on the Title column, I might write the *batch update* something like:

    SET DEADLOCK_PRIORITY LOW;
    SET LOCK_TIMEOUT 0;

    WHILE 1 = 1
    BEGIN
        -- Change up to 300 qualifying rows per iteration
        UPDATE TOP (300) dbo.Contact
        SET Title = 'Mrs'
        WHERE Title = 'Mr';

        -- Finished when no qualifying rows remain
        IF @@ROWCOUNT = 0 BREAK;
    END;
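
    For completeness, a 'useful index' here just means one that lets each iteration find its 300 rows without scanning the whole table; something like the following, where the index names are purely illustrative:

    CREATE INDEX IX_Contact_Title ON dbo.Contact (Title);

    -- Or, narrower still, a filtered index covering only the rows of interest:
    CREATE INDEX IX_Contact_Title_Mr ON dbo.Contact (Title) WHERE Title = 'Mr';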

    If there is no useful index on Title, something like:

    CREATE TABLE #Selected (pk integer PRIMARY KEY);
    CREATE TABLE #Batch (pk integer PRIMARY KEY);

    -- All rows to be processed at some point
    INSERT #Selected (pk)
    SELECT c.pk
    FROM dbo.Contact AS c
    WHERE c.title = N'Mr';

    WHILE 1 = 1
    BEGIN
        -- Move up to 300 keys from #Selected to #Batch
        INSERT #Batch (pk)
        SELECT s.pk
        FROM
        (
            DELETE TOP (300) #Selected
            OUTPUT DELETED.pk
        ) AS s;

        -- Finished if no rows in the current batch
        IF @@ROWCOUNT = 0 BREAK;

        -- Do the batch update, checking the selection condition still applies
        UPDATE dbo.Contact
        SET title = N'Mrs'
        WHERE pk IN (SELECT b.pk FROM #Batch AS b)
        AND title = N'Mr';

        -- Clear the batch table
        TRUNCATE TABLE #Batch;
    END;

    You would need to add error handling (for example, for when we are the victim of a lock timeout or deadlock, as sketched below), but you get the general idea, I'm sure.
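
    A minimal sketch of the sort of handling I mean, wrapped around the batch update (the retry limit and delay are purely illustrative):

    DECLARE @retries integer = 0;

    WHILE 1 = 1
    BEGIN
        BEGIN TRY
            UPDATE dbo.Contact
            SET title = N'Mrs'
            WHERE pk IN (SELECT b.pk FROM #Batch AS b)
            AND title = N'Mr';

            BREAK; -- success
        END TRY
        BEGIN CATCH
            -- 1205 = chosen as deadlock victim; 1222 = lock request timeout
            IF ERROR_NUMBER() IN (1205, 1222) AND @retries < 5
            BEGIN
                SET @retries += 1;
                WAITFOR DELAY '00:00:00.100'; -- brief pause before retrying
            END
            ELSE
            BEGIN
                THROW; -- anything else (or too many retries) should fail loudly
            END;
        END CATCH;
    END;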

    Overall, the 'best' solution depends on your system and business requirements. I have worked with clients whose transaction rates and availability requirements were so high that the only option was to trickle-feed single-row changes with a WAITFOR DELAY '00:00:00.010' between each.
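
    That trickle-feed pattern is roughly the following, again assuming the Contact schema from the examples above:

    WHILE 1 = 1
    BEGIN
        -- Change exactly one qualifying row per iteration
        UPDATE TOP (1) dbo.Contact
        SET title = N'Mrs'
        WHERE title = N'Mr';

        IF @@ROWCOUNT = 0 BREAK; -- nothing left to do

        WAITFOR DELAY '00:00:00.010'; -- yield to regular activity between rows
    END;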