UPDATE - row by row vs batch


Rob-350472
SSCommitted
Group: General Forum Members
Points: 1533 Visits: 684
Guys,

I'm wondering about the implications of two different update methods. Say you're changing a bunch of titles from Mr to Mrs (or something similar), for say 300 records.

1. 'Batch'
UPDATE Contact
SET Title = 'Mrs'
WHERE ConID IN
(
    1, 2, 3, 4, ... n  -- up to 300 or so
)

2. Row by row
UPDATE Contact SET Title = 'Mrs' WHERE ConID = 1
UPDATE Contact SET Title = 'Mrs' WHERE ConID = 2
UPDATE Contact SET Title = 'Mrs' WHERE ConID = 3
UPDATE Contact SET Title = 'Mrs' WHERE ConID = n... up to 300 or so


I've tried both of these with an actual example: the row by row took 10 seconds vs the batch taking just 1 second, so clearly that's the 'quicker' option.

What are the other implications for these two methods which ultimately achieve the same outcome?

If you were using the (slower) row by row approach on live data which could at any moment be updated by a user, would this be a safer approach? Only the one update statement would fail, whereas under the batch it could run for ages.

Just curious; I've not thought about it that much, but a deadlock on live data via the batch method last week made me wonder this morning whether the row by row would have been better.
GilaMonster
SSC Guru
Group: General Forum Members
Points: 223460 Visits: 46297
Batch - faster, fewer resources, consistency at a statement level

Row by row - slow, more locks, no automatic consistency; you need to use transactions (otherwise, if one update fails, the rest will still succeed and you'll have to test and check for that)
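
For illustration, a minimal sketch of that "use transactions" point - wrapping the row-by-row statements in one transaction so a failure rolls back the whole set (table and column names taken from the original post; THROW needs SQL Server 2012 or later, use RAISERROR on older versions):

BEGIN TRY
    BEGIN TRANSACTION;

    UPDATE Contact SET Title = 'Mrs' WHERE ConID = 1;
    UPDATE Contact SET Title = 'Mrs' WHERE ConID = 2;
    -- ...one statement per key, up to 300 or so

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- If any single update fails, undo them all and report the error
    IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;
    THROW;
END CATCH;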

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass


Rob-350472
SSCommitted
Group: General Forum Members
Points: 1533 Visits: 684
Thanks for the response. In the situation I faced last week I was updating about 2,000 contacts; on a local version of the database this took 1 second, but on the live version it completed in just over 1 minute, causing a deadlock and timeouts for a few people!

I was curious as to whether the row by row would have been better (I presume the problem was caused because someone was trying to update a contact I had in my batch), as it would have looked at one contact at a time?

Not sure if I'm explaining myself that well, hopefully the above makes sense.
GilaMonster
SSC Guru
Group: General Forum Members
Points: 223460 Visits: 46297
Row by row better? Well, in the sense that if one fails the whole batch won't fail, but then you have no guarantee that all the contacts were updated.
A single-row update can still cause timeouts and deadlocks.

Personally that kind of update I do out of business hours.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass


Rob-350472
SSCommitted
Group: General Forum Members
Points: 1533 Visits: 684
That was my decision in the end to be honest: out of business hours. Worked out okay for this situation. I guess I was just thinking, if one were forced to do it in business hours, whether this would be better or not.

I suppose one could run a simple select on the universe being updated to verify the rows had been updated; still, the batch/set approach would be better, I guess.
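
A quick check along those lines might look something like this (a rough sketch, assuming the same hard-coded list of IDs is still to hand; names from the original post):

-- Any rows returned were in the update's universe but still show the old title
SELECT ConID, Title
FROM Contact
WHERE ConID IN (1, 2, 3, 4 /* ...same list of IDs, up to 300 or so... */)
AND Title <> 'Mrs';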
Gabriel P
Ten Centuries
Group: General Forum Members
Points: 1098 Visits: 947
Rob-350472 (2/18/2013)
That was my decision in the end to be honest: out of business hours. Worked out okay for this situation. I guess I was just thinking, if one were forced to do it in business hours, whether this would be better or not.

I suppose one could run a simple select on the universe being updated to verify the rows had been updated; still, the batch/set approach would be better, I guess.


For large tables, I usually do a hybrid of both... basically a while loop with an update of small batches (it requires more code and 1 or 2 staging tables to help keep track of the records and the completed batches in case you have to stop the process). But you want to do everything in your power to avoid doing 1 row per transaction. Not only will it take forever, but it will go on a rampage in your transaction log. Even if you only did 5 or 20 rows per transaction, the reduction in time and transaction log IO is pretty significant compared to only doing 1 row at a time.

But for a table that is being used during business hours, the primary goal is to make sure each transaction still completes pretty much instantaneously, so of course there is a limit to how many rows you can include in each transaction.
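
A rough sketch of that hybrid pattern (the staging table name and the batch size of 500 are made up for illustration): a permanent staging table holds the keys still to be processed, so the loop can be stopped and restarted without losing track of where it got to.

-- Hypothetical staging table: the keys still waiting to be processed
CREATE TABLE dbo.ContactUpdateQueue (ConID integer PRIMARY KEY);

INSERT dbo.ContactUpdateQueue (ConID)
SELECT ConID
FROM dbo.Contact
WHERE Title = 'Mr';

DECLARE @Batch TABLE (ConID integer PRIMARY KEY);

WHILE 1 = 1
BEGIN
    DELETE FROM @Batch;

    -- Take the next batch of keys
    INSERT @Batch (ConID)
    SELECT TOP (500) ConID
    FROM dbo.ContactUpdateQueue
    ORDER BY ConID;

    IF @@ROWCOUNT = 0 BREAK;

    BEGIN TRANSACTION;

    -- Update this batch, re-checking the selection condition still applies
    UPDATE c
    SET c.Title = 'Mrs'
    FROM dbo.Contact AS c
    JOIN @Batch AS b ON b.ConID = c.ConID
    WHERE c.Title = 'Mr';

    -- Remove the completed keys so a restart carries on where it left off
    DELETE q
    FROM dbo.ContactUpdateQueue AS q
    JOIN @Batch AS b ON b.ConID = q.ConID;

    COMMIT TRANSACTION;
END;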
Rob-350472
SSCommitted
Group: General Forum Members
Points: 1533 Visits: 684
Gabriel, that makes quite a lot of sense; I guess I could have broken my 2,000 or so into blocks of 500, perhaps. However, in that instance out of hours makes more sense, as it's not mission critical to be done instantly.
Paul White
SSC-Dedicated
Group: General Forum Members
Points: 35422 Visits: 11361
Rob-350472 (2/18/2013)
I'm wondering about the implications of two different update methods. Say you're changing a bunch of titles from Mr to Mrs (or something similar), for say 300 records.

The batch approach is certainly more efficient for the update operation; that side of things has been well-covered in previous comments.

A couple of things in your opening post look odd to me though:

1. Both batch and row-by-row queries feature hard-coded numeric values, which are presumably keys for the table concerned. It is much more common to use a dynamic selection query, or to use keys from a table (usually a temporary one). Do the queries concerned really look like this?

2. Neither query checks that the condition that caused them to be selected for change still applies. Your example is a change from 'Mr' to 'Mrs' - in that context, the update queries ought to check that the source record still contains 'Mr', otherwise you risk a lost update (where someone else has changed the data in the meantime and their change is lost).

As far as choosing between the two approaches is concerned, it all depends on your objectives. Extremely busy OLTP systems generally opt for a very defensive approach. In practice, this often means row-by-row changes with a small delay between each. Combined with other defensive measures like SET DEADLOCK_PRIORITY LOW, and setting LOCK_TIMEOUT or using the NOWAIT table hint, this helps to ensure changes are applied at a rate the server can tolerate, with the update activity taking a back seat to regular database activity. This is often much more important than the efficiency of the updates.

On to specifics. If there is a useful index on the Title column, I might write the *batch update* something like:

SET DEADLOCK_PRIORITY LOW;
SET LOCK_TIMEOUT 0;

WHILE 1 = 1
BEGIN
    UPDATE TOP (300) dbo.Contact
    SET Title = 'Mrs'
    WHERE Title = 'Mr';

    IF @@ROWCOUNT = 0 BREAK;
END;



If there is no useful index on Title, something like:

CREATE TABLE #Selected (pk integer PRIMARY KEY);
CREATE TABLE #Batch (pk integer PRIMARY KEY);

-- All rows to be processed at some point
INSERT #Selected (pk)
SELECT c.pk
FROM dbo.Contact AS c
WHERE c.title = N'Mr';

WHILE 1 = 1
BEGIN
    -- Move up to 300 keys from #Selected to #Batch
    INSERT #Batch (pk)
    SELECT s.pk
    FROM
    (
        DELETE TOP (300) #Selected
        OUTPUT DELETED.pk
    ) AS s;

    -- Finished if no rows in the current batch
    IF @@ROWCOUNT = 0 BREAK;

    -- Do the batch update, checking the selection condition still applies
    UPDATE dbo.Contact
    SET title = N'Mrs'
    WHERE pk IN (SELECT b.pk FROM #Batch AS b)
    AND title = N'Mr';

    -- Clear the batch table
    TRUNCATE TABLE #Batch;
END;



You would need to add error handling (for example where we are the victim of a lock time-out or deadlock), but you get the general idea, I'm sure.
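
As one possible (illustrative, not definitive) shape for that error handling, the first loop above could retry when it is chosen as the deadlock victim (error 1205) or hits a lock time-out (error 1222); the retry limit and back-off delay here are arbitrary:

DECLARE @Retry integer = 0;

WHILE @Retry < 5   -- arbitrary retry limit, for illustration only
BEGIN
    BEGIN TRY
        UPDATE TOP (300) dbo.Contact
        SET Title = 'Mrs'
        WHERE Title = 'Mr';

        IF @@ROWCOUNT = 0 BREAK;   -- nothing left to do

        SET @Retry = 0;            -- a successful batch resets the retry count
    END TRY
    BEGIN CATCH
        -- 1205 = deadlock victim, 1222 = lock request time-out
        IF ERROR_NUMBER() NOT IN (1205, 1222)
            THROW;   -- anything else is unexpected, so re-raise it

        SET @Retry += 1;
        WAITFOR DELAY '00:00:01';  -- back off briefly before retrying
    END CATCH;
END;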

Overall, the 'best' solution depends on your system and business requirements. I have worked with clients with such high transaction rates and availability requirements that the only option was to trickle-feed single-row changes with WAITFOR DELAY '00:00:00.010'; between each.
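
And a rough illustration of that trickle-feed idea (again borrowing the Contact/Title/ConID names from this thread; in practice the keys would usually come from a work table rather than a repeated scan of the base table):

DECLARE @ConID integer;

WHILE 1 = 1
BEGIN
    -- Pick the next contact that still shows the old title
    SELECT TOP (1) @ConID = ConID
    FROM dbo.Contact
    WHERE Title = N'Mr'
    ORDER BY ConID;

    IF @@ROWCOUNT = 0 BREAK;

    UPDATE dbo.Contact
    SET Title = N'Mrs'
    WHERE ConID = @ConID
    AND Title = N'Mr';   -- re-check the condition, as discussed above

    WAITFOR DELAY '00:00:00.010';   -- give regular activity room between changes
END;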



Paul White
SQLPerformance.com
SQLblog.com
@SQL_Kiwi
Rob-350472
SSCommitted
Group: General Forum Members
Points: 1533 Visits: 684
Paul, thanks for the reply; delayed response, but to answer your two questions:

1. Both batch and row-by-row queries feature hard-coded numeric values, which are presumably keys for the table concerned. It is much more common to use a dynamic selection query, or to use keys from a table (usually a temporary one). Do the queries concerned really look like this?

Yes! The queries actually do look like this; basically someone else will hand me an Excel spreadsheet of IDs they've reviewed and want the changes made for - I'll just concatenate these IDs in Excel and run the update.

2. Neither query checks that the condition that caused them to be selected for change still applies. Your example is a change from 'Mr' to 'Mrs' - in that context, the update queries ought to check that the source record still contains 'Mr', otherwise you risk a lost update (where someone else has changed the data in the meantime and their change is lost).

This is true and the actual queries would do that check; for simplicity I omitted it here.

The defensive approach you mention is very interesting and introduces a few ideas I'd either not heard of or not thought of. I don't think we're at the stage where these would be necessary at the moment, but it's something I'd like to play with before they do become a necessity. Thanks for the information there :-)