UPDATE - row by row vs batch
Posted Monday, February 18, 2013 3:45 AM
Old Hand


Group: General Forum Members
Last Login: Monday, March 31, 2014 9:19 AM
Points: 314, Visits: 677
Guys,

I'm wondering about the implications of two different update methods. Say you're changing a bunch of titles from Mr to Mrs (or something similar) for, say, 300 records.

1. 'Batch'
UPDATE Contact
SET Title = 'Mrs'
WHERE ConID IN
(
1, 2, 3, 4, ... n -- up to 300 or so
)

2. Row by row
UPDATE Contact SET Title = 'Mrs' WHERE ConID = 1
UPDATE Contact SET Title = 'Mrs' WHERE ConID = 2
UPDATE Contact SET Title = 'Mrs' WHERE ConID = 3
UPDATE Contact SET Title = 'Mrs' WHERE ConID = n -- ... up to 300 or so


I've tried both of these with an actual example: the row-by-row approach took 10 seconds vs the batch taking just 1 second, so clearly the batch is the 'quicker' option.

What are the other implications for these two methods which ultimately achieve the same outcome?

If you were using the (slower) row-by-row approach on live data, which could be updated by a user at any moment, would that be a safer approach? Only the one conflicting UPDATE statement would fail, whereas the batch could hold locks for ages.

Just curious - I hadn't thought about it much, but a deadlock on live data via the batch method last week made me wonder this morning whether row by row would have been better.
Post #1421128
Posted Monday, February 18, 2013 5:48 AM


SSC-Forever


Group: General Forum Members
Last Login: Today @ 5:23 AM
Points: 40,620, Visits: 37,088
Batch - faster, fewer resources, consistency at the statement level.

Row by row - slower, more locks, no automatic consistency; you need to use transactions (if one update fails, the rest will still succeed and you'll have to test and check for that).
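To get the all-or-nothing behaviour back with row-by-row updates, the statements have to be wrapped in an explicit transaction, along these lines (a minimal sketch using the table and column names from the thread):

```sql
-- Sketch: wrap the row-by-row updates in one explicit transaction so
-- they succeed or fail as a unit. XACT_ABORT makes any run-time error
-- roll back the whole transaction instead of just the failing statement.
SET XACT_ABORT ON;

BEGIN TRANSACTION;

UPDATE Contact SET Title = 'Mrs' WHERE ConID = 1;
UPDATE Contact SET Title = 'Mrs' WHERE ConID = 2;
UPDATE Contact SET Title = 'Mrs' WHERE ConID = 3;
-- ...remaining rows...

COMMIT TRANSACTION;
```

Note that this gives consistency, but the transaction now holds its locks until the final COMMIT, so it loses the one supposed advantage of row by row on a busy system.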



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #1421164
Posted Monday, February 18, 2013 5:52 AM
Old Hand


Group: General Forum Members
Last Login: Monday, March 31, 2014 9:19 AM
Points: 314, Visits: 677
Thanks for the response. In the situation I faced last week, I was updating about 2,000 contacts. On a local version of the database this took 1 second; on the live version it completed in just over a minute, causing a deadlock and timeouts for a few people!

I was curious as to whether row by row would have been better (I presume the problem was caused because someone was trying to update a contact I had in my batch), as it would have touched one contact at a time?

Not sure if I'm explaining myself that well, hopefully the above makes sense.
Post #1421167
Posted Monday, February 18, 2013 6:08 AM


SSC-Forever


Group: General Forum Members
Last Login: Today @ 5:23 AM
Points: 40,620, Visits: 37,088
Row by row better? Well, in the sense that if one row fails the whole batch won't fail - but then you have no guarantee that all the contacts were updated.
A single-row update can still cause timeouts and deadlocks.

Personally, I do that kind of update out of business hours.



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #1421173
Posted Monday, February 18, 2013 6:15 AM
Old Hand


Group: General Forum Members
Last Login: Monday, March 31, 2014 9:19 AM
Points: 314, Visits: 677
That was my decision in the end, to be honest: out of business hours. It worked out okay in this situation. I guess I was just wondering whether, if one were forced to do it in business hours, this would be better or not.

I suppose one could run a simple SELECT over the universe of rows being updated to verify they had all been updated; still, I guess the batch/set-based approach would be better.
Post #1421176
Posted Monday, February 18, 2013 7:44 AM
SSC Veteran


Group: General Forum Members
Last Login: Tuesday, December 9, 2014 11:02 AM
Points: 235, Visits: 842
Rob-350472 (2/18/2013)
That was my decision in the end to be honest, out of business hours. Worked out okay for this situation. I guess I was just thinking if one was foced to do it in business hours if this would be better or not.

I suppose one could run a simple select on the universe being updated to verify they had been updated, still, the batch/set approach would be better I guess.


For large tables, I usually do a hybrid of both: basically a WHILE loop that updates small batches. (It requires more code and one or two staging tables to help keep track of the records and the completed batches, in case you have to stop the process.) But you want to do everything in your power to avoid doing one row per transaction. Not only will it take forever, it will also hammer your transaction log. Even at only 5 or 20 rows per transaction, the reduction in elapsed time and transaction log I/O is pretty significant compared to one row at a time.

But for a table that is being used during business hours, the primary goal is to make sure each transaction still completes near-instantaneously, so of course there is a limit to how many rows you can include in each transaction.
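A minimal sketch of that hybrid approach, walking the key range in small chunks so each batch is its own short transaction (the batch size of 500 and the assumption that ConID is a dense integer key are illustrative, not from the thread):

```sql
-- Sketch: update in key-range batches, one short transaction per batch.
-- Assumes ConID is an integer primary key on Contact.
DECLARE @MaxID int = (SELECT MAX(ConID) FROM Contact);
DECLARE @BatchSize int = 500;   -- hypothetical batch size
DECLARE @LastID int = 0;

WHILE @LastID < @MaxID
BEGIN
    UPDATE Contact
    SET    Title = 'Mrs'
    WHERE  ConID >  @LastID
      AND  ConID <= @LastID + @BatchSize
      AND  Title = 'Mr';          -- only rows still needing the change

    SET @LastID = @LastID + @BatchSize;

    -- optional: brief pause to let other activity through
    WAITFOR DELAY '00:00:00.100';
END;
```

Tracking the last key processed (rather than re-scanning the whole table each time) also means the loop can be stopped and restarted from where it left off.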
Post #1421225
Posted Monday, February 18, 2013 10:26 AM
Old Hand


Group: General Forum Members
Last Login: Monday, March 31, 2014 9:19 AM
Points: 314, Visits: 677
Gabriel, that makes a lot of sense. I guess I could have broken my 2,000 or so into blocks of 500, perhaps; however, in that instance out of hours makes more sense, as it's not mission-critical for it to be done instantly.

Post #1421296
Posted Wednesday, February 20, 2013 2:17 AM


SSCrazy Eights


Group: General Forum Members
Last Login: Today @ 12:15 AM
Points: 9,928, Visits: 11,207
Rob-350472 (2/18/2013)
I'm wondering of the implications of two different update methods, say you're changing a bunch of titles from Mr to Mrs (or something similar), for say 300 records.

The batch approach is certainly more efficient for the update operation; that side of things has been well-covered in previous comments.

A couple of things in your opening post look odd to me though:

1. Both batch and row-by-row queries feature hard-coded numeric values, which are presumably keys for the table concerned. It is much more common to use a dynamic selection query, or to use keys from a table (usually a temporary one). Do the queries concerned really look like this?

2. Neither query checks that the condition that caused them to be selected for change still applies. Your example is a change from 'Mr' to 'Mrs' - in that context, the update queries ought to check that the source record still contains 'Mr'; otherwise you risk a lost update (where someone else has changed the data in the meantime, and their change is lost).
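For example (a sketch using the column names from the opening post), the check is just an extra predicate on the UPDATE:

```sql
-- Sketch: re-check the selection condition inside the UPDATE itself,
-- so a row someone else changed in the meantime is skipped rather
-- than silently overwritten.
UPDATE Contact
SET    Title = 'Mrs'
WHERE  ConID = 1
  AND  Title = 'Mr';   -- guard against a lost update
```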

As far as choosing between the two approaches is concerned, it all depends on your objectives. Extremely busy OLTP systems generally opt for a very defensive approach. In practice, this often means row-by-row changes with a small delay between each. Combined with other defensive measures like SET DEADLOCK_PRIORITY LOW, and setting LOCK_TIMEOUT or using the NOWAIT table hint, this helps to ensure changes are applied at a rate the server can tolerate, with the update activity taking a back seat to regular database activity. This is often much more important than the efficiency of the updates.

On to specifics. If there is a useful index on the Title column, I might write the *batch update* something like:

SET DEADLOCK_PRIORITY LOW;
SET LOCK_TIMEOUT 0;

WHILE 1 = 1
BEGIN
    UPDATE TOP (300) dbo.Contact
    SET Title = 'Mrs'
    WHERE Title = 'Mr';

    IF @@ROWCOUNT = 0 BREAK;
END;

If there is no useful index on Title, something like:

CREATE TABLE #Selected (pk integer PRIMARY KEY);
CREATE TABLE #Batch (pk integer PRIMARY KEY);

-- All rows to be processed at some point
INSERT #Selected (pk)
SELECT c.pk
FROM dbo.Contact AS c
WHERE c.title = N'Mr';

WHILE 1 = 1
BEGIN
    -- Move up to 300 keys from #Selected to #Batch
    INSERT #Batch (pk)
    SELECT s.pk
    FROM
    (
        DELETE TOP (300) #Selected
        OUTPUT DELETED.pk
    ) AS s;

    -- Finished if no rows in the current batch
    IF @@ROWCOUNT = 0 BREAK;

    -- Do the batch update, checking the selection condition still applies
    UPDATE dbo.Contact
    SET title = N'Mrs'
    WHERE pk IN (SELECT b.pk FROM #Batch AS b)
    AND title = N'Mr';

    -- Clear the batch table
    TRUNCATE TABLE #Batch;
END;

You would need to add error handling (for example where we are the victim of a lock time-out or deadlock), but you get the general idea, I'm sure.
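As a sketch of what that error handling might look like (error 1205 is a deadlock victim, 1222 a lock request time-out - both are safe to retry; note that THROW assumes SQL Server 2012 or later, with RAISERROR as the older alternative):

```sql
-- Sketch: retry one batch on deadlock (1205) or lock time-out (1222).
DECLARE @Retry bit = 1;

WHILE @Retry = 1
BEGIN
    SET @Retry = 0;

    BEGIN TRY
        UPDATE TOP (300) dbo.Contact
        SET    Title = 'Mrs'
        WHERE  Title = 'Mr';
    END TRY
    BEGIN CATCH
        IF ERROR_NUMBER() IN (1205, 1222)
        BEGIN
            SET @Retry = 1;                -- transient: try again
            WAITFOR DELAY '00:00:01';      -- brief back-off first
        END
        ELSE
            THROW;                         -- anything else: re-raise
    END CATCH;
END;
```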

Overall, the 'best' solution depends on your system and business requirements. I have worked with clients with such high transaction rates and availability requirements that the only option was to trickle-feed single-row changes with WAITFOR DELAY '00:00:00.010'; between each.
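That trickle-feed pattern, sketched with the same example table:

```sql
-- Sketch: trickle-feed single-row changes with a short pause between
-- each, so the update always yields to regular OLTP activity.
WHILE 1 = 1
BEGIN
    UPDATE TOP (1) dbo.Contact
    SET    Title = 'Mrs'
    WHERE  Title = 'Mr';

    IF @@ROWCOUNT = 0 BREAK;

    WAITFOR DELAY '00:00:00.010';  -- 10 ms between rows
END;
```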




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #1421990
Posted Friday, February 22, 2013 7:53 AM
Old Hand


Group: General Forum Members
Last Login: Monday, March 31, 2014 9:19 AM
Points: 314, Visits: 677
Paul, thanks for the reply, delayed response but to answer your two questions:

1. Both batch and row-by-row queries feature hard-coded numeric values, which are presumably keys for the table concerned. It is much more common to use a dynamic selection query, or to use keys from a table (usually a temporary one). Do the queries concerned really look like this?

Yes! The queries really do look like this. Basically, someone else will hand me an Excel spreadsheet of IDs they've reviewed and want the changes made for - I'll just concatenate these IDs in Excel and run the update.
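A sketch of the keys-from-a-table alternative Paul suggests: paste the reviewed IDs into a temporary table and drive the update from a join, instead of concatenating a long IN list (the VALUES rows below are placeholders for the real pasted IDs):

```sql
-- Sketch: load the reviewed IDs into a temp table, then update by join.
CREATE TABLE #ReviewedIDs (ConID int PRIMARY KEY);

INSERT #ReviewedIDs (ConID)
VALUES (1), (2), (3);    -- placeholder rows; paste the real IDs here

UPDATE c
SET    c.Title = 'Mrs'
FROM   dbo.Contact AS c
JOIN   #ReviewedIDs AS r ON r.ConID = c.ConID
WHERE  c.Title = 'Mr';   -- guard against lost updates

DROP TABLE #ReviewedIDs;
```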

2. Neither query checks that the condition that caused them to be selected for change, still applies. Your example is a change from 'Mr' to 'Mrs' - in that context, the update queries ought to check that the source record still contains 'Mr', otherwise you risk a lost update (where someone else has changed the data in the meantime, their change is lost).

This is true, and the actual queries would do that check; for simplicity I omitted it here.

The defensive approach you mention is very interesting and introduces a few ideas I'd either not heard of or not thought of. I don't think we're at the stage where these are necessary at the moment, but it's something I'd like to play with before they become a necessity. Thanks for the information there :)
Post #1423074