Best option to perform an update without transaction log growth

  • I have a table that I need to run an update on one column for every record in the db. Essentially, I need to add a prefix value to the existing column data for every record. The update statement would be something like this:

    update table1 set colA = 'XY' + colA

    My concern is that I don't want to cause the transaction log to expand during the transaction. It's currently set to Full. What is the best way to go about doing the update without causing a big expansion of the log?

  • rbuzzard1 (9/26/2011)


    My concern is that I don't want to cause the transaction log to expand during the transaction. It's currently set to Full. What is the best way to go about doing the update without causing a big expansion of the log?

    You don't. It's not built to allow you to do that. Even on 'simple' you'd blow the log out, as it'd be a single transaction.

    What you need to do is do the update piecemeal (a few hundred thousand rows at a time), then force a log backup between each pass. It's really the only way to keep this from blowing out the log space.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • OK. I can buy that. What is the best way to break it up like you suggest? I've been trying to look into that but see mixed messages on how to do it. I'd appreciate any input.

  • Depends, can you script and paste your schema here? It's gonna depend on your available keys and indexes.


    - Craig Farrell


  • It's a pretty simple structure, and there's no index on the column being updated.

    CREATE TABLE [dbo].[Product](
        [ProductUID] [int] IDENTITY(1,1) NOT NULL,
        [EntityUID] [int] NOT NULL,
        [ProductID] [varchar](30) NULL,
        [LastUpdate] [datetime] NOT NULL CONSTRAINT [DF_Visit_LastUpdate_1] DEFAULT (getdate()),
        CONSTRAINT [PK_Product] PRIMARY KEY CLUSTERED
        (
            [ProductUID] ASC
        ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
    ) ON [PRIMARY]

    I'm wanting to update the ProductID value as stated in the initial post to simply add a two-char prefix to the existing value for all rows in the table.

  • Can you post the current update statement? Will there be an easy way to tell an updated row from one not yet updated?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • The update statement would look like this:

    update product set productid = 'XY' + productid

    As is, that would update all records, but apparently I need to break that up into smaller batches and take a transaction log backup after each batch in order to keep the log size down.

    As far as being able to tell which have been updated, anything with left(productid, 2) = 'XY' has been updated. So, presumably, that would change the update statement to this:

    update product set productid = 'XY' + productid
    where left(productid, 2) <> 'XY'
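
    A quick aside on that WHERE guard: it also makes the job safely rerunnable, since rows that already carry the prefix are skipped. A minimal Python sketch (illustrative only, not T-SQL, with made-up data) of that idempotence:

    ```python
    # Guarded prefix update: skip rows that already have the prefix,
    # mirroring WHERE left(productid, 2) <> 'XY' in the T-SQL above.
    def guarded_update(rows, prefix="XY"):
        return [v if v.startswith(prefix) else prefix + v for v in rows]

    once = guarded_update(["A1", "B2", "XYC3"])
    twice = guarded_update(once)  # rerunning changes nothing
    ```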

  • Untested. Test it yourself first!!!!!

    DECLARE @Done bit;
    SET @Done = 0;

    WHILE (@Done = 0)
    BEGIN
        UPDATE TOP (X) Product
        SET ProductID = 'XY' + ProductID
        WHERE ProductID NOT LIKE 'XY%';

        IF @@ROWCOUNT = 0
            SET @Done = 1;

        -- log backup at this point
    END

    How large X should be (the value in the TOP) I leave to you. Small enough you don't grow the log, large enough you're not updating for a week.

    Stick an index on ProductID first (yes, it'll slow the update, but it'll speed up finding the rows to update).
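
    The control flow of that loop can be sketched in plain Python (illustrative only; the list stands in for the table, and the log backup is just a comment):

    ```python
    # TOP (X) batching pattern: update at most batch_size qualifying rows
    # per pass until a pass touches zero rows (the @@ROWCOUNT = 0 check).
    def batched_prefix_update(rows, prefix="XY", batch_size=3):
        passes = 0
        while True:
            pending = [i for i, v in enumerate(rows) if not v.startswith(prefix)]
            batch = pending[:batch_size]
            if not batch:          # nothing left to update -> done
                break
            for i in batch:
                rows[i] = prefix + rows[i]
            passes += 1
            # a log backup between passes would go here
        return passes

    products = ["A1", "B2", "XYC3", "D4", "E5"]
    n = batched_prefix_update(products)
    ```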

    Gail Shaw
  • Thanks very much. I think that will do the trick. I was getting there but was uncertain about it and that really helps me out.

  • I would approach this a little differently:

    DECLARE @c INT,
            @LastID INT

    SET @c = 0
    SELECT @LastID = MAX(ProductUID) FROM Product

    WHILE @c <= @LastID
    BEGIN
        UPDATE Product
        SET ProductID = 'XY' + ProductID
        WHERE ProductUID BETWEEN @c AND @c + 2500 - 1 -- avoid overlaps and make it obvious

        SET @c = @c + 2500

        IF @c % 200000 = 0
        BEGIN
            -- Reference in BoL
            -- ms-help://MS.SQLCC.v10/MS.SQLSVR.v10.en/s10de_6tsql/html/89a4658a-62f1-4289-8982-f072229720a1.htm
            -- See section (B)
            BACKUP LOG <DatabaseName>
            TO <SaveDevice> -- Look up sp_addumpdevice if necessary
        END
    END
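
    For comparison, the key-range walk above can be sketched in Python (illustrative only; the numbers are kept small so the output is easy to check by hand):

    ```python
    # Walk the clustered key in fixed windows, recording where the
    # periodic log backup (every backup_every keys) would fire.
    def key_ranges(last_id, step=2500, backup_every=200000):
        ranges, backups = [], []
        c = 0
        while c <= last_id:
            ranges.append((c, c + step - 1))  # BETWEEN @c AND @c + step - 1
            c += step
            if c % backup_every == 0:
                backups.append(c)             # BACKUP LOG point
        return ranges, backups

    ranges, backups = key_ranges(9999, step=2500, backup_every=5000)
    ```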


    - Craig Farrell


  • Evil Kraig F (9/26/2011)


    I would approach this a little differently:

    Is there an advantage? Gail's seems easier to read, and I think hers could be restarted where it left off, without risking a double update, if the process had to be stopped at some point.

  • homebrew01 (9/27/2011)


    Evil Kraig F (9/26/2011)


    I would approach this a little differently:

    Is there an advantage ? Gail's seems easier to read, and I think hers could be restarted where it left off if the process had to be stopped at some point, without risking a double update.

    Mine runs down a clustered index and hopefully directly seeks the records in question, without any searching, and should be working with fewer rows before the TOP hits (per pass). Gail's will need to scan on each pass looking for valid contenders. Hers can be rerun in case of an error; mine should go faster because it's not checking secondary fields in the WHERE clause and is merely running through a section of the clustered index at a time.


    - Craig Farrell


  • Evil Kraig F (9/27/2011)


    homebrew01 (9/27/2011)


    Evil Kraig F (9/26/2011)


    I would approach this a little differently:

    Is there an advantage ? Gail's seems easier to read, and I think hers could be restarted where it left off if the process had to be stopped at some point, without risking a double update.

    Mine runs down a clustered index and hopefully directly seeks the records in question, without any searching, and should be working with less rows before the top hits (per pass). Gail's will need to scan each pass looking for valid contenders.

    Not if the index I recommended is created.

    Also, with an update this simple, only the specified number of rows will come back from the index seek. It won't be a fetch all, discard all but the X required.

    Gail Shaw
  • GilaMonster (9/27/2011)


    Evil Kraig F (9/27/2011)


    homebrew01 (9/27/2011)


    Evil Kraig F (9/26/2011)


    I would approach this a little differently:

    Is there an advantage ? Gail's seems easier to read, and I think hers could be restarted where it left off if the process had to be stopped at some point, without risking a double update.

    Mine runs down a clustered index and hopefully directly seeks the records in question, without any searching, and should be working with less rows before the top hits (per pass). Gail's will need to scan each pass looking for valid contenders.

    Not if the index I recommended is created.

    Also, with an update this simple, only the specified number of rows will come back from the index seek. It won't be a fetch all, discard all but the X required.

    Sorry, I'd missed that you recommended a new index be put into play. That's something else I usually avoid, but you're right (of course! 😉 ), it will avoid the other issues.


    - Craig Farrell

