Best option to perform an update without transaction log growth

  • I have a table that I need to run an update on one column for every record in the db. Essentially, I need to add a prefix value to the existing column data for every record. The update statement would be something like this:

    update table1 set colA = 'XY' + colA

    My concern is that I don't want to cause the transaction log to expand during the transaction. It's currently set to Full. What is the best way to go about doing the update without causing a big expansion of the log?

  • rbuzzard1 (9/26/2011)


    My concern is that I don't want to cause the transaction log to expand during the transaction. It's currently set to Full. What is the best way to go about doing the update without causing a big expansion of the log?

    You don't. It's not built to allow you to do that. Even on 'simple' you'd blow the log out, as it'd be a single transaction.

    What you need to do is do the update piecemeal (a few hundred thousand rows at a time), then force a log backup between each pass. It's really the only way to keep this from blowing out the log space.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • OK. I can buy that. What is the best way to break it up like you suggest? I've been trying to look into that but see mixed messages on how to do it. I'd appreciate any input.

  • Depends, can you script and paste your schema here? It's gonna depend on your available keys and indexes.


    - Craig Farrell


  • It's a pretty simple structure, and there's no index on the column being updated.

    CREATE TABLE [dbo].[Product](
        [ProductUID] [int] IDENTITY(1,1) NOT NULL,
        [EntityUID] [int] NOT NULL,
        [ProductID] [varchar](30) NULL,
        [LastUpdate] [datetime] NOT NULL CONSTRAINT [DF_Visit_LastUpdate_1] DEFAULT (getdate()),
        CONSTRAINT [PK_Product] PRIMARY KEY CLUSTERED
        (
            [ProductUID] ASC
        ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
    ) ON [PRIMARY]

    I'm wanting to update the ProductID value as stated in the initial post to simply add a two-char prefix to the existing value for all rows in the table.

  • Can you post the current update statement? Will there be an easy way to tell an updated row from one not yet updated?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • The update statement would look like this:

    update product set productid = 'XY' + productid

    As is, that would update all records, but apparently I need to break that up into smaller batches and take a transaction log backup after each batch in order to keep the log size down.

    As far as being able to tell which have been updated, anything with left(productid, 2) = 'XY' has been updated. So, presumably, that would change the update statement to this:

    update product set productid = 'XY' + productid
    where left(productid, 2) <> 'XY'
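
    A quick aside on that WHERE guard: it also makes the job safely rerunnable, since rows that already carry the prefix are skipped. A minimal Python sketch (illustrative only, not T-SQL, with made-up data) of that idempotence:

    ```python
    # Guarded prefix update: skip rows that already have the prefix,
    # mirroring WHERE left(productid, 2) <> 'XY' in the T-SQL above.
    def guarded_update(rows, prefix="XY"):
        return [v if v.startswith(prefix) else prefix + v for v in rows]

    once = guarded_update(["A1", "B2", "XYC3"])
    twice = guarded_update(once)  # rerunning changes nothing
    ```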

  • Untested. Test it yourself first!!!!!

    DECLARE @Done bit;
    SET @Done = 0;

    WHILE (@Done = 0)
    BEGIN
        UPDATE TOP (X) Product
        SET ProductID = 'XY' + ProductID
        WHERE ProductID NOT LIKE 'XY%';

        IF @@ROWCOUNT = 0
            SET @Done = 1;

        -- log backup at this point
    END

    How large X should be (the value in the TOP) I leave to you. Small enough you don't grow the log, large enough you're not updating for a week.

    Stick an index on ProductID first (yes, it'll slow the update, but it'll speed up finding the rows to update).
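
    The control flow of that loop can be sketched in plain Python (illustrative only; the list stands in for the table, and the log backup is just a comment):

    ```python
    # TOP (X) batching pattern: update at most batch_size qualifying rows
    # per pass until a pass touches zero rows (the @@ROWCOUNT = 0 check).
    def batched_prefix_update(rows, prefix="XY", batch_size=3):
        passes = 0
        while True:
            pending = [i for i, v in enumerate(rows) if not v.startswith(prefix)]
            batch = pending[:batch_size]
            if not batch:          # nothing left to update -> done
                break
            for i in batch:
                rows[i] = prefix + rows[i]
            passes += 1
            # a log backup between passes would go here
        return passes

    products = ["A1", "B2", "XYC3", "D4", "E5"]
    n = batched_prefix_update(products)
    ```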

    Gail Shaw
  • Thanks very much. I think that will do the trick. I was getting there but was uncertain about it and that really helps me out.

  • I would approach this a little differently:

    DECLARE @c INT,
            @LastID INT

    SET @c = 0
    SELECT @LastID = MAX(ProductUID) FROM Product

    WHILE @c <= @LastID
    BEGIN
        UPDATE Product
        SET ProductID = 'XY' + ProductID
        WHERE ProductUID BETWEEN @c AND @c + 2500 - 1 -- avoid overlaps and make it obvious

        SET @c = @c + 2500

        IF @c % 200000 = 0
        BEGIN
            -- Reference in BoL
            -- ms-help://MS.SQLCC.v10/MS.SQLSVR.v10.en/s10de_6tsql/html/89a4658a-62f1-4289-8982-f072229720a1.htm
            -- See section (B)
            BACKUP LOG <DatabaseName>
            TO <SaveDevice> -- Look up sp_addumpdevice if necessary
        END
    END
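
    For comparison, the key-range walk above can be sketched in Python (illustrative only; the numbers are kept small so the output is easy to check by hand):

    ```python
    # Walk the clustered key in fixed windows, recording where the
    # periodic log backup (every backup_every keys) would fire.
    def key_ranges(last_id, step=2500, backup_every=200000):
        ranges, backups = [], []
        c = 0
        while c <= last_id:
            ranges.append((c, c + step - 1))  # BETWEEN @c AND @c + step - 1
            c += step
            if c % backup_every == 0:
                backups.append(c)             # BACKUP LOG point
        return ranges, backups

    ranges, backups = key_ranges(9999, step=2500, backup_every=5000)
    ```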


    - Craig Farrell


  • Evil Kraig F (9/26/2011)


    I would approach this a little differently:

    Is there an advantage? Gail's seems easier to read, and I think hers could be restarted where it left off, without risking a double update, if the process had to be stopped at some point.

  • homebrew01 (9/27/2011)


    Evil Kraig F (9/26/2011)


    I would approach this a little differently:

    Is there an advantage ? Gail's seems easier to read, and I think hers could be restarted where it left off if the process had to be stopped at some point, without risking a double update.

    Mine runs down a clustered index and hopefully directly seeks the records in question, without any searching, and should be working with fewer rows before the TOP hits (per pass). Gail's will need to scan on each pass looking for valid contenders. Hers can be rerun in case of an error; mine should go faster because it's not checking secondary fields in the WHERE clause and is merely running through a section of the clustered index at a time.


    - Craig Farrell


  • Evil Kraig F (9/27/2011)


    homebrew01 (9/27/2011)


    Evil Kraig F (9/26/2011)


    I would approach this a little differently:

    Is there an advantage ? Gail's seems easier to read, and I think hers could be restarted where it left off if the process had to be stopped at some point, without risking a double update.

    Mine runs down a clustered index and hopefully directly seeks the records in question, without any searching, and should be working with less rows before the top hits (per pass). Gail's will need to scan each pass looking for valid contenders.

    Not if the index I recommended is created.

    Also, with an update this simple, only the specified number of rows will come back from the index seek. It won't be a fetch all, discard all but the X required.

    Gail Shaw
  • GilaMonster (9/27/2011)


    Evil Kraig F (9/27/2011)


    homebrew01 (9/27/2011)


    Evil Kraig F (9/26/2011)


    I would approach this a little differently:

    Is there an advantage ? Gail's seems easier to read, and I think hers could be restarted where it left off if the process had to be stopped at some point, without risking a double update.

    Mine runs down a clustered index and hopefully directly seeks the records in question, without any searching, and should be working with less rows before the top hits (per pass). Gail's will need to scan each pass looking for valid contenders.

    Not if the index I recommended is created.

    Also, with an update this simple, only the specified number of rows will come back from the index seek. It won't be a fetch all, discard all but the X required.

    Sorry, I'd missed that you recommended a new index be put into play. That's something else I usually avoid, but you're right (of course! 😉 ), it will avoid the other issues.


    - Craig Farrell

