Updating a column in batches

  • Background:

    I've added a new column to a table with some 800 million rows. I need the column populated with data from another table in the db. I do this in batches using a where clause between two boundary values. I do this in a loop incrementing the boundary values by a batch size. I do this to keep the batch size within managable levels (preventing paging etc).

    I set the batch size to 100,000 and each loop was taking 4 minutes. At that rate, it would take almost 4 days to complete. So I started to investigate, first up, take the query and look at the execution plan. So I took the guts of the query, replace the variables with scalar constants and ran it, it takes under a second to run - wierd - ok maybe it was cached, so I run it over a record batch I know I haven't updated yet, same result.

    I run the loop again, and look at the execution plan and surprise, they're different. Update statistics, no change. Can someone please tell me why this is occuring?

    UPDATE TranItem

    SET [TransactionDate] = [Transaction].[TransTime]

    FROM TranItem

    INNER JOIN [Transaction] on [TranItem].TransactionId = [Transaction].TransactionId

    WHERE TranItemID between 671726 and 771726

    Yeilds the first execution plan - 2013-05-14 13_09_15.png

    DECLARE @BatchSize int

    DECLARE @IDStart int

    DECLARE @IDEnd int

    SET @IDStart = 1

    SET @IDEnd = 900000000

    SET @BatchSize = 100000

    While @IDStart < @IDEnd BEGIN

    UPDATE TranItem

    SET [TransactionDate] = [Transaction].[TransTime]

    FROM TranItem

    INNER JOIN [Transaction] on [TranItem].TransactionId = [Transaction].TransactionId

    WHERE TranItemID between @IDStart and (@IDStart + @BatchSize)

    set @IDStart = @IDStart + @BatchSize

    END

    GO

    Yeilds the second execution plan : 2013-05-14 13_21_12.png

    Could someone shed some light on this seemingly weird behaviour?

    Cheers.

  • Sounds like parameter sniffing and in this case it might be what Grant Fritchey calls bad parameter sniffing.

    In the plan with the parameters right-click on the Update operator and select properties. Then look at the Parameter List which will show the compiled value and the RunTime value for each parameter. The compiled value is the sniffed value.

    You might get better performance by using the OPTION(RECOMPILE) hint.

  • Thanks Jack,

    I couldn't see the parameter list in the Update node properties, but the Option (RECOMPILE) did the trick.

    Thanks for the cue, I'll read up on bad parameter sniffing.

  • I notice you are using BETWEEN.

    BETWEEN has a weak inequality (>= and <=) at both the upper and lower bound of the range.

    This way, as you traverse the ranges, you will process twice values that are equal to either the upper or lower bound of each range.

    Ideally you should use a strong inequality on one bound of the range and a strong inequality on the other.

    I try to avoid using BETWEEN because it does not explicitly define the type of inequality.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply