UPDATE query performance tuning

  • Hello,

I have two relatively simple UPDATE queries, but they act on around 10 million rows, so performance is poor and tempdb is severely impacted.

    --- query 1

update dbo.test
set lasttime = (select max(time) from dbo.test)

    --- query 2

update dbo.test
set colA = case when status = 'open' and date < dateadd(d, -1, LastModifiedMaxDateTime) then datediff(d, date, LastModifiedMaxDateTime) end,
    colB = case when status = 'open' and date < dateadd(d, -1, LastModifiedMaxDateTime) then 'Y' else 'N' end,
    colC = case when status = 'open' then datediff(d, date, maxdate) end

These queries are executed through an SSIS Execute SQL Task and take 45 minutes to run.

    The table has a clustered index

    Any pointers towards the improvement of performance will help.

  • What columns are on the clustered index?

    Do you have any other indexes on the table?

    You know you could write the two update statements to be just one update statement?

It has a clustered index, but the clustered index column is not in the WHERE clause or in the list of columns being updated.

    Does dropping the clustered index before the update help performance?

  • kirti wrote:

It has a clustered index, but the clustered index column is not in the WHERE clause or in the list of columns being updated.

    Does dropping the clustered index before the update help performance?

     

I doubt the clustered index is the cause of the problem, but you might want to rebuild it so it's a bit more efficient when the query does a full scan of it.

    Do you have any other indexes on the table?

    The query should just do a scan of the clustered index to get the max(time) and another scan to update the table.

    I would try rebuilding the clustered index and combining the two queries into one.

It might also be worth putting a WHERE clause on the query so that rows which would be updated to the same value they already hold are not updated at all.
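As a rough sketch, the two statements combined into one update, with a filter to skip rows that are already up to date, might look like this (the column names colA/colB/colC follow the original post, and the @maxtime variable and the WHERE condition are assumptions to illustrate the idea):

```sql
-- Sketch only: combines both original updates into a single pass.
declare @maxtime datetime = (select max(time) from dbo.test);

update dbo.test
set lasttime = @maxtime,
    colA = case when status = 'open' and date < dateadd(d, -1, LastModifiedMaxDateTime)
                then datediff(d, date, LastModifiedMaxDateTime) end,
    colB = case when status = 'open' and date < dateadd(d, -1, LastModifiedMaxDateTime)
                then 'Y' else 'N' end,
    colC = case when status = 'open' then datediff(d, date, maxdate) end
where lasttime <> @maxtime or lasttime is null;  -- skip rows already carrying the max time
```

One pass over the table instead of two roughly halves the write work; the WHERE clause mainly helps on reruns, when most rows already hold the final values.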

     

  • Are you able to capture the execution plan for these?  That could help determine what is actually happening.  Do the queries run just as slow if you run them in SSMS rather than SSIS?  If they run faster in SSMS, it may not hurt to convert those queries to stored procedures.

     

    One thing that MAY help query 1's performance (may not) would be to toss the SELECT MAX result into a variable and do an update using the variable.  I don't expect this to make a big difference (if any), but I've seen SQL do some weird things by tossing a single value into a variable rather than having it in the statement.
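The variable approach for query 1 might be sketched like this (assuming the time column is a datetime):

```sql
-- Capture the value once, then run a plain update with no subquery in the plan.
declare @maxtime datetime;

select @maxtime = max(time) from dbo.test;  -- single scan to get the value

update dbo.test
set lasttime = @maxtime;
```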

    With query number 2, you MAY get a performance boost by adding a nonclustered index on the STATUS column (if you don't have that already).  You could also move the "status='open'" to a WHERE clause on that UPDATE which may help (or may make no difference).
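A sketch of both suggestions for query 2 follows; the index name is an assumption, and note that moving the status filter to a WHERE clause changes behaviour for non-'open' rows (they keep their existing values instead of being set to NULL/'N'), which may or may not be what you want:

```sql
-- Assumed index name; adjust to your naming convention.
create nonclustered index IX_test_status on dbo.test (status);

update dbo.test
set colA = case when date < dateadd(d, -1, LastModifiedMaxDateTime)
                then datediff(d, date, LastModifiedMaxDateTime) end,
    colB = case when date < dateadd(d, -1, LastModifiedMaxDateTime) then 'Y' else 'N' end,
    colC = datediff(d, date, maxdate)
where status = 'open';  -- filter moved out of the CASE expressions
```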

    I would try these things on a test version of the database and not live as it may make things worse.

Thanks for the suggestions, I will try these and update the results.

    Can we do updates in batches? If I have a total record count of 10 million, then what should be the approximate batch size for the above update statements?

Is there a reason you have two queries? Both look to be updating the same table, and all rows in that table, so this could be done with a single update statement.

    What is the purpose of updating these columns?  Where do these updates occur in the workflow - at the beginning or end of the load process?

    In the second query - the code is only setting 'open' status rows.  Is there a reason to reset non 'open' status rows to some value?  The code would be setting those values to NULL because the CASE expression doesn't have an ELSE condition (on 2 of the 3 columns).  If there is no reason to update the non 'open' rows then maybe exclude those rows from the update process.

    With that said - if this is a post-process update then I would recommend having the insert/update process in the SSIS workflow provide the correct values when the rows are updated/inserted at that time, instead of using a post-process update.

    And finally, if none of that is possible - then yes, perform the updates in batches.  I would start around 200,000 rows and adjust up or down as needed.
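A minimal batching loop might be sketched as follows, assuming the 200,000-row starting batch size above and using the lasttime column to identify rows not yet processed (both assumptions; in practice you would drive the batches off an indexed key):

```sql
declare @maxtime datetime = (select max(time) from dbo.test);
declare @batch   int      = 200000;  -- starting point; tune up or down

while 1 = 1
begin
    update top (@batch) dbo.test
    set lasttime = @maxtime
    where lasttime <> @maxtime or lasttime is null;  -- only untouched rows

    if @@rowcount < @batch break;  -- last batch done
end
```

Smaller batches keep each transaction, and therefore the log growth and lock duration, bounded, at the cost of more scans to find remaining rows.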

    Jeffrey Williams
    Problems are opportunities brilliantly disguised as insurmountable obstacles.

    How to post questions to get better answers faster
    Managing Transaction Logs

  • kirti wrote:

Thanks for the suggestions, I will try these and update the results.

    Can we do updates in batches? If I have a total record count of 10 million, then what should be the approximate batch size for the above update statements?

It would be helpful if you could supply the DDL for the table and details of all the indexes on it. To do the job in batches you really need to seek on an index to do it efficiently, which is why I'm asking for these details.
