How to update rows faster

  • Hi Guys,

    Need help from SQL experts for my scenario. I have a Inventory table that has:

    - 2 million current month transactions

    - 38 million for previous months translations

    I have multiple UPDATE statements each UPDATE is updating this Inventory table but only for current month and each update uses join with a different tables to get values to set.

    The issue I am having is that the updates are taking too long as each time the Inventory table is scanned for current month which means scanning all those 30 million rows for other months also to get current month. Though in update I am using the condition where month= <current month>.

    How can I make this work faster. I am thinking about non-clustered index but that I am not sure if index works for updates

    Any other suggestion would be appreciated

  • Could you post your query and execution plan, do you have an index on the month column , is the index being used ?

    Jayanth Kurup[/url]

  • I agree with Jayanth, for specific advice, the query and the existing execution plan.

    But, general advice, yes, a non-clustered index can be used in an UPDATE statement to help the query find the data that you're attempting to update.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • rhans (12/9/2015)


    Hi Guys,

    Need help from SQL experts for my scenario. I have a Inventory table that has:

    - 2 million current month transactions

    - 38 million for previous months translations

    I have multiple UPDATE statements each UPDATE is updating this Inventory table but only for current month and each update uses join with a different tables to get values to set.

    The issue I am having is that the updates are taking too long as each time the Inventory table is scanned for current month which means scanning all those 30 million rows for other months also to get current month. Though in update I am using the condition where month= <current month>.

    How can I make this work faster. I am thinking about non-clustered index but that I am not sure if index works for updates

    Any other suggestion would be appreciated

    Some suggestions:

    -- Get a faster machine.

    -- Get more memory.

    -- Delete the 38 millions rows for previous months.

    -- Make the rows less wide.

    -- Sometimes even removing indexes will help.

    On the more serious side, some more information is needed.

    That informaton is lacking.

    So I am going to make some assumptions and give advise based on that.

    Assumption:

    1. The table is not clustered on the specific date field.

    2. The table is not partitioned on the specific date field.

    SQL-server does not know where the 2 million current rows are. They could be anywhere in the table, so needing 2 rows of every 38, will result in a full table scan.

    If the table is clustered on something different than the date field, the 'current' rows will be spread over the table. So almost all segments and most pages will contain some of the current rows. Even an index will not really help in such a situation.

    If it is a heap and the current month is written as last. Most of the rows will be gathered together at the 'current' part of the heap. An index can help in such a situation, the current rows can be 'grouped' near each other, so less segments/pages have to be read. Be carefull the current rows can still be written in the 'older' parts of the heap, this depends of the usage of the table.

    Suggestions:

    1. Make a separate table of the current month.

    2. Cluster the table on the date.

    3. (Mentioned for completeness, I would not suggest this: partition on the months).

    4. Create a field which holds the status of a row and use this field as the first field in a clusterd index. (For example the status could be a 'short' field for new/to be processed/done/completely done/history).

    Without some more detailed information it is not possible to give an specific advise.

    Ben

  • Thanks Folks. Some info.

    Table currently has no clustered or non clustered Index. It has some FK constraints for the tables used in joins. Also, I can't delete old months data. It is needed.

  • Then the first index I'd put on the table is the clustered index. And, if you're going to be using this month field for lots of deletes, it might make sense to make that the clustered key, but that's just a guess. I'd want to know what the most common access path to the data is, which column(s) is used most frequently to filter the data. That's usually the best choice for a clustered index.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thanks for the feedback. Looks like the advice is to create index on transaction date column. Question ?

    1) Should I create clustered or nonclusterd Index. I understand their differenc between them but, in this scenario how to decided between the two ?

    2) There will be duplicate values in the columns for each months. I know clustered index doesn't enforce uniqueness but will it be ideal in this case ?

    3) I do inserts first then updates to other fields (so not to transaction date field. If it just in where clause). Do I need to drop and re-create index at any point. If yes when should I drop and when re-created.

  • Without seeing the actual structure and queries, these are, at best, suggestions for testing.

    rhans (12/10/2015)


    Thanks for the feedback. Looks like the advice is to create index on transaction date column. Question ?

    1) Should I create clustered or nonclusterd Index. I understand their differenc between them but, in this scenario how to decided between the two ?

    I'd say a clustered index since this table doesn't have one for some reason. Almost every table you have should have a clustered index.

    2) There will be duplicate values in the columns for each months. I know clustered index doesn't enforce uniqueness but will it be ideal in this case ?

    For many reasons a unique clustered index is preferred, but it's not necessary. The main point of the cluster would be to have it be the primary method of accessing your data. If this column is not, you may want to put the clustered index on a different column(s)

    3) I do inserts first then updates to other fields (so not to transaction date field. If it just in where clause). Do I need to drop and re-create index at any point. If yes when should I drop and when re-created.

    For most tables, most of the time, I don't recommend dropping and recreating the index. This is especially true of the clustered index. That said, it really depends on your processing. Remember, indexes can help UPDATE operations.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • ... I have multiple UPDATE statements each UPDATE is updating this Inventory table but only for current month and each update uses join with a different tables to get values to set....

    Explain what is meant by "multiple UPDATE statements". If you can possibly perform all required column updates within context of a single statement, then that will prevent multiple passes. Also, in addition to indexing WHERE clause on month, confirm that your table joins are indexed. Also, assuming tables are properly indexed, for mass updates (ie: 1 million+ rows), the majority of processing time is typically I/O required for transaction logging.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Cluster the table on the month.

    I'd include at least the year with the month. Best is to use a column of type date, but if necessary you can use an int column in format YYYYMM. In fact, if the table is only from 2000 on, you could technically use a smallint in format YYMM. I don't recommend that, but it is possible.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Thank you guys for all your suggestions. Yes, the column format is YYYYMM. I will add a clustered index on transaction month. I was thinking about filtered Index where month is current month. But, then I read that for filtered Index condition has to be exact and not like a date function to get current date. Current month will change each month. Can't think of a way filtered Index will work here. So, just the Clustered Index on date column.

    To answer other questions....I can't avoid multiple update statements. Each update statement has a join with some table to get the value to fetch, hence multiple updates. It's existing code and don't want to mess with that.

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply