How to improve Delete Performance

  • I have an SSIS package in which I have three tables to load, and the first step performs a date-based delete: right now it deletes the current and previous month, while each table holds 2.5 years of data. These tables currently do not have any keys, constraints, or indexes.
    The deletes seem to take forever. I've googled around and there is a wide range of ways to improve performance; I'm looking for the easiest, low-hanging-fruit approach to give these transactions a boost.
    Thanks

  • You don't provide rowcounts, but a large delete may be faster if done in chunks.

    WHILE 1 = 1
    BEGIN
        -- delete in small batches so each transaction stays short and the log can clear
        DELETE TOP (10000) FROM table WHERE condition;
        IF @@ROWCOUNT = 0 BREAK;   -- nothing left to delete
    END

    You also don't provide the recovery model. Breaking a large DELETE into pieces is especially helpful in SIMPLE recovery databases, as it reduces the strain on the transaction log. It also helps if "condition" is something that can be satisfied with a clustered index, such as a date range.

    If you're deleting a large percentage of the table, say 90+ percent, it may be faster to copy the rows you want to keep to a temp table, truncate the main table, and then reload from the temp table.
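
    A rough sketch of that copy-and-reload approach; dbo.BigTable, LoadDate, and the cutoff value are placeholders, not the poster's actual schema (note that TRUNCATE also requires that no foreign keys reference the table, which is the case here):

    DECLARE @CutoffDate date = '20180601';       -- placeholder: keep rows older than this

    SELECT *
    INTO #keep
    FROM dbo.BigTable
    WHERE LoadDate < @CutoffDate;                -- copy only the rows you want to keep

    TRUNCATE TABLE dbo.BigTable;                 -- minimally logged, nearly instant

    INSERT INTO dbo.BigTable WITH (TABLOCK)      -- TABLOCK enables minimal logging in SIMPLE/BULK_LOGGED
    SELECT * FROM #keep;

    DROP TABLE #keep;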

  • Scott Coleman - Monday, July 16, 2018 9:41 AM

    36.5M rows for each of the 3 tables.

  • quinn.jay - Monday, July 16, 2018 9:57 AM

    In addition, that's two months of data; the table holds 2.5 years' worth at this time.

  • You have no indexes.  So for EVERY select/update/delete, SQL has to do a full scan of the entire table in order to find the records to action.
    In order to speed up the delete, create a clustered index on the date field.
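
    For example, a minimal version with LoadDate standing in for the actual date column:

    -- one-time operation; building this on 36.5M rows will take time and log space
    CREATE CLUSTERED INDEX CIX_BigTable_LoadDate
        ON dbo.BigTable (LoadDate);

    With the table clustered on the date, a date-range DELETE touches only the pages for those months instead of scanning the whole table.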

  • quinn.jay - Monday, July 16, 2018 10:08 AM

    With that many records, partitioning might be a good option:

    https://docs.microsoft.com/en-us/sql/relational-databases/partitions/partitioned-tables-and-indexes?view=sql-server-2017

    I believe that's a little more involved than a low-hanging-fruit solution, though.
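
    A minimal sketch of monthly partitioning, using hypothetical names (pfMonthly, psMonthly, dbo.BigTable, LoadDate); a real setup needs one boundary per month plus a sliding-window maintenance job:

    -- partition function: one boundary per month (only a few shown here)
    CREATE PARTITION FUNCTION pfMonthly (date)
        AS RANGE RIGHT FOR VALUES ('20180501', '20180601', '20180701');

    -- map every partition to PRIMARY for simplicity
    CREATE PARTITION SCHEME psMonthly
        AS PARTITION pfMonthly ALL TO ([PRIMARY]);

    -- build the clustered index on the partition scheme
    CREATE CLUSTERED INDEX CIX_BigTable_LoadDate
        ON dbo.BigTable (LoadDate)
        ON psMonthly (LoadDate);

    -- on SQL Server 2016 or later, an old month can then be dropped almost instantly:
    -- TRUNCATE TABLE dbo.BigTable WITH (PARTITIONS (1));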

  • I don't quite understand your process.

    You say the table has 2 1/2 years of data, so for simplicity let's say all of 2016, 2017 and the first half of 2018.  Maybe that's off by a month or two in either direction, but for the sake of argument it doesn't matter.

    Then you say your process is to delete the current and past month, for simplicity June and July 2018.  Then I assume you reload this data as part of your process from some source.

    Doesn't that mean all data from 2016 through May 2018 never gets touched? Is that the intention? If so, why not archive the non-current data into another table, where it can reside untouched by your process? As you roll over from month to month, you'd move the oldest data from "current" to "archive". Then your delete-and-reload process for the current data simply becomes a truncate and load, as sketched below.

    You'd need to revise any queries that access the data to get it from both "archive" and "current" tables.  That might be a considerable task.  It depends on how many processes act upon all those rows of "archive" data.
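
    A sketch of that monthly roll-over, with hypothetical "current" and "archive" table names:

    -- at month roll-over: copy the month that just left the reload window into the archive
    DECLARE @FirstOfPrevMonth date =
        DATEADD(MONTH, -1, DATEFROMPARTS(YEAR(GETDATE()), MONTH(GETDATE()), 1));

    INSERT INTO dbo.BigTable_Archive WITH (TABLOCK)
    SELECT *
    FROM dbo.BigTable_Current
    WHERE LoadDate < @FirstOfPrevMonth;

    -- the delete-and-reload of current data then becomes a simple truncate and load
    TRUNCATE TABLE dbo.BigTable_Current;
    -- ... SSIS reloads the current and previous month into dbo.BigTable_Current ...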

  • quinn.jay - Monday, July 16, 2018 9:29 AM

    You have 36.5M rows in each of the 3 tables and these tables have no indexes? I'm thinking you've just identified the tool you need to make this process faster.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • right now it deletes the current and previous month, while each table holds 2.5 years of data. These tables currently do not have any keys, constraints, or indexes.

    Cluster the tables on that date.  That will drastically speed up the DELETEs.  You don't really need to partition the table.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Better yet... stop doing the deletes and learn how to do a proper "UPSERT" on a properly clustered/indexed table! Do you really think that many rows from the previous month are going to change, where it's more effective to delete rather than to selectively update and insert?
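
    A minimal MERGE-based sketch; RowKey, Amount, LoadDate, and dbo.Staging are assumed names, not the actual schema:

    MERGE dbo.BigTable AS tgt
    USING dbo.Staging AS src
        ON tgt.RowKey = src.RowKey
    WHEN MATCHED AND tgt.Amount <> src.Amount      -- touch only rows that actually changed
        THEN UPDATE SET tgt.Amount = src.Amount
    WHEN NOT MATCHED BY TARGET
        THEN INSERT (RowKey, LoadDate, Amount)
             VALUES (src.RowKey, src.LoadDate, src.Amount);

    With a clustered index on the join key, only new and changed rows generate writes, instead of deleting and reinserting two whole months.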

    --Jeff Moden

  • I ended up going with a tweak on the DELETE statement, coupled with clustered and non-clustered indexes on the tables, and drastically slashed the delete time. For this cube and data size, that's livable. For the next cube, which I'll copy from this one and modify, there is a far greater volume of data, so I'm going to incorporate much of what's been recommended beyond this easy solution: partitioning the table, clustered/non-clustered indexes, and also an upsert to avoid the DELETE. Thank you all for the help and advice.

  • quinn.jay - Tuesday, July 17, 2018 12:43 PM

    Cool.  Thanks for the feedback.

    --Jeff Moden

  • quinn.jay - Tuesday, July 17, 2018 12:43 PM

    Are you sure you need a non-clustered index, let alone multiple ones? If your query specifies a reasonable date range, you won't need a non-clustered index to support it.

    Are you on an edition of SQL that allows you to use data compression?  If so, that is a huge help with large+ tables.
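
    For example (PAGE compression requires Enterprise edition before SQL Server 2016 SP1, any edition from 2016 SP1 on):

    -- rebuilds the heap or clustered index with PAGE compression
    ALTER TABLE dbo.BigTable
        REBUILD WITH (DATA_COMPRESSION = PAGE);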

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.
