How to improve Delete Performance

  • I have an SSIS package in which I have three tables to load, and the first step performs a date-based delete: right now it deletes the current and previous month, while each table holds 2.5 years of data. These tables currently do not have any keys, constraints, or indexes.
    The deletes seem to take forever. I've googled around and there is a wide range of ways to improve performance; I'm looking for the easiest, low-hanging-fruit approach to give these transactions a boost.
    Thanks

  • You don't provide rowcounts, but a large delete may be faster if done in chunks.

    WHILE 1 = 1
    BEGIN
        -- delete in small batches so each transaction stays short and the log can clear
        DELETE TOP (10000) FROM table WHERE condition;
        IF @@ROWCOUNT = 0 BREAK;   -- nothing left to delete
    END

    You also don't provide the recovery model. Breaking a large DELETE into pieces is especially helpful in SIMPLE recovery databases, as it reduces the strain on the transaction log. It also helps if "condition" is something that can be satisfied with a clustered index, such as a date range.

    If you're deleting a large percentage of the table, say 90+ percent, it may be faster to copy the rows you want to keep to a temp table, truncate the main table, and then reload from the temp table.
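
    A rough sketch of that copy-and-reload approach; dbo.BigTable, LoadDate, and the cutoff value are placeholders, not the poster's actual schema (note that TRUNCATE also requires that no foreign keys reference the table, which is the case here):

    DECLARE @CutoffDate date = '20180601';       -- placeholder: keep rows older than this

    SELECT *
    INTO #keep
    FROM dbo.BigTable
    WHERE LoadDate < @CutoffDate;                -- copy only the rows you want to keep

    TRUNCATE TABLE dbo.BigTable;                 -- minimally logged, nearly instant

    INSERT INTO dbo.BigTable WITH (TABLOCK)      -- TABLOCK enables minimal logging in SIMPLE/BULK_LOGGED
    SELECT * FROM #keep;

    DROP TABLE #keep;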

  • Scott Coleman - Monday, July 16, 2018 9:41 AM

    36.5M rows for each of the 3 tables.

  • quinn.jay - Monday, July 16, 2018 9:57 AM

    In addition, that's two months of data; the table holds 2.5 years' worth at this time.

  • You have no indexes.  So for EVERY select/update/delete, SQL has to do a full scan of the entire table in order to find the records to action.
    In order to speed up the delete, create a clustered index on the date field.
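
    For example, a minimal version with LoadDate standing in for the actual date column:

    -- one-time operation; building this on 36.5M rows will take time and log space
    CREATE CLUSTERED INDEX CIX_BigTable_LoadDate
        ON dbo.BigTable (LoadDate);

    With the table clustered on the date, a date-range DELETE touches only the pages for those months instead of scanning the whole table.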

  • quinn.jay - Monday, July 16, 2018 10:08 AM

    With that many records, partitioning might be a good option:

    https://docs.microsoft.com/en-us/sql/relational-databases/partitions/partitioned-tables-and-indexes?view=sql-server-2017

    I believe that's a little more involved than a low-hanging-fruit solution, though.
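
    A minimal sketch of monthly partitioning, using hypothetical names (pfMonthly, psMonthly, dbo.BigTable, LoadDate); a real setup needs one boundary per month plus a sliding-window maintenance job:

    -- partition function: one boundary per month (only a few shown here)
    CREATE PARTITION FUNCTION pfMonthly (date)
        AS RANGE RIGHT FOR VALUES ('20180501', '20180601', '20180701');

    -- map every partition to PRIMARY for simplicity
    CREATE PARTITION SCHEME psMonthly
        AS PARTITION pfMonthly ALL TO ([PRIMARY]);

    -- build the clustered index on the partition scheme
    CREATE CLUSTERED INDEX CIX_BigTable_LoadDate
        ON dbo.BigTable (LoadDate)
        ON psMonthly (LoadDate);

    -- on SQL Server 2016 or later, an old month can then be dropped almost instantly:
    -- TRUNCATE TABLE dbo.BigTable WITH (PARTITIONS (1));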

  • I don't quite understand your process.

    You say the table has 2 1/2 years of data, so for simplicity let's say all of 2016, 2017 and the first half of 2018.  Maybe that's off by a month or two in either direction, but for the sake of argument it doesn't matter.

    Then you say your process is to delete the current and past month, for simplicity June and July 2018.  Then I assume you reload this data as part of your process from some source.

    Doesn't that mean all data from 2016 through May 2018 never gets touched? Is that the intention? If so, why not archive the non-current data into another table, where it can reside untouched by your process? As you roll over from month to month, you'd move the oldest data from "current" to "archive". Then your delete-and-reload process for the current data simply becomes a truncate and load, as sketched below.

    You'd need to revise any queries that access the data to get it from both "archive" and "current" tables.  That might be a considerable task.  It depends on how many processes act upon all those rows of "archive" data.
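
    A sketch of that monthly roll-over, with hypothetical "current" and "archive" table names:

    -- at month roll-over: copy the month that just left the reload window into the archive
    DECLARE @FirstOfPrevMonth date =
        DATEADD(MONTH, -1, DATEFROMPARTS(YEAR(GETDATE()), MONTH(GETDATE()), 1));

    INSERT INTO dbo.BigTable_Archive WITH (TABLOCK)
    SELECT *
    FROM dbo.BigTable_Current
    WHERE LoadDate < @FirstOfPrevMonth;

    -- the delete-and-reload of current data then becomes a simple truncate and load
    TRUNCATE TABLE dbo.BigTable_Current;
    -- ... SSIS reloads the current and previous month into dbo.BigTable_Current ...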

  • quinn.jay - Monday, July 16, 2018 9:29 AM

    You have 36.5M rows in each of the 3 tables and these tables have no indexes? I'm thinking you've just identified the tool you need to make this process faster.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • right now it deletes the current and previous month, while each table holds 2.5 years of data. These tables currently do not have any keys, constraints, or indexes.

    Cluster the tables on that date.  That will drastically speed up the DELETEs.  You don't really need to partition the table.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Better yet... stop doing the deletes and learn how to do a proper "UPSERT" on a properly clustered/indexed table! Do you really think that many rows from the previous month are going to change, where it's more effective to delete rather than to selectively update and insert?
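
    A minimal MERGE-based sketch; RowKey, Amount, LoadDate, and dbo.Staging are assumed names, not the actual schema:

    MERGE dbo.BigTable AS tgt
    USING dbo.Staging AS src
        ON tgt.RowKey = src.RowKey
    WHEN MATCHED AND tgt.Amount <> src.Amount      -- touch only rows that actually changed
        THEN UPDATE SET tgt.Amount = src.Amount
    WHEN NOT MATCHED BY TARGET
        THEN INSERT (RowKey, LoadDate, Amount)
             VALUES (src.RowKey, src.LoadDate, src.Amount);

    With a clustered index on the join key, only new and changed rows generate writes, instead of deleting and reinserting two whole months.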

    --Jeff Moden

  • I ended up going with a tweak on the DELETE statement, coupled with clustered and non-clustered indexes on the tables, and drastically slashed the delete time. For this cube and data size, that's livable. For the next cube, which I'll copy from this one and modify, there is a far greater volume of data, so I'm going to incorporate much of what's been recommended beyond this easy solution: partitioning the table, clustered/non-clustered indexes, and also an upsert to avoid the DELETE. Thank you all for the help and advice.

  • quinn.jay - Tuesday, July 17, 2018 12:43 PM

    Cool.  Thanks for the feedback.

    --Jeff Moden

  • quinn.jay - Tuesday, July 17, 2018 12:43 PM

    Are you sure you need a non-clustered index, let alone multiple ones? If your query specifies a reasonable date range, you won't need a non-clustered index to support it.

    Are you on an edition of SQL that allows you to use data compression?  If so, that is a huge help with large+ tables.
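
    For example (PAGE compression requires Enterprise edition before SQL Server 2016 SP1, any edition from 2016 SP1 on):

    -- rebuilds the heap or clustered index with PAGE compression
    ALTER TABLE dbo.BigTable
        REBUILD WITH (DATA_COMPRESSION = PAGE);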

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.
