SQLServerCentral
Deleting a large amount of data from a table


MyDoggieJessie
One Orange Chip
Group: General Forum Members
Points: 27078 Visits: 7488
First of all, I inherited this problem and had no part in how it was originally designed...I'm just left with trying to resolve the issue!

I have a rather wide table (137 columns) that has several years of data totalling 56+ million records. One of the indexes on this particular table is 71GB and due to other large amounts of data in similar tables/databases, we're running out of space on the server and cannot rebuild this index (note: adding additional storage at this point is not an option).

I've been given permission to remove any data older than 25 months (approx. 13.5 million records) and have built the following script, which deletes data in batches of 10,000 rows and stops after 1,000,000 rows per run, but it's still taking FOREVER.

Is there a more efficient way to prune historical data, or is this method pretty much it and I just have to wait it out? Any constructive suggestions are GREATLY appreciated.

SET NOCOUNT ON
GO

DECLARE @BatchSize int, @ArchiveDate varchar(50), @Error int, @RowCount int, @LoopCount int

SELECT
    @ArchiveDate = CONVERT(varchar(50), DATEADD(mm, -25, GETDATE()), 101),
    @BatchSize = 10000, @RowCount = 0, @LoopCount = 0

WHILE @BatchSize <> 0
BEGIN
    /* Delete one batch */
    DELETE TOP (10000) FROM DW_LOAD.dbo.PR_TRANS_IMPORT WITH (ROWLOCK)
    WHERE [pos-date] < @ArchiveDate

    SELECT @Error = @@ERROR, @BatchSize = @@ROWCOUNT
    SELECT @RowCount = @RowCount + @BatchSize, @LoopCount = @LoopCount + 1

    /* Under simple recovery, a periodic CHECKPOINT lets the log truncate */
    IF @LoopCount = 10
    BEGIN
        CHECKPOINT
        PRINT ('CHECKPOINT REACHED (100,000 rows deleted)')
        SET @LoopCount = 0
    END

    PRINT ('Records Deleted: ' + CAST(@BatchSize AS varchar(25)) + ', Total Count: ' + CAST(@RowCount AS varchar(25)))

    IF @RowCount >= 1000000
        BREAK
END

SET NOCOUNT OFF
GO

______________________________________________________________________________
Never argue with an idiot; they'll drag you down to their level and beat you with experience

UMG Developer
SSCertifiable
Group: General Forum Members
Points: 7296 Visits: 2204
It would help to know the table and index structures.

Some important things to know/help:
* How long is "FOREVER"?
* Is the table a heap or does it have a clustered index?
* Is [pos-date] indexed?
* Are you running Enterprise edition?
* What is the recovery model?
* What is the storage for the DB and Log files? (RAID and number of spindles)
* Is there other significant load at the same time as you are deleting?
MyDoggieJessie
One Orange Chip
Group: General Forum Members
Points: 27078 Visits: 7488
Oops, great questions...sorry I forgot to include more relevant information

* How long is "FOREVER"?
- 33 minutes to delete about 290,000 records (a long way to 13+ million)

* Is the table a heap or does it have a clustered index?
- it has a clustered index (which is 71GB)

* Is [pos-date] indexed?
- Yes, it is

* Are you running Enterprise edition?
- SQL 2005, SP2, Standard

* What is the recovery model?
- Simple

* What is the storage for the DB and Log files? (RAID and number of spindles)
- Unsure about spindles, but it's all on RAID 5 storage; there is ample room for growth within the data and log files

* Is there other significant load at the same time as you are deleting?
- No, this is the only activity


UMG Developer
SSCertifiable
Group: General Forum Members
Points: 7296 Visits: 2204
Since there are no other users during the delete, try removing the ROWLOCK hint; you might even use a TABLOCK hint instead. (Less work if it doesn't have to maintain as many locks.)

How about an actual execution plan for the DELETE query? If the optimizer has gone for a table scan most of your time might be spent finding the rows to delete, in which case a larger batch (say 500,000 records) might work out to be faster. (In some cases I have used a batch size of 1,000,000 rows.)

Can you drop all the non-clustered indexes other than the one on pos-date, and then rebuild them afterwards? If the optimizer isn't using the pos-date index, then you can drop it too. (Saves all of the time and disk I/O of keeping them up-to-date.) (Or you can just disable all of the indexes, and then rebuild/re-enable them afterwards. That way you don't have to script out the index definitions.)
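A minimal sketch of the disable/rebuild approach, using the table name from this thread; the index name is illustrative, and the real list can be generated from sys.indexes:

```sql
-- Disable each non-clustered index before the purge. Do NOT disable the
-- clustered index: that would make the whole table inaccessible.
ALTER INDEX IX_PR_TRANS_IMPORT_StoreNo ON DW_LOAD.dbo.PR_TRANS_IMPORT DISABLE;

-- ... run the batched DELETE here ...

-- Rebuilding a disabled index re-enables it.
ALTER INDEX IX_PR_TRANS_IMPORT_StoreNo ON DW_LOAD.dbo.PR_TRANS_IMPORT REBUILD;

-- To list the candidate indexes (SQL Server 2005):
SELECT name
FROM sys.indexes
WHERE object_id = OBJECT_ID('DW_LOAD.dbo.PR_TRANS_IMPORT')
  AND type = 2;  -- 2 = non-clustered
```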

I was going to suggest using a filtered index on pos-date since you posted in a SQL Server 2008 forum, but since you only have 2005 that isn't an option.

You could always go the route of exporting all the data and bringing it back in; sometimes that is faster than deleting. (If you had the space, creating a new table, inserting the records into it, and dropping the old table would be the fastest way to go.)
MyDoggieJessie
One Orange Chip
Group: General Forum Members
Points: 27078 Visits: 7488
Thanks for the tips! I have exceeded my maintenance window for tonight but will try this tomorrow afternoon once corporate reporting has finished for the day


How about an actual execution plan for the DELETE query?
* I didn't notice many table scans, but did notice index updates costing 30% for each iteration of the loop...so I think your thought on disabling the indexes may be a decent route to go - I will try this first and see how it works

You could always go the route of exporting all the data and bringing it back in, sometimes that is faster than deleting. (If you had the space creating a new table and inserting the records into it and dropping the old table would be the fastest way to go.)
* If I did choose this route, I'd first have to drop all the non-clustered indexes (to free up the space), then recreate them all on the new table after successfully importing into it - correct?

Thanks, you've been a great help. I will try these two things tomorrow to see how it works.


a.rajmane
SSC-Enthusiastic
Group: General Forum Members
Points: 175 Visits: 40
Partitioning would help greatly when deleting large amounts of data.

www.sqlsuperfast.com
MyDoggieJessie
One Orange Chip
Group: General Forum Members
Points: 27078 Visits: 7488
I have very little experience with partitioning...could you provide me with some good links on either instructional videos or good articles so I can learn about it more quickly?


Oliiii
SSCrazy
Group: General Forum Members
Points: 2810 Visits: 777
You could try a few different things:
Remove all indexes besides the clustered index and the one on pos-date.
Try smaller batches (100, 1000, 5000, 10000) and see which is fastest (on some systems I've had much better performance deleting in tiny batches rather than big ones).

If you have enough space on another drive (or another server), do a bcp queryout of only the rows you need to keep, truncate your table, and bcp the data back in. No need to get rid of the indexes; they'll be rebuilt once the bcp in is done (or after each batch if you set a batch size).
If you have a tight schedule, you can remove all indexes but the clustered one and then recreate them starting with the most important; if you run out of time, recreate the less important ones later.
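A rough sketch of that export/truncate/reload sequence; the server name, file path, and keep-date are placeholders, and this assumes a trusted (Windows) connection:

```shell
# Export only the rows to keep, in native format (-n) for speed on a wide table
bcp "SELECT * FROM DW_LOAD.dbo.PR_TRANS_IMPORT WHERE [pos-date] >= '20090301'" queryout keep_rows.dat -n -T -S MyServer

# Empty the table (minimal logging; deallocates pages rather than logging rows)
sqlcmd -S MyServer -E -Q "TRUNCATE TABLE DW_LOAD.dbo.PR_TRANS_IMPORT"

# Reload; -b commits every 100,000 rows so one failure doesn't roll back everything
bcp DW_LOAD.dbo.PR_TRANS_IMPORT in keep_rows.dat -n -T -S MyServer -b 100000
```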

You'll need an Enterprise Edition to try out partitioning.
MyDoggieJessie
One Orange Chip
Group: General Forum Members
Points: 27078 Visits: 7488
"If you have enough space on another drive (or another server) do a bcp queryout (only the rows you need to keep), truncate your table and bcp in. No need to get rid of the indexes, they'll be rebuilt once the bcp in is done (or after each batches if you set a batch size)."

This method has been working really well; however, when I got to a larger table (i.e. 21,000,000+ rows), the bcp out ran quickly but the bcp in hung near the end...and has been sitting there for about 45 minutes.

In checking the log file, I see considerable growth (it grew about 28GB).

I assume that after it inserted the data, it's rebuilding all the indexes?

Would a better approach be to script out the indexes first, then truncate the table, bcp data back in, then recreate all the indexes?

Would that bypass this excessive logging?


Oliiii
SSCrazy
Group: General Forum Members
Points: 2810 Visits: 777
Yes, it's rebuilding all the non-clustered indexes.
Rebuilding the indexes by hand would remove most of that logging (it would only log which pages have been allocated, which doesn't take much room).
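A sketch of that manual rebuild step after the bcp in completes; the index name is illustrative:

```sql
-- Rebuild the non-clustered indexes one at a time, most important first.
-- SORT_IN_TEMPDB keeps the rebuild's sort work out of the user database files.
ALTER INDEX IX_PR_TRANS_IMPORT_PosDate ON DW_LOAD.dbo.PR_TRANS_IMPORT
    REBUILD WITH (SORT_IN_TEMPDB = ON);
```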