Deleting a lot of data

, 2018-04-12

I recently had the task of deleting a bit over a billion rows from a table. Now I could have done just this:

DELETE FROM tablename WHERE createdate >= '1/1/2017'

But I have a few problems here. The table has no index on createdate, potentially causing problems with tempdb (the sort on createdate). Although in this case tempdb is pretty large because of some large batch work done at various times. I’m also going to be deleting > billion rows of ~6 billion which is probably going to fill up the log of the database (which fortunately isn’t in use yet) and end up rolling back my delete anyway. Even if I don’t fill up the log, I’m still going to bloat it pretty badly (autogrowth). And last, and anything but least, this is on a production server. Even if this database was on its own drive (meaning growth of the log can’t cause a problem with any other databases) that tempdb thing (let alone other resource usage) is going to be an issue.

So what do I do?

If I was deleting most of the table there would be several options. For example, dumping just the rows I’m interested in keeping, truncating the table, and putting what I’m keeping back. But since I’m keeping such a large number of rows it would be even worse than just trying to delete the rows I want to get rid of. So since I can’t delete it all, how about I delete a bit of it at a time?

DELETE TOP (2500000) /*2.5 mil*/ FROM tablename WHERE createdate >= '1/1/2017'

I’m going to have to run that a few times though right?

1,000,000,000 / 2,500,000 = 400

The first time I did this I tried using a WHILE loop and @@rowcount.

DELETE TOP (2500000) /*2.5 mil*/ FROM #tablename WHERE createdate >= '1/1/2017'
	DELETE TOP (2500000) /*2.5 mil*/ FROM #tablename WHERE createdate >= '1/1/2017'

Unfortunately, that ran into a slight problem. Since it’s a single batch it counts as a single transaction (as best I can tell) and so I ran into the problems above. In this case, the transaction log ran out of space and all of the deletes had to roll back.

So instead I did this:

DELETE TOP (2500000) /*2.5 mil*/ FROM #tablename WHERE createdate >= '1/1/2017'
GO 401

Well, really I started out doing 50 batches at a time to make sure things went well. Then after doing that I did a quick row count and divided by the 2.5 mil to see how many times I still needed to run the delete. Last but not least the 2.5 million was an educated guess as to the size of the delete that would run fastest. Too large a value and each delete takes too long, too small and you have to run too many deletes. I was also very lucky here that the database is in SIMPLE recovery. Otherwise, I would have had to do a number of batches, log backup, number of batches, log backup etc.

Regardless, it takes a while to delete a billion rows so have fun!





Related content

Database Mirroring FAQ: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup?

Question: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup? This question was sent to me via email. My reply follows. Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup? Databases to be mirrored are currently running on 2005 SQL instances but will be upgraded to 2008 SQL in the near future.


1,567 reads

Networking - Part 4

You may want to read Part 1 , Part 2 , and Part 3 before continuing. This time around I'd like to talk about social networking. We'll start with social networking. Facebook, MySpace, and Twitter are all good examples of using technology to let...


1,530 reads

Speaking at Community Events - More Thoughts

Last week I posted Speaking at Community Events - Time to Raise the Bar?, a first cut at talking about to what degree we should require experience for speakers at events like SQLSaturday as well as when it might be appropriate to add additional focus/limitations on the presentations that are accepted. I've got a few more thoughts on the topic this week, and I look forward to your comments.


360 reads