Deleting batches of rows with TOP
Posted Thursday, July 03, 2008 3:21 AM
Grasshopper


Group: General Forum Members
Last Login: Friday, June 15, 2012 3:48 AM
Points: 17, Visits: 63
I thought the same... "Why would I ever need to delete 5,000,000 rows from a huge database table?" But then I thought "you never know!" It's not the sort of thing you would do every day (I hope) but it's more likely you'd need to do it in response to some disaster or other. And knowing a way to do so effectively without making the situation worse might save your skin one day.
Post #527872
Posted Thursday, July 03, 2008 5:48 AM
SSC-Enthusiastic


Group: General Forum Members
Last Login: Wednesday, April 09, 2014 4:46 PM
Points: 100, Visits: 390
I like the following a bit better as you don't have to find out how many rows you're going to affect before affecting them. YMMV.
DECLARE @rowcount int;
SET @rowcount = 1;

DECLARE @batchsize int;
SET @batchsize = 5000;

WHILE (@rowcount > 0)
BEGIN
    -- TOP needs parentheses around a variable; dbo.MyTable stands in for the real table name
    DELETE TOP (@batchsize) FROM dbo.MyTable WHERE foo = 1;
    SET @rowcount = @@ROWCOUNT;
END

Post #527953
Posted Thursday, July 03, 2008 6:08 AM
SSC Veteran


Group: General Forum Members
Last Login: Tuesday, September 04, 2012 2:04 AM
Points: 263, Visits: 324
Anirban Paul (7/3/2008)
Is this one practically useful? I have my doubts. Any comments on this?


I actually did use this about a month ago... we had a new policy implemented that allowed us to delete a couple of years' worth of data out of several audit tables. The largest delete was approx 2.5 million rows. Agreed, it's probably not an everyday use, but it does get us less experienced DBAs thinking beyond just getting something to work.
Post #527963
Posted Thursday, July 03, 2008 6:42 AM
SSC-Addicted


Group: General Forum Members
Last Login: Thursday, March 27, 2014 5:55 AM
Points: 449, Visits: 1,830
On top of these one-off deletions, there will undoubtedly be cases where such a delete needs to be done regularly. Off the top of my head, I would say a logging table may well need this type of delete.

The original programmer of the TOP option in T-SQL had a reason back then, and that was a good enough reason to allow this behaviour :P

EDIT: I find the TOP option good when doing updates on production tables which can otherwise cause blocking. It's not always nice, but this is real life and not a Walt Disney film!
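Something like this rough sketch is what I mean (dbo.Orders and the Processed column are made-up stand-ins, so adjust for your own schema):

DECLARE @batch int;
SET @batch = 5000;

WHILE 1 = 1
BEGIN
    -- Update in small chunks so each statement holds its locks only briefly
    UPDATE TOP (@batch) dbo.Orders
    SET Processed = 1
    WHERE Processed = 0;

    IF @@ROWCOUNT = 0 BREAK;
END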



Regards,

WilliamD
Post #527998
Posted Thursday, July 03, 2008 6:53 AM
SSC Veteran


Group: General Forum Members
Last Login: Wednesday, January 08, 2014 4:58 AM
Points: 227, Visits: 268
cjeremy74 (7/3/2008)
I actually did use this about a month ago... we had a new policy implemented that allowed us to delete a couple of years' worth of data out of several audit tables. The largest delete was approx 2.5 million rows. Agreed, it's probably not an everyday use, but it does get us less experienced DBAs thinking beyond just getting something to work.


Same with me; we have a logging table where all the errors are stored. If there is a small bug in the system, there can be approx. 3,000 rows per second. So even if you fix the bug within a few minutes, there are still tons of rows in the table, and inserts will certainly still be arriving if you try to delete the rows all at once.

So it is useful, but as I wrote a few posts earlier, I prefer ROWCOUNT. It doesn't really matter how you solve a problem, though, as long as you solve it with as little work for the server as possible.
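For reference, the ROWCOUNT version I mean looks roughly like this (dbo.ErrorLog and the date cut-off are made-up stand-ins):

-- Limit every following data-modification statement to 5000 rows
SET ROWCOUNT 5000;

DELETE FROM dbo.ErrorLog WHERE ErrorDate < '20080101';
WHILE @@ROWCOUNT > 0
BEGIN
    DELETE FROM dbo.ErrorLog WHERE ErrorDate < '20080101';
END

-- Reset, otherwise later statements in the session stay capped
SET ROWCOUNT 0;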
Post #528010
Posted Thursday, July 03, 2008 7:08 AM


SSC-Dedicated


Group: General Forum Members
Last Login: Today @ 4:46 PM
Points: 35,959, Visits: 30,250
GermanDBA (7/3/2008)
Also, would this not be even better using a numbers table à la Jeff Moden http://www.sqlservercentral.com/articles/TSQL/62867/ ?


Thank you for the reference... but the Tally table isn't a panacea for all looping problems. The loop here is a process control loop that controls how many times a DELETE is used... I don't believe a Tally table could be used here. If someone can think of a way, please post it because THAT would be truly useful. :) Heck, if you can think of a way, write an article on it! :D

The reasons why someone would want to do a delete like this have been totally missed...

The purpose of doing looped Deletes is usually twofold...
1. Do the deletes and still allow the table to be used.
2. Keep the transaction log small.

Item #2 above can only be accomplished if the DB is in simple recovery mode or there is some additional code in the loop that truncates the log while it is running.
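As a sketch of what that additional code could look like under the full recovery model (the database name and backup path here are made up), a log backup after each batch marks the inactive portion of the log as reusable, which keeps the log file from growing:

-- Inside the delete loop, after each batch:
BACKUP LOG MyDatabase
TO DISK = 'X:\Backups\MyDatabase_log.trn';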

The big thing is Item #1... how to do the (massive) deletes and still allow the table to be used by others. For that, you have to have a delay in the loop, and you can do that using WAITFOR DELAY '00:00:10', where the number 10 is some number of seconds. The loop that's listed in the article (and in the thread), whether ROWCOUNT or TOP is used, is fairly aggressive and won't necessarily allow other processes to "get in". It's so aggressive that you might as well just do the delete all at once. In order for the loop to actually allow other processes to work, the TOP or ROWCOUNT should be small enough to limit any locking to about a second, and then the delay should be long enough to allow most other processes to get in and get out... I would think that something between 3 and 10 seconds would be good enough.
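To make that concrete, here's a minimal sketch of the kind of loop I mean (dbo.AuditLog, the date column, and the cut-off are made up; tune the batch size and delay for your own system):

DECLARE @batchsize int;
SET @batchsize = 5000;

WHILE 1 = 1
BEGIN
    -- Keep each batch small enough that locks are held for about a second at most
    DELETE TOP (@batchsize)
      FROM dbo.AuditLog
     WHERE AuditDate < '20060101';

    IF @@ROWCOUNT = 0 BREAK;

    -- Pause so other processes can get in and get out between batches
    WAITFOR DELAY '00:00:05';
END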

Shifting gears a bit... I'm really torn... I very much encourage folks to write articles on this forum... the simple sharing of ideas is what the forum is all about... and, I very much encourage folks that read them to remember that not everyone who can write code can also write a masterpiece article. But, I do feel that there were a large number of points missed in this article and, perhaps, some things in the code that violate what some would consider to be a best practice (you don't need to count loops to do this).

It was pointed out that "For larger tables, where this technique is most effective, you would obviously use larger values for @N and probably constrain on a date column." What a prime opportunity for several explanations... Why is this technique most effective for larger tables? Why would you probably constrain on a date column? Why would you want to use larger values for @N?

So, like I said, I'm really torn... I really want people to be encouraged to write without fear of being bashed, but I do have to agree with what some have already said... a bit more thought, a bit more research, and a bit more explanation of the "why" behind the implied points would have gone a long way toward turning what is essentially a casual script submission into a well-written and informative article.

Heh... of course, sometimes it's not actually the article that counts... sometimes the discussions that follow are more useful than the original intent of the article. In that case, this article has more than done its job. ;)


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

"Change is inevitable. Change for the better is not." -- 04 August 2013
(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #528021
Posted Thursday, July 03, 2008 7:09 AM
SSC Journeyman


Group: General Forum Members
Last Login: Wednesday, May 11, 2011 1:33 PM
Points: 85, Visits: 299
Thanks for all of your responses.

This method has been helpful to us for purging log tables and history in our data warehouse. I realize that partitioned tables would have been a better way to handle history purges, but we were not able to implement partitioned tables for various reasons. (Maybe Phase II...)

As for why @cnt had to be DECIMAL: say you had 11 rows to delete and you set a TOP value of 5. Obviously, you would need 3 iterations of the DELETE to capture the last row. However, when figuring out how many iterations you need, 11/5 = 2.2, which is a DECIMAL (which you then pass to CEILING() to get the next largest INT). In T-SQL, dividing one INT by another truncates to an INT, so at least one operand must be of type DECIMAL to get the decimal result (2.2).
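A quick illustration of the difference (the variable names are arbitrary):

DECLARE @rows int, @top int;
SET @rows = 11;
SET @top = 5;

SELECT @rows / @top AS int_division,                              -- 2: integer division truncates
       CAST(@rows AS decimal(9,2)) / @top AS dec_division,        -- 2.2: one DECIMAL operand is enough
       CEILING(CAST(@rows AS decimal(9,2)) / @top) AS iterations; -- 3: iterations actually needed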


---------------------------
|Ted Pin >>
Post #528024
Posted Thursday, July 03, 2008 7:18 AM
SSC Journeyman


Group: General Forum Members
Last Login: Wednesday, May 11, 2011 1:33 PM
Points: 85, Visits: 299
@Jeff:

Thanks for your feedback. I will try to create a more well-thought-out article for my next submission, based on the comments in this discussion. I looked around for an article that gives guidance on article writing but couldn't find one. Would a prolific contributor like to write one for us?


---------------------------
|Ted Pin >>
Post #528028
Posted Thursday, July 03, 2008 7:20 AM
Grasshopper


Group: General Forum Members
Last Login: Friday, June 15, 2012 3:48 AM
Points: 17, Visits: 63
Hey - I've learned a whole lot from the discussion - more than I would have done if I'd had to skim-read a much longer article.

I'm clearly a much less experienced database person than all you lot (I'm a .NET analyst/developer who seems to have to do more and more database work with every passing day). So short, basic articles that gloss over some of the finer details may not work quite so well for you guys (who already know the basics and more), but they're a big help for me.

(I guess DBAs don't really do "glossing over details" though, eh? That's how millions of rows get deleted by accident! ;))

I've had a quick look but couldn't see if the articles have a 'level' or 'suitable for...' category attached to them. Maybe that would help?
Post #528032
Posted Thursday, July 03, 2008 7:25 AM
SSC Journeyman


Group: General Forum Members
Last Login: Wednesday, May 11, 2011 1:33 PM
Points: 85, Visits: 299
david.gerrard (7/3/2008)
Hey - I've learned a whole lot from the discussion - more than I would have done if I'd had to skim-read a much longer article.


David, thank you for being the sole clapper in a tough crowd ;) I'm glad this little piece was able to help someone (which was the most I had hoped for). You'll help me get back up on stage for more punishment =D


---------------------------
|Ted Pin >>
Post #528036