Deleting batches of rows with TOP

  • GermanDBA (7/3/2008)


    Also, would this not be even better using a numbers table รก la Jeff Moden http://www.sqlservercentral.com/articles/TSQL/62867/ ?

    Thank you for the reference... but the Tally table isn't a panacea for all looping problems. The loop here is a process control loop that controls how many times a DELETE is used... I don't believe a Tally table could be used here. If someone can think of a way, please post it because THAT would be truly useful. ๐Ÿ™‚ Heck, if you can think of a way, write an article on it! ๐Ÿ˜€

    The reasons why someone would want to do a delete like this has been totally missed...

    The purpose of doing looped Deletes is usually two fold...

    1. Do the deletes and still allow the table to be used.

    2. Keep the transaction log small.

    Item #2 above can only be accomplished if the DB is in the simple recovery mode or there is some additional code in the loop that truncates the log while it is running.

    The big thing is Item #1... how to do the (massive) deletes and still allow the table to be used by others. For that, you have to have a delay in the loop and you can do that using WAITFOR DELAY '00:00:10' where the number 10 is some number of seconds. The loop that's listed in the article (and in the thread), whether ROWCOUNT or TOP is used, is fairly agressive and won't necessarily allow other processes to "get in". It's so agressive that you might as well just do the delete all at once. In order for the loop to actually allow other processes to work, the TOP or ROWCOUNT should be small enough to limit any locking to about a second and then the delay should be long enough to allow most other processes to get in and get out... I would think that something between 3 and 10 seconds would be good enough.

    Shifting gears a bit... I'm really torn... I very much encourage folks to write articles on this forum... the simple sharing of ideas is what the forum is all about... and, I very much encourage folks that read them to remember that not everyone who can write code can also write a masterpiece article. But, I do feel that there were a large number of points missed in this article and, perhaps, some things in the code that violate what some would consider to be a best practice (you don't need to count loops to do this).

    It was pointed out that "For larger tables, where this technique is most effective, you would obviously use larger values for @N and probably constrain on a date column." What a prime opportunity for several explanations... Why is this technique most effective for larger tables? Why would you probably constrain ona a date column? Why would you want to use larger values for @N?

    So, like I said, I'm really torn... I really want people to be encouraged to write without fear of being bashed, but I do have to agree with what some have already said... a bit more thought, a bit more research, and a bit more explanation of the "Why" of "implied points made" would have gone far into turning what should have been a casual script submission into a well written and informative article.

    Heh... of course, sometimes it's not actually the article that counts... sometimes the discussions that follow are more useful than the original intent of the article. In that case, this article has more than done it's job. ๐Ÿ˜‰

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks for all of your responses.

    This method has been helpful to us for purging log tables and history in our data warehouse. I realize that partitioned tables would have been a better way to handle history purges, but we were not able to implement partitioned tables for various reasons. (Maybe Phase II...)

    As for why @cnt had to be DECIMAL: Say you had 11 rows to delete, and you set a TOP value of 5. Obviously, you would need 3 iterations of the DELETE to capture the last (1) record. However, when figuring out how many iterations you need, 11/5 = 2.2, which is a DECIMAL (which you use CEILING() on to get the next largest INT). AFAIK, in order to get a decimal result (2.2), the dividend itself must be of type DECIMAL.

    ---------------------------
    |Ted Pin >>

  • @jeff:

    Thanks for your feedback. I will try to create a more well thought-out article for my next submission based on the comments in this discussion. I looked around for an article that guides article writing, but couldn't find one. Would a prolific contributor like to write for us? :hehe:

    ---------------------------
    |Ted Pin >>

  • Hey- I've learned a whole lot from the discussion - more than I would have done if I'd had to skim read a much longer article.

    I'm clearly a much less experienced database person that all you lot (I'm a .Net analyst / developer that seems to have to do more and more database work with every passing day). So short, basic articles that gloss over some of the finer details may not work quite so well for you guys (who already know the basics and more) but they're a big help for me.

    (I guess DBAs don't really do "glossing over details" though, eh? That's how millions of rows get deleted by accident! ;))

    I've had a quick look but couldn't see if the articles have a 'level' or 'suitable for...' category attached to them. Maybe that would help?

  • david.gerrard (7/3/2008)


    Hey- I've learned a whole lot from the discussion - more than I would have done if I'd had to skim read a much longer article.

    David, thank you for being the sole clapper in a tough crowd ๐Ÿ˜‰ I'm glad this little piece was able to help someone (which was the most I had hoped for). You'll help me get back up on stage for more punishment =D

    ---------------------------
    |Ted Pin >>

  • Hi Ted,

    please don't take the comments as punishment. I think that all comments/questions with regard to your article were to clarify any points that your target audience couldn't get from just reading the article.

    ..... but then again. If you enjoy punishment we can do that too ๐Ÿ˜€

    Regards,

    WilliamD

  • Ted Pin (7/3/2008)


    @Jeff:

    Thanks for your feedback. I will try to create a more well thought-out article for my next submission based on the comments in this discussion. I looked around for an article that guides article writing, but couldn't find one. Would a prolific contributor like to write for us? :hehe:

    You mean the "Sexy Enough for Dinner Conversation?" site? No, I'll pass... I like to get all my "punishment" through a single source. ๐Ÿ™‚ But I very much appreciate the offer.

    You seem to write a lot on that site, Ted... in fact, quite the writer you are. So, I'm curious... Why did you shortcut this article on "deletes" so badly?

    Also, just to be sure... the intent of my comments was not to punish... they were meant to be suggestions. But, just like one of your articles said, "Because IT lacks an inherent human element,..." it is sometimes difficult to make a suggestion or comment on something without coming across that way. Sorry...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • david.gerrard (7/3/2008)


    Hey- I've learned a whole lot from the discussion - more than I would have done if I'd had to skim read a much longer article.

    I'm not sure I understand that... you'll read the discussions, the total of which is a dozen times longer than the article, and interact on some of those discussions... yet, you would only "skim read" a much longer article? Then, you may want to scan the "scripts" section of this forum... most everything there is about the same length as this "article".

    Just to clarify... my main objection has nothing to do with the length of the article. My objection is because the code doesn't necessarily provide the "nice availability and logging benefits" stated in the first paragraph of the article and additional research before writing the article would have shown that.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (7/3/2008)


    Ted Pin (7/3/2008)[hrI looked around for an article that guides article writing, but couldn't find one. Would a prolific contributor like to write for us? :hehe:

    You mean the "Sexy Enough for Dinner Conversation?" site? No, I'll pass... I like to get all my "punishment" through a single source. ๐Ÿ™‚ But I very much appreciate the offer.

    You seem to write a lot on that site, Ted... in fact, quite the writer you are. So, I'm curious... Why did you shortcut this article on "deletes" so badly?

    Also, just to be sure... the intent of my comments was not to punish... they were meant to be suggestions. But, just like one of your articles said, "Because IT lacks an inherent human element,..." it is sometimes difficult to make a suggestion or comment on something without coming across that way. Sorry...

    Oops, I meant "Would a prolific contributor like to write [a guide to writing articles] for [the SQLServerCentral.com community]?"

    Thanks for looking at my blog. To answer your question, I'm not really sure why I missed the mark so badly. I suppose I've had a lot of practice writing editorial/essay-style articles but remain very ignorant of technical article standards. This I hope to rectify, thanks in no small part to your, and others', well-directed feedback.

    GermanDBA, my wife thinks I am a stickler for punishment but maybe that's because it works to her advantage =D In all truth, I very much appreciate the comments made herein. I, like many "writers," continually learn to take criticism and try to improve. The response to this article has really been a wake up call for me and that can only be a good thing.

    ---------------------------
    |Ted Pin >>

  • The purpose of doing looped Deletes is usually two fold...

    1. Do the deletes and still allow the table to be used.

    2. Keep the transaction log small.

    Item #2 above can only be accomplished if the DB is in the simple recovery mode or there is some additional code in the loop that truncates the log while it is running.

    Going with #2, if the database is in Bulk-Logging or full recovery model, a transaction log backup is what really needs to be there to limit the growth of the t-log. Just wanted to expand on this (even though backup log with truncate only is being depreciated, someone may do that instead of actually backing up the log).

    ๐Ÿ˜Ž

  • The example was a bit confusing, mostly because "DELETE TOP(2)" threw me off, but this would be very useful in deleting large amount of data on a production system. I deal with this kind of thing regularly, for archiving tracking information. Keeping tables like this clean is essential in having a smoothly running mechanism with lightening fast inserts/updates.

    What about using something like this? It may be a bit slower using the EXISTS clause, but it would still avoid many of the locking pitfalls of deleting large record sets simultaneously:

    while exists (select col1 from tab1) delete top(2) from tab1

  • I meant that (as a relative begginer - certainly in terms of Database Admin if not design) it's actually easier to be introduced to a topic quickly at a high level than have to work through a long article that's packed with detail, where I would probably not have had the time to take all the finer points in.

    As I'm not a DBA I hadn't even considered the issue of having to delete millions of rows at once. (I'm the person whos buggy code fills the table with rubbish, and haven't had to go thru the pain of taking it all out again... :)) However - today I've not only learned that there's a real performance benefit to deleting in batches, but I've also learned a great number of the finer points of actually doing so because of the discussion all you more knowlegeable guys have had afterwards.

    In my latest job, we've potentially got a lot of cleansing work to do, so now I can make sure your best practices are brought to the table when we do it.

  • Ted Pin (7/3/2008)


    Oops, I meant "Would a prolific contributor like to write [a guide to writing articles] for [the SQLServerCentral.com community]?"

    Nah... I'd get it wrong... I think I need to find a couple of URLs to point folks to on how to write technical articles.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff: You comments about using the wait has me thinking. (Always dangerous.) I understood why one would break up a delete of a lot of rows, but I didn't think one would need a 'wait' to let other statement 'get in'.

    Here's how I would think it would work: There is a que of requests to get things done and the requests are taken one at a time. The delete statements are adding requests very quickly, but if any other requests come in during that time, they get added to the que and would run in their turn, making other delete requests wait their turn.

    So, why would a 'wait' statement really help anything? If there is say a 10 second delay between each delete statement and no other requests come in during that time, then it is 10 seconds of wasted time. And if another request had come in during say delete iteration #2, even if delete iteration #3 pops up right away, the other request would run between iteration #2 and #3. I would think.

    I really don't know how it works. (Please don't anyone take the above statements as a description of the way SQL Server works.) I'm just trying to explain why your recommendation to use a wait command is confusing to me. It doesn't fit with my understanding of how SQL Server works.

    Thanks,

    - JJ

  • JJ B (7/3/2008)


    Jeff: You comments about using the wait has me thinking. (Always dangerous.) I understood why one would break up a delete of a lot of rows, but I didn't think one would need a 'wait' to let other statement 'get in'.

    Here's how I would think it would work: There is a que of requests to get things done and the requests are taken one at a time. The delete statements are adding requests very quickly, but if any other requests come in during that time, they get added to the que and would run in their turn, making other delete requests wait their turn.

    So, why would a 'wait' statement really help anything? If there is say a 10 second delay between each delete statement and no other requests come in during that time, then it is 10 seconds of wasted time. And if another request had come in during say delete iteration #2, even if delete iteration #3 pops up right away, the other request would run between iteration #2 and #3. I would think.

    I really don't know how it works. (Please don't anyone take the above statements as a description of the way SQL Server works.) I'm just trying to explain why your recommendation to use a wait command is confusing to me. It doesn't fit with my understanding of how SQL Server works.

    Thanks,

    - JJ

    Good points, Jennifer (I got the name right, didn't I?)... you could run sp_locks to see if anyone else had locks on the table to see if you want to wait the delay. But, normally, people make the deletes large enough and the table is in a condition to lock the whole table during the delete. The delay keeps the delete from becoming rather exclusive. Like anything else, "it depends".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 16 through 30 (of 46 total)

You must be logged in to reply to this topic. Login to reply