• Tao Klerks (9/15/2009)


    Hi Lynn,

    Good article concept; this is a problem that I've had to deal with many times, and I suspect most DBAs do at one point or another.

    A couple of concerns, though:

    Okay, let's take a look.

    1) Would those transaction log backup files really be manageable? If you chose to make the batch size smaller (say 1,000 instead of 10,000) to minimize duration of locks (esp. if the deletion operations are expensive, for example containing substantial BLOB columns), then you rapidly end up with 2,000 transaction log files for your 2,000,000 records... Imagine having to recover those! :ermm: I guess it's a minor point, but I suspect in many (most?) cases the transaction log backup logic should be disconnected from the deletion loop itself.

    There will always be a pain point somewhere. Part of the issue I was trying to deal with here is controlling the size of the transaction log. This is one of the areas we see frequently on the forums: the OP is concerned with the transaction log growing excessively during the deletion process and taking much of the available disk space, regardless of the recovery model in use.

    When the database is using the SIMPLE recovery model, all that is needed is to batch the deletes. As the transaction log is checkpointed between batches, the log space is reused and the file stays under control.
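    As a rough illustration only (the table name, column, and batch size here are placeholders, not the article's actual code), a batched delete under SIMPLE recovery on SQL Server 2005 and later might look something like this:

        DECLARE @BatchSize int;
        SET @BatchSize = 10000;

        WHILE 1 = 1
        BEGIN
            -- Delete one batch at a time so no single transaction
            -- has to log millions of rows.
            DELETE TOP (@BatchSize)
            FROM dbo.BigTable
            WHERE CreateDate < '20080101';

            IF @@ROWCOUNT = 0
                BREAK;

            -- Under SIMPLE recovery, checkpoints make the log space from
            -- committed batches reusable, keeping the file small.
            CHECKPOINT;
        END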

    Unfortunately, if the database is using either the BULK_LOGGED or FULL recovery model, you actually need to run periodic transaction log backups during the delete process. This is what my code allows one to do. You have to decide how big each delete batch should be, how large the transaction log should be allowed to grow (if necessary), and how many transaction log files you are going to create. It would also be fairly easy to modify this code to incorporate periodic differential backups; if you want a differential backup after every 100 transaction log backups, that wouldn't be hard to do with this code.
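    For illustration, here is a stripped-down sketch of that pattern under the FULL recovery model; the database name, backup path, and every-10-batches interval are assumptions for the example, not the article's code:

        DECLARE @BatchSize int, @BatchCount int, @FileNum int;
        DECLARE @BackupFile nvarchar(260);

        SELECT @BatchSize = 10000, @BatchCount = 0, @FileNum = 0;

        WHILE 1 = 1
        BEGIN
            DELETE TOP (@BatchSize)
            FROM dbo.BigTable
            WHERE CreateDate < '20080101';

            IF @@ROWCOUNT = 0
                BREAK;

            SET @BatchCount = @BatchCount + 1;

            -- Back up the log every 10 batches rather than every batch,
            -- trading a somewhat larger log for far fewer backup files.
            IF @BatchCount % 10 = 0
            BEGIN
                SET @FileNum = @FileNum + 1;
                SET @BackupFile = N'C:\Backups\MyDatabase_Log_'
                    + RIGHT(N'0000' + CAST(@FileNum AS nvarchar(10)), 4) + N'.trn';
                BACKUP LOG MyDatabase TO DISK = @BackupFile;
            END
        END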

    2) Another point is the deletion mechanism - in SQL Server 2000 I have found the SET ROWCOUNT method to be quite painful on large tables. I've not looked deeply into it, but in the environments I've encountered it's MUCH more efficient to identify an indexed numeric key (not necessarily unique, as long as it's smoothly distributed), ideally with a clustered index, and loop through that instead. I imagine that there's a constant cost to evaluating the WHERE clause, even when the ROWCOUNT is only going to act on a subset of the matching records.
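    For example (table and column names are illustrative only), a range-based loop over an indexed integer key, which works on SQL Server 2000 as well, might look like this:

        -- Walk the table in contiguous key ranges so each DELETE
        -- touches a small, index-friendly slice of rows.
        DECLARE @MinId int, @MaxId int, @ChunkSize int;
        SET @ChunkSize = 10000;

        SELECT @MinId = MIN(Id), @MaxId = MAX(Id)
        FROM dbo.BigTable
        WHERE CreateDate < '20080101';

        WHILE @MinId <= @MaxId
        BEGIN
            DELETE FROM dbo.BigTable
            WHERE Id >= @MinId
              AND Id < @MinId + @ChunkSize
              AND CreateDate < '20080101';

            SET @MinId = @MinId + @ChunkSize;
        END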

    I can't really address this particular issue regarding SET ROWCOUNT in SQL Server 2000, as I haven't had to do this myself. I'm just trying to provide people with a method of deleting a large number of records while controlling the size of the transaction log and keeping the table relatively available for normal processing, by hopefully keeping SQL Server from taking and holding a table lock. As I have always said on these forums, when you use anything suggested on any forum, be sure to test, test, and test some more.

    This is just one option I am offering; it may not always be the best, and it may require some tweaks and modifications depending on the environment it is used in. If there is a suitable index, as you suggest, it would be easy to modify this code to force it to use that index.
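    For instance (the index name here is a placeholder, and any index hint should be tested carefully before use), the batched delete could be pointed at a specific index with a table hint:

        -- Force the delete to use a particular index on the target table.
        DELETE TOP (10000)
        FROM dbo.BigTable WITH (INDEX(IX_BigTable_CreateDate))
        WHERE CreateDate < '20080101';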