Deleting Large Number of Records

  • Ric Sierra (9/15/2009)


    Lynn Pettis (9/15/2009)


    Ric Sierra (9/15/2009)


    Just... keep it simple!

    This works if you need to delete a large number of rows and the scenario doesn't require keeping the database log.

    In this case I want to delete all the rows between 2009/01/01 and 2009/10/23, about 100,000,000 rows.

    delete top (10000) from myTable where colDate between '20090101' and '20091023'

    backup log myDatabase with no_log

    go 10000

    What do you think?

    P.S. The recovery model is SIMPLE.

    If your database is using the SIMPLE recovery model you can't do BACKUP LOG, nor is it needed. The only other issue I'd have with your simple process is why run it more than needed. Using the GO 10000, your delete statement is going to run 10000 times regardless of how many times it really needs to run.

    If your database is using the BULK-LOGGED or FULL recovery model, then you do need the transaction log backups in case a recovery is needed. The code I have provided can be incorporated into a stored procedure where periodic deletes of large numbers of records need to be accomplished.

    In this scenario I need to delete 100,000,000 rows; each batch deletes 10,000 rows, so to make sure all the rows are deleted I need to run the batch 10,000 times.

    10,000 x 10,000 = 100,000,000

    That was not obvious based on your original post. There was nothing there to indicate how many rows were to be deleted. Please remember, my code makes no determination regarding the total number of rows to be deleted. The delete in the example code is based on a date range, and if you look at the number of records generated in the sample dataset, there is no specific number of records for each year; it is randomly generated, which is much more likely in a real-world environment. You aren't always going to be deleting a set number of rows each time (a minimal sketch of a delete loop that stops on its own follows this post).
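
    A minimal sketch of that kind of self-terminating loop, assuming a SIMPLE recovery database and reusing the myTable/colDate names from the example above (the batch size and the @rows variable are only illustrative):

    declare @rows int;
    set @rows = 1;

    while @rows > 0
    begin
        -- delete one batch; @@ROWCOUNT reports how many rows the batch actually removed
        delete top (10000) from myTable
        where colDate between '20090101' and '20091023';

        set @rows = @@rowcount;

        -- in SIMPLE recovery a checkpoint lets the log space be reused between batches
        checkpoint;
    end

    The loop exits as soon as a batch deletes nothing, so it runs only as many times as the data actually requires.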

  • Lynn Pettis (9/15/2009)


    That was not obvious based on your original post. There was nothing there to indicate how many rows were to be deleted.

    He did say there was a set number of known rows... 🙂

    Aside from that though, thoroughly agree - there's not even that much point in counting the rows first - you might as well just get on with it...

    Atlantis Interactive - SQL Server Tools
    My blog
    Why I wrote a sql query analyzer clone

  • Lynn Pettis (9/15/2009)


    Ric Sierra (9/15/2009)


    Lynn Pettis (9/15/2009)


    Ric Sierra (9/15/2009)


    Just... keep it simple!

    This works if you need to delete a large number of rows and the scenario doesn't require keeping the database log.

    In this case I want to delete all the rows between 2009/01/01 and 2009/10/23, about 100,000,000 rows.

    delete top (10000) from myTable where colDate between '20090101' and '20091023'

    backup log myDatabase with no_log

    go 10000

    What do you think?

    P.S. The recovery model is SIMPLE.

    If your database is using the SIMPLE recovery model you can't do BACKUP LOG, nor is it needed. The only other issue I'd have with your simple process is why run it more than needed. Using the GO 10000, your delete statement is going to run 10000 times regardless of how many times it really needs to run.

    If your database is using the BULK-LOGGED or FULL recovery model, then you do need the transaction log backups in case a recovery is needed. The code I have provided can be incorporated into a stored procedure where periodic deletes of large numbers of records need to be accomplished.

    In this scenario I need to delete 100,000,000 rows; each batch deletes 10,000 rows, so to make sure all the rows are deleted I need to run the batch 10,000 times.

    10,000 x 10,000 = 100,000,000

    That was not obvious based on your original post. There was nothing there to indicate how many rows were to be deleted. Please remember, my code makes no determination regarding the total number of rows to be deleted. The delete in the example code is based on a date range, and if you look at the number of records generated in the sample dataset, there is no specific number of records for each year; it is randomly generated, which is much more likely in a real-world environment. You aren't always going to be deleting a set number of rows each time.

    I agree with your code; my POV is only an alternative (a fast track) to do something similar.

  • I just had to deal with this. However, I need to have an hourly sliding window where I delete records older than 3 hours for working with .net viewstate.

    I had the database in simple recovery mode and was deleting in small batches. However, the deletes were not keeping up with the inserts. Since it is viewstate, the records are very big.

    What I did was set up a partitioned view and just truncate the table in the view, thereby deleting 200,000 records in less than a second without affecting the inserts (a rough sketch of that layout follows this post).

    IMHO, truncating partitions or tables in a partitioned view is much cleaner and works faster than this method.
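
    A rough sketch of that kind of layout, with hypothetical table, view, and column names (none of them come from the post): one member table per time bucket, each with a CHECK constraint on the bucket column, a UNION ALL view over them, and TRUNCATE on whichever member table has aged out.

    create table dbo.ViewState_H0 (
        HourBucket tinyint not null check (HourBucket = 0),
        SessionId  uniqueidentifier not null,
        StateData  varbinary(max) null,
        constraint PK_ViewState_H0 primary key (HourBucket, SessionId)
    );

    create table dbo.ViewState_H1 (
        HourBucket tinyint not null check (HourBucket = 1),
        SessionId  uniqueidentifier not null,
        StateData  varbinary(max) null,
        constraint PK_ViewState_H1 primary key (HourBucket, SessionId)
    );
    -- ...one more member table per bucket in the sliding window
    go

    create view dbo.ViewState
    as
        select HourBucket, SessionId, StateData from dbo.ViewState_H0
        union all
        select HourBucket, SessionId, StateData from dbo.ViewState_H1;
    go

    -- hourly job: empty whichever member table has aged out; TRUNCATE is minimally
    -- logged and only locks that one table, so inserts into the current bucket continue
    truncate table dbo.ViewState_H0;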

  • robblackin (9/15/2009)


    I just had to deal with this. However, I need to have an hourly sliding window where I delete records older than 3 hours for working with .net viewstate.

    I had the database in simple recovery mode and was deleting in small batches. However, the deletes were not keeping up with the inserts. Since it is viewstate, the records are very big.

    What I did was set up a partitioned view and just truncate the table in the view, thereby deleting 200,000 records in less than a second without affecting the inserts.

    IMHO, truncating partitions or tables in a partitioned view is much cleaner and works faster than this method.

    If you are in a position to use partitioned tables in SQL Server 2005/2008, then yes, I agree. Unfortunately, this may not be possible for several reasons. One, you may be using SQL Server 2005/2008 Standard Edition, where partitioned tables aren't available. Or, your data may not be set up using partitioned views (and therefore separate tables). If that is the case, then your option isn't available. I am just providing an option. As with all things in the SQL Server world, it depends. This is just one option available to people to use when in this position.

  • Matt Whitfield (9/15/2009)


    Lynn Pettis (9/15/2009)


    That was not obvious based on your original post. There was nothing there to indicate how many rows were to be deleted.

    He did say there was a set number of known rows... 🙂

    Aside from that though, thoroughly agree - there's not even that much point in counting the rows first - you might as well just get on with it...

    Okay, I missed it the first time, but he did say ABOUT 100,000,000. That leaves the actual number of rows to be deleted open for interpretation. 😉

  • Great article, Lynn.

    One area that you mentioned, but that I think can be elaborated on, is the number of log backups this can generate. If the space is available, it may make more sense to do a log backup right before so there is plenty of free space available in the log file, and then do one right after. This of course assumes that there is enough space available and that you can tolerate the transaction log growing that large, but if you can, it will be substantially faster and leave far fewer log files to deal with (a sketch of this variant follows this post).

    ---
    Timothy A Wiseman
    SQL Blog: http://timothyawiseman.wordpress.com/
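
    A rough sketch of that variant, assuming a FULL or BULK_LOGGED recovery model and reusing the myDatabase/myTable/colDate placeholders from the earlier example (the backup file paths are made up): one log backup before the delete, none between batches, and one right after.

    backup log myDatabase to disk = N'X:\Backups\myDatabase_log_before.trn';

    declare @rows int;
    set @rows = 1;

    while @rows > 0
    begin
        delete top (10000) from myTable
        where colDate between '20090101' and '20091023';

        set @rows = @@rowcount;
    end

    -- one backup afterwards captures the whole delete and frees the log space for reuse
    backup log myDatabase to disk = N'X:\Backups\myDatabase_log_after.trn';

    Because nothing truncates the log between batches, the log file must be able to grow large enough to hold the entire delete, which is exactly the trade-off described above.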

  • timothyawiseman (9/15/2009)


    Great article, Lynn.

    One area that you mentioned, but that I think can be elaborated on, is the number of log backups this can generate. If the space is available, it may make more sense to do a log backup right before so there is plenty of free space available in the log file and then do one right after. This of course assumes that there is enough space available and that you can tolerate the transaction log growing that large, but if you can it will be substantially faster and with a lot fewer log files to deal with.

    I agree, but the problem I am trying to address here is the one that we frequently see on the forums, where people are concerned about the excessive growth of the transaction log file and want to keep the size of the t-log down to a manageable size when having to accomplish this mass delete. In some cases, the delete actually ends up failing as the t-log consumes all available disk space and can't expand further. Or there are other databases on the server that also require disk space be available for those t-log files as well.

    It is possible that the t-log backup files could overrun the available disk space as well. That is another pain point in this process, but it too can be dealt with in a variety of ways. One would be to use a third-party backup solution to zip the t-log backups on the fly so that they don't take as much space. In that regard, we use HyperBac on our SQL Server systems.

  • Lynn Pettis (9/15/2009)


    Okay, I missed it the first time, but he did say ABOUT 100,000,000. That leaves the actual number of rows to be deleted open for interpretation. 😉

    Those pesky 4 left over 😀

    Atlantis Interactive - SQL Server Tools
    My blog
    Why I wrote a sql query analyzer clone

  • Matt Whitfield (9/15/2009)


    I would put money on the fact that when you used SET ROWCOUNT one of the two following possibilities was true:

    1) There was no where clause on the statement

    2) The where clause was easily satisfied by an index

    ...

    #2 was definitely true. We had indexes on the fields in the WHERE clause. If you didn't, I'd create one first and then delete the rows (a one-line example follows this post). Likely would be worth it.

    Quote sometimes freaks out when there's a post that occurs between the time you get the page and the time you click quote. Weird bug.
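
    For example, a single supporting index built up front (a hypothetical name, reusing the colDate column from the earlier example) is usually cheaper than letting every delete batch scan the table:

    -- one scan to build the index versus a scan per batch without it
    create nonclustered index IX_myTable_colDate on myTable (colDate);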

    I have to make two observations about this article. As of just checking the main page, there have been 7,992 views recorded for this article. Compare this to where it was originally published: 963 views in 90 days. Plus, there has been a nice discussion along with it, where there was none on the other site.

    I am quite happy! :w00t:

    Where appropriate, disabling indexes can help boost delete speeds as well. I frequently create a "pared-down" database with a small subset of records. Since no one is querying it while I'm creating it, I don't have to worry about the indexes being offline, and rebuilding them at the end cleans up the fragmentation from the delete (a small example follows this post). Again though, as with everything, whether or not this helps you will depend on your situation.

    Funny note - I used "it depends" so often during a training session the other day that one of my coworkers threatened to go out and buy me a box of... Depends. 🙂

    Chad
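
    A small example of that pattern, with hypothetical index and table names; any index that actually supports the delete's WHERE clause should be left enabled.

    -- disable a nonclustered index the delete does not rely on
    alter index IX_myTable_LastName on myTable disable;

    -- ...run the batched delete here...

    -- rebuilding re-enables the index and also removes the fragmentation left by the delete
    alter index IX_myTable_LastName on myTable rebuild;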

    Use the table partitioning feature of SQL 2005 to arrange your table, and then you can simply drop a partition, which generates very minimal logging and performs far better than deleting records in batches.

  • kailas.tare (9/15/2009)


    Use the table partitioning feature of SQL 2005 to arrange your table, and then you can simply drop a partition, which generates very minimal logging and performs far better than deleting records in batches.

    As I mentioned earlier in the discussion, table partitioning is only available in the Enterprise Edition of SQL Server 2005/2008. If you are using the Standard Edition or below, that is not an option. (A rough sketch of the Enterprise-only switch-and-merge pattern follows this post.)
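
    A rough sketch of that approach, with hypothetical object and boundary names: in 2005/2008 the oldest partition isn't dropped directly; it is switched out to an empty staging table (same structure, same filegroup) and the emptied boundary is then merged away.

    -- metadata-only operation: moves the oldest partition's rows into the staging table almost instantly
    alter table myTable switch partition 1 to myTable_Staging;

    -- minimally logged
    truncate table myTable_Staging;

    -- remove the now-empty boundary from the partition function
    alter partition function pf_myTable_dates() merge range ('20090101');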

  • Chad Crawford (9/15/2009)


    Where appropriate, disabling indexes can help boost delete speeds as well. I frequently create a "pared-down" database with a small subset of records. Since no one is querying it while I'm creating it, I don't have to worry about the indexes being offline, and rebuilding them at the end cleans up the fragmentation from the delete. Again though, as with everything, whether or not this helps you will depend on your situation.

    Very good point - but I wouldn't limit that to deletes, personally. If you're in a situation where on-line access isn't a concern, then index building at the end of any large-scale DML operation is a generally good thing, imho. I use the word generally because there are obviously exceptions to the rule... but hey, that's life! 🙂

    Atlantis Interactive - SQL Server Tools
    My blog
    Why I wrote a sql query analyzer clone

