Deleting 47 million records from a table

  • I am new to these forums, so forgive my bad post.

    I have a table that had 90 million records when I did a count. I have been asked to delete 47 million of them by year, for example everything from 2007. The table has 7 indexes: 1 clustered and 6 non-clustered.

    I need to know the best-performing way to delete these records. I cannot move the data to a different table and truncate. Also, if I drop the 7 indexes before deleting, will I gain any performance, and does anyone know how long it would take to re-index the remaining records when I re-add the indexes?

    Thanks.

  • Check out this article: Deleting Large Number of Records.

    If you still have questions, just ask.

  • Thanks.

  • I found a nice calm way to delete specific chunks of 200,000+ rows from a table, getting around the mass-logging issue you would run into if you just ran "delete xxx".

    1. I wrote a VBS script using ADO and the connection.execute method in a loop to do something like this:

    startId = ...   ' get the lowest row id you want to delete
    endId = startId + 100000   ' delete 100,000 rows at once
    For i = startId To endId
        adoConnection.Execute "DELETE FROM table WHERE id = " & i   ' concatenate the id into the string
    Next   ' add one to i and loop

    2. Then I scheduled the script using Windows Task Scheduler to execute every hour.

    This calmly deleted 200,000 rows every hour, which you could easily change to 500,000 rows and still keep the log file sizes manageable.

    At 500,000 rows an hour, this would safely delete the 47m selected rows in 94 hours.

  • endeavour (6/21/2010)


    I found a nice calm way to delete specific chunks of 200,000+ rows from a table, getting around the mass-logging issue you would run into if you just ran "delete xxx".

    1. I wrote a VBS script using ADO and the connection.execute method in a loop to do something like this:

    startId = ...   ' get the lowest row id you want to delete
    endId = startId + 100000   ' delete 100,000 rows at once
    For i = startId To endId
        adoConnection.Execute "DELETE FROM table WHERE id = " & i   ' concatenate the id into the string
    Next   ' add one to i and loop

    2. Then I scheduled the script using Windows Task Scheduler to execute every hour.

    This calmly deleted 200,000 rows every hour, which you could easily change to 500,000 rows and still keep the log file sizes manageable.

    At 500,000 rows an hour, this would safely delete the 47m selected rows in 94 hours.

    Why do it this way when it can just as easily be handled in T-SQL? You may want to check out the article I referenced above as well.
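
    For illustration, a minimal sketch of a batched T-SQL delete along those lines; the table and column names (dbo.BigTable, RecordDate) are placeholders, not from this thread:

    -- Delete in batches so each transaction, and its log usage, stays small
    WHILE 1 = 1
    BEGIN
        DELETE TOP (100000)
        FROM dbo.BigTable                                              -- placeholder table name
        WHERE RecordDate >= '20070101' AND RecordDate < '20080101';    -- the year to purge, e.g. 2007

        IF @@ROWCOUNT = 0
            BREAK;                                                     -- nothing left to delete
    END

    Because each batch commits on its own, the log space can be reused (simple recovery) or backed up (full recovery) between batches instead of holding the entire 47-million-row delete at once.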

    It depends on the db usage. Mine was used 24/7 with 20 million site users, so I needed to spread the load out, and deleting 50m rows in a single shot under the full recovery model would have made my transaction log backups (100GB+) massive very quickly, rather than adding small increases to each backup over a few days.

    Anyhow, using this I deleted 50m rows in a couple of days with high web traffic and transactions coming in fast.

    I'm sure it's not the best way *technically* - but I tried everything else and it worked best for me.

  • endeavour (6/21/2010)


    It depends on the db usage. Mine was used 24/7 with 20 million site users, so I needed to spread the load out, and deleting 50m rows in a single shot under the full recovery model would have made my transaction log backups (100GB+) massive very quickly, rather than adding small increases to each backup over a few days.

    Anyhow, using this I deleted 50m rows in a couple of days with high web traffic and transactions coming in fast.

    I'm sure it's not the best way *technically* - but I tried everything else and it worked best for me.

    If you check out the article, you will see that what I proposed batches the delete rather than doing it in one shot, and lets you maintain the t-log by incorporating t-log backups.

    You can also incorporate delays if needed.
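
    A rough sketch of that pattern, with the log backup and an optional delay folded into the loop; the database, table, column, and backup path are assumptions, not from the thread:

    DECLARE @rows INT;
    SET @rows = 1;

    WHILE @rows > 0
    BEGIN
        DELETE TOP (100000)
        FROM dbo.BigTable                                              -- placeholder table name
        WHERE RecordDate >= '20070101' AND RecordDate < '20080101';

        SET @rows = @@ROWCOUNT;

        -- Back up the transaction log between batches so it does not balloon (assumed path)
        BACKUP LOG MyDatabase TO DISK = N'D:\Backups\MyDatabase_log.trn';

        -- Optional pause to ease the load on a busy server
        WAITFOR DELAY '00:00:10';
    END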

  • I reviewed both, and they are both solid options. Thanks for the response.
