How do I gradually trim a big table?

  • I have a table that contains an audit trail (changes made to the DB), and it is a very large table. I created a job to delete all records except the last 180 days. When the job ran, the transaction log grew so large that it filled the drive, and the job failed.

    I tried using the same script for specific date ranges, but the number of records varies: one day may have 100 records while another has 10,000,000, which can lead back to the original issue.

    Is there another way to do this, or to trim a specific number of records at a time, as long as I keep the latest 180 days? I am on SQL Server 2008. Thanks

    /*
    This script will delete all records except those added to the audit trail
    within the number of days on the SET line.
    */
    DECLARE @days2keep INT
    SET @days2keep = 180

    DELETE
    FROM rev.REV_AUDIT_TRAIL_HIS
    WHERE ADD_DATE_TIME_STAMP < dateadd(day, -@days2keep, getdate());

  • gjuarez (5/23/2016)


    I have a table that contains an audit trail (changes made to the DB), and it is a very large table. I created a job to delete all records except the last 180 days. When the job ran, the transaction log grew so large that it filled the drive, and the job failed.

    I tried using the same script for specific date ranges, but the number of records varies: one day may have 100 records while another has 10,000,000, which can lead back to the original issue.

    Is there another way to do this, or to trim a specific number of records at a time, as long as I keep the latest 180 days? I am on SQL Server 2008. Thanks

    /*
    This script will delete all records except those added to the audit trail
    within the number of days on the SET line.
    */
    DECLARE @days2keep INT
    SET @days2keep = 180

    DELETE
    FROM rev.REV_AUDIT_TRAIL_HIS
    WHERE ADD_DATE_TIME_STAMP < dateadd(day, -@days2keep, getdate());

    This is one of those times when looping is acceptable. Create a WHILE loop and delete a fixed number of rows per iteration, using @@ROWCOUNT > 0 as the loop's control condition.

    Something like this.

    while @@ROWCOUNT > 0
    begin
        delete top 10000
        WHERE ADD_DATE_TIME_STAMP < dateadd(day, -@days2keep, getdate());
    end

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Moden's splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Thank you for your quick reply. So just to confirm, the entire script would look like this, right? Thanks again

    DECLARE @days2keep INT
    SET @days2keep = 180

    while @@ROWCOUNT > 0
    begin
        delete top 10000
        FROM rev.REV_AUDIT_TRAIL_HIS
        WHERE ADD_DATE_TIME_STAMP < dateadd(day, -@days2keep, getdate())
    end

  • gjuarez (5/23/2016)


    Thank you for your quick reply. So just to confirm, the entire script would look like this, right? Thanks again

    DECLARE @days2keep INT
    SET @days2keep = 180

    while @@ROWCOUNT > 0
    begin
        delete top 10000
        FROM rev.REV_AUDIT_TRAIL_HIS
        WHERE ADD_DATE_TIME_STAMP < dateadd(day, -@days2keep, getdate())
    end

    Yeah, something along those lines. You will need to experiment with the number of rows deleted per batch to suit your system; there is a balance between processing time and transaction-log writes that you need to dial in. The sweet spot is usually fairly wide, and 10,000 is probably a good starting point.


  • If you're deleting a total of 100 million rows, then batches of 10,000 are too small, because the loop would run 10,000 times, and each iteration is a separate table scan. However many rows you intend to delete in total, I would set the batch size so that you loop 10, or at most 100, times. Also, if this is a staging database (not a production transactional database), you may want to consider doing a CHECKPOINT and log truncation on each iteration.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • gjuarez (5/23/2016)


    Thank you for your quick reply. So just to confirm, the entire script would look like this, right? Thanks again

    DECLARE @days2keep INT
    SET @days2keep = 180

    while @@ROWCOUNT > 0
    begin
        delete top 10000
        FROM rev.REV_AUDIT_TRAIL_HIS
        WHERE ADD_DATE_TIME_STAMP < dateadd(day, -@days2keep, getdate())
    end

    When used in a DELETE, the TOP clause requires parentheses around the expression.

    You can also combine the DECLARE and SET into one statement.

    You might also want to compare to midnight instead of the current time. (I would just create a date variable.)

    DECLARE @date2keep DATE = DATEADD(DAY, -180, GETDATE());

    while @@ROWCOUNT > 0
    begin
        delete top (10000)
        FROM rev.REV_AUDIT_TRAIL_HIS
        WHERE ADD_DATE_TIME_STAMP < @date2keep
    end

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Below is a bulk delete script I wrote a couple of years ago. At the time, the database recovery model was set to SIMPLE (truncate log on checkpoint). It also uses RAISERROR ... WITH NOWAIT to print a status message after each batch.

    set nocount on;

    declare @batch_rows int = 0, @total_rows int = 0;

    while 1 = 1
    begin
        -- to minimize transaction log growth, checkpointing and pausing between batches helps re-use log space:
        waitfor delay '00:00:05';
        checkpoint;

        -- delete a batch of rows:
        delete top (1000000) from MyBigTable;
        select @batch_rows = @@rowcount;
        select @total_rows = @total_rows + @batch_rows;

        -- print status message:
        raiserror('Rows affected: %d', 0, 1, @total_rows) with nowait;

        -- if no rows were deleted, break from the loop:
        if @batch_rows = 0 break;
    end;


  • How many rows of data need to be deleted versus how many are to be kept?

  • The table has 67,404,675 rows; I need to keep 180 days. I calculated about 10,000,000 rows for a year.

  • gjuarez (5/24/2016)


    The table has 67,404,675 rows; I need to keep 180 days. I calculated about 10,000,000 rows for a year.

    So you are only going to keep 180 rows out of 67,404,675? If that is the case, you shouldn't bother trying to delete anything. Just copy the 180 rows into another table, drop the existing massive table, and rename the newly created table to the original table name.


  • Sean Lange (5/25/2016)


    gjuarez (5/24/2016)


    The table has 67,404,675 rows; I need to keep 180 days. I calculated about 10,000,000 rows for a year.

    So you are only going to keep 180 rows out of 67,404,675? If that is the case, you shouldn't bother trying to delete anything. Just copy the 180 rows into another table, drop the existing massive table, and rename the newly created table to the original table name.

    Actually, I think it may be about 5,000,000 rows for 180 days. Still, you should look for a maintenance window so you can copy the data you want to retain into another table, rename the original table to an archive name, and rename the new table to the original table name. At that point you only need to archive a single day each day, which will be much more manageable.
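    A minimal sketch of that copy-and-rename swap, using the table name from the thread. Note that SELECT ... INTO does not copy indexes, constraints, or permissions, so those would have to be re-created on the new table; treat this as an outline rather than a drop-in script.

    ```sql
    -- run during a maintenance window, while nothing is writing to the table:
    SELECT *
    INTO rev.REV_AUDIT_TRAIL_HIS_new
    FROM rev.REV_AUDIT_TRAIL_HIS
    WHERE ADD_DATE_TIME_STAMP >= DATEADD(DAY, -180, GETDATE());

    -- swap the names; the old table is kept under an archive name for later offload
    EXEC sp_rename 'rev.REV_AUDIT_TRAIL_HIS', 'REV_AUDIT_TRAIL_HIS_archive';
    EXEC sp_rename 'rev.REV_AUDIT_TRAIL_HIS_new', 'REV_AUDIT_TRAIL_HIS';
    ```

    Because SELECT ... INTO is minimally logged under the SIMPLE or BULK_LOGGED recovery model, this copies the retained rows without the transaction-log blowup that the original DELETE caused.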

  • Eric M Russell (5/23/2016)


    If you're deleting a total of 100 million rows, then deleting in batches of 10,000 would be too small, because there will be 10,000 iterations of the loop. Each loop is a separate table scan. However many total rows you're intending to delete, I would set the batch size so that you're looping 10 or maybe 100 times at most. Also, if this is a staging database (not a production transactional database), then you may want to consider doing a CHECKPOINT and log truncation for each loop.

    As long as the ADD_DATE_TIME_STAMP column has an index on it, there would be no table scan, and the looping would not be that bad.
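    For reference, such a supporting index would look something like this (the index name is hypothetical; the table and column names are from the thread):

    ```sql
    -- lets each batched DELETE seek on the date predicate instead of scanning the table
    CREATE NONCLUSTERED INDEX IX_REV_AUDIT_TRAIL_HIS_ADD_DATE
        ON rev.REV_AUDIT_TRAIL_HIS (ADD_DATE_TIME_STAMP);
    ```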

  • Sean Lange (5/25/2016)


    gjuarez (5/24/2016)


    The table has 67404675, i need to keep 180 days, I calculated about 10,000000 for a year.

    So you are only going to keep 180 rows out of 67,404,675 rows? If that is the case you shouldn't bother trying to delete anything. Just copy the 180 rows into another table. Then drop the existing massive table. Then rename the newly created table to the original table name.

    That seems quite a dodgy solution.

    They say they have set up a job to do this; to me that implies it will be run repeatedly as part of the system's normal operation.

    What happens if, for whatever reason, the rename of the new table back to the normal table name fails?

    Whatever application is using the database will error whenever it tries to write audit information; changing the DB schema as part of normal operation seems like a road to pain.

  • peter.row (5/26/2016)


    Sean Lange (5/25/2016)


    gjuarez (5/24/2016)


    The table has 67,404,675 rows; I need to keep 180 days. I calculated about 10,000,000 rows for a year.

    So you are only going to keep 180 rows out of 67,404,675? If that is the case, you shouldn't bother trying to delete anything. Just copy the 180 rows into another table, drop the existing massive table, and rename the newly created table to the original table name.

    That seems quite a dodgy solution.

    They say they have set up a job to do this; to me that implies it will be run repeatedly as part of the system's normal operation.

    What happens if, for whatever reason, the rename of the new table back to the normal table name fails?

    Whatever application is using the database will error whenever it tries to write audit information; changing the DB schema as part of normal operation seems like a road to pain.

    If it were, per Sean's guess, 180 rows to be kept out of 64+ million, it would actually be quite doable. But since I estimate it is closer to 5 million rows to be kept, I'd look for a maintenance window where the system could be down for a period of time while the 5 million rows to keep are copied out to a second table, the original table is renamed so the data can be archived appropriately, and the new table is renamed to the original name. At that point, since only 180 days of data needs to be retained, data can then be archived from the live table on a daily basis with much less impact on the system.

  • Lynn Pettis (5/26/2016)


    peter.row (5/26/2016)


    Sean Lange (5/25/2016)


    gjuarez (5/24/2016)


    The table has 67,404,675 rows; I need to keep 180 days. I calculated about 10,000,000 rows for a year.

    So you are only going to keep 180 rows out of 67,404,675? If that is the case, you shouldn't bother trying to delete anything. Just copy the 180 rows into another table, drop the existing massive table, and rename the newly created table to the original table name.

    That seems quite a dodgy solution.

    They say they have set up a job to do this; to me that implies it will be run repeatedly as part of the system's normal operation.

    What happens if, for whatever reason, the rename of the new table back to the normal table name fails?

    Whatever application is using the database will error whenever it tries to write audit information; changing the DB schema as part of normal operation seems like a road to pain.

    If it were, per Sean's guess, 180 rows to be kept out of 64+ million, it would actually be quite doable. But since I estimate it is closer to 5 million rows to be kept, I'd look for a maintenance window where the system could be down for a period of time while the 5 million rows to keep are copied out to a second table, the original table is renamed so the data can be archived appropriately, and the new table is renamed to the original name. At that point, since only 180 days of data needs to be retained, data can then be archived from the live table on a daily basis with much less impact on the system.

    If it is closer to the 5 million mark, then my suggestion would likely not be a very good one. I think Lynn's approach is probably the best choice.


Viewing 15 posts - 1 through 15 (of 17 total)
