Deleting a large amount of data from a table

  • First of all, I inherited this problem and had no part in how it was originally designed; I'm just left trying to resolve the issue!

    I have a rather wide table (137 columns) that has several years of data totalling 56+ million records. One of the indexes on this particular table is 71GB and due to other large amounts of data in similar tables/databases, we're running out of space on the server and cannot rebuild this index (note: adding additional storage at this point is not an option).

    I've been given permission to remove any data older than 25 months (approx. 13.5 million records) and have built the following script, which deletes up to 1,000,000 rows per run in batches of 10,000 rows, but it's still taking FOREVER to run.

    Any constructive suggestions for pruning historical data more efficiently are GREATLY appreciated.

    Is there a more efficient way to prune historical data, or is this method pretty much it and I just have to wait it out?

    SET NOCOUNT ON
    GO

    DECLARE @BatchSize int, @ArchiveDate varchar(50), @Error int, @RowCount int, @LoopCount int

    SELECT @ArchiveDate = CONVERT(varchar(50), DATEADD(mm, -25, GETDATE()), 101),
           @BatchSize = 10000, @RowCount = 0, @LoopCount = 0

    WHILE @BatchSize <> 0
    BEGIN
        /* Delete the next batch */
        DELETE TOP (@BatchSize) FROM DW_LOAD.dbo.PR_TRANS_IMPORT WITH (ROWLOCK)
        WHERE [pos-date] < @ArchiveDate

        SELECT @Error = @@ERROR, @BatchSize = @@ROWCOUNT
        SELECT @RowCount = @RowCount + @BatchSize, @LoopCount = @LoopCount + 1

        /* Checkpoint every 10 batches so dirty pages get flushed and log space can be reused */
        IF @LoopCount = 10
        BEGIN
            CHECKPOINT
            PRINT('CHECKPOINT REACHED (100,000 rows deleted)')
            SET @LoopCount = 0
        END

        PRINT('Records Deleted: ' + CAST(@BatchSize as varchar(25)) + ', Total Count: ' + CAST(@RowCount as varchar(25)))

        /* Cap each run at roughly one million rows */
        IF @RowCount >= 1000000
            BREAK
    END

    SET NOCOUNT OFF
    GO

    ______________________________________________________________________________
    Never argue with an idiot; they'll drag you down to their level and beat you with experience

  • It would help to know the table and index structures.

    Some important things to know/help:

    * How long is "FOREVER"?

    * Is the table a heap or does it have a clustered index?

    * Is [pos-date] indexed?

    * Are you running Enterprise edition?

    * What is the recovery model?

    * What is the storage for the DB and Log files? (RAID and number of spindles)

    * Is there other significant load at the same time as you are deleting?

  • Oops, great questions...sorry I forgot to include more relevant information

    * How long is "FOREVER"?

    - 33 minutes to delete about 290,000 records (a long way to 13+ million)

    * Is the table a heap or does it have a clustered index?

    - it has a clustered index (which is 71GB)

    * Is [pos-date] indexed?

    - Yes, it is

    * Are you running Enterprise edition?

    - SQL 2005, SP2, Standard

    * What is the recovery model?

    - Simple

    * What is the storage for the DB and Log files? (RAID and number of spindles)

    - Unsure about spindles, but it's all on RAID 5 storage; there is ample room for growth within the data and log files

    * Is there other significant load at the same time as you are deleting?

    - No, this is the only activity


  • Since there are no other users during the delete try removing the row lock hint, you might even put a table lock hint. (Less work if it doesn't have to maintain as many locks.)

    How about an actual execution plan for the DELETE query? If the optimizer has gone for a table scan most of your time might be spent finding the rows to delete, in which case a larger batch (say 500,000 records) might work out to be faster. (In some cases I have used a batch size of 1,000,000 rows.)

    Can you drop all the non-clustered indexes other than the one on pos-date, and then re-build them afterwards? If the optimizer isn't using the pos-date index, then you can drop it too. (Saves all of the time and disk I/O of keeping them up-to-date.) (Or you can just disable all of the indexes, and then rebuild/re-enable them afterwards. That way you don't have to script out the index definitions.)
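    As a rough sketch of the disable/rebuild approach (the index name here is only an example -- substitute your real index names):

    ```sql
    -- Disable each non-clustered index before the purge. (Don't disable the
    -- clustered index -- that would make the whole table inaccessible.)
    ALTER INDEX IX_PR_TRANS_IMPORT_Example ON DW_LOAD.dbo.PR_TRANS_IMPORT DISABLE;

    -- ... run the batched DELETE here ...

    -- REBUILD re-enables the index and recreates it from the base data.
    ALTER INDEX IX_PR_TRANS_IMPORT_Example ON DW_LOAD.dbo.PR_TRANS_IMPORT REBUILD;
    ```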

    I was going to suggest using a filtered index on pos-date since you posted in a SQL Server 2008 forum, but since you only have 2005 that isn't an option.

    You could always go the route of exporting all the data and bringing it back in, sometimes that is faster than deleting. (If you had the space creating a new table and inserting the records into it and dropping the old table would be the fastest way to go.)

  • Thanks for the tips! I have exceeded my maintenance window for tonight but will try this tomorrow afternoon once corporate reporting has finished for the day

    How about an actual execution plan for the DELETE query?

    * I didn't notice many table scans but did notice index updates costing 30% for each iteration in the loop...so I think your thought on disabling the indexes may be a decent route to go - I will try this and see how that works first

    You could always go the route of exporting all the data and bringing it back in, sometimes that is faster than deleting. (If you had the space creating a new table and inserting the records into it and dropping the old table would be the fastest way to go.)

    * If I did choose this route, I'd first have to delete all the non-clustered indexes (to free up the space), then recreate them all on the new table after successfully importing into it - correct?

    Thanks, you've been a great help. I will try these two things tomorrow to see how it works.


  • Partitioning would help greatly when deleting large amounts of data.

    www.sqlsuperfast.com

  • I have very little experience with partitioning...could you provide me with some good links on either instructional videos or good articles so I can learn about it more quickly?


  • You could try different things:

    Remove all indexes besides the clustered one and the one on pos-date.

    Try smaller batches (100, 1000, 5000, 10000) and see which is fastest (on some systems I've had much better performance deleting in tiny batches rather than in big ones).

    If you have enough space on another drive (or another server), do a bcp queryout (only the rows you need to keep), truncate your table, and bcp in. No need to get rid of the indexes; they'll be rebuilt once the bcp in is done (or after each batch if you set a batch size).

    If you have a tight schedule you can remove all indexes but the clustered and then recreate them starting with the most important and if you run out of time recreate the less important at a later time.

    You'll need Enterprise Edition to try out partitioning.

  • "If you have enough space on another drive (or another server) do a bcp queryout (only the rows you need to keep), truncate your table and bcp in. No need to get rid of the indexes, they'll be rebuilt once the bcp in is done (or after each batches if you set a batch size)."

    This method has been working really well; however, when I got to a larger table (i.e., 21,000,000+ rows), while the out ran quickly, the in hung near the end... and has been sitting there for about 45 minutes.

    In checking the log file, I see considerable growth (it grew about 28GB).

    I assume that after it inserted the data, it's rebuilding all the indexes?

    Would a better approach be to script out the indexes first, then truncate the table, bcp data back in, then recreate all the indexes?

    Would that bypass this excessive logging?


  • Yes, it's rebuilding all the non-clustered indexes.

    Rebuilding the indexes by hand would remove the logging (it would only log which pages have been allocated but that doesn't take much room).
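    A sketch of the by-hand rebuild (the index name and column here are placeholders -- script your real definitions out first):

    ```sql
    -- Recreate each non-clustered index once the bcp in has finished.
    CREATE NONCLUSTERED INDEX IX_PR_TRANS_IMPORT_posdate
        ON DW_LOAD.dbo.PR_TRANS_IMPORT ([pos-date])
        WITH (SORT_IN_TEMPDB = ON); -- push the index sort into tempdb to ease pressure on the data files
    ```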

  • I ran out of space yesterday evening when all the rebuilding was running, so it failed and went into a lovely 3-hour rollback process. When I first ran this yesterday it took a little over 3 hours to bcp 20.5 million rows into the table (this was when there were a clustered index and 4 non-clustered indexes on the table).

    After everything rolled back, I dropped all the NCIs and then dropped the clustered index, and started the bcp off again...it's been running for the past 11 hours now and I'm only at 17.82 million records! It's considerably slower...

    Should I have kept the clustered index on and rebuilt it after all the data was loaded? Or was what I did the correct thing to do in this case?

    I have several more of these types of data pruning to go (all larger than this one) so I really want to get this process down pat so it runs as efficiently as possible.

    Thanks!


  • If you keep the clustered index (to avoid having to rebuild it later), try setting a batchsize (like 10000) in your bcp in, to keep your log from growing too much.

    If you remove all the indexes, including the clustered index, try the TABLOCK hint; that should make the whole import minimally logged, which should be quicker.

    Keep in mind that recreating a clustered index takes a lot of space (over 1x the table size).

    Are there any text/ntext/image fields in your table?

  • Yes, I wish I had honed in on using the -b flag before I ran this last night...now everything's running as one giant transaction and God only knows how long I'll have to wait for it to commit. The bcp in has already been running for 13 hours and it's only at 18.85 million records. It seems to take anywhere between 2 - 22 seconds to get "1000 rows sent to SQL Server. Total sent: 18890000"

    I blew all the indexes away last night...hoping that the absence of having to maintain indexes would help speed things up! I would really like to know if that is actually slowing down the process or if it has no effect (as I loaded this data yesterday afternoon in just over 3 hours with the indexes still intact)

    Not sure what you mean about trying the TABLOCK hint during the bcp in...when executing:

    bcp "dw_load.dbo.DW_Fleet_Load_Progress" in f:\DW_Fleet_Load_Progress.dat -T -c -q

    There's not really an option. Did you mean when dumping the contents "out" to the flat-file?

    FYI - there are no text/ntext/image fields in the table


  • This is probably a little late for you, but I really wouldn't have used a loop for this delete. I'd have gone with the straight delete, but changed the Archive date (assuming I knew when point zero for the data was).

    So my code would have looked something like this:

    DELETE FROM DW_LOAD.dbo.PR_TRANS_IMPORT
    WHERE [pos-date] < '01/01/2001';
    SELECT '01/01/2001 data deleted';
    GO

    DELETE FROM DW_LOAD.dbo.PR_TRANS_IMPORT
    WHERE [pos-date] < '03/01/2001';
    SELECT '03/01/2001 data deleted';
    GO

    DELETE FROM DW_LOAD.dbo.PR_TRANS_IMPORT
    WHERE [pos-date] < '05/01/2001';
    SELECT '05/01/2001 data deleted';
    GO

    Now, I would have altered those dates according to how big a recordset I thought it was pulling with that delete.

    OR

    Another option that was sort of mentioned, but not the way I think about it:

    CREATE TABLE dbo.TempPR_TRANS_IMPORT (columnlist datatype);

    INSERT INTO dbo.TempPR_TRANS_IMPORT (columnlist)
    SELECT <columnlist>
    FROM DW_LOAD.dbo.PR_TRANS_IMPORT
    WHERE [pos-date] > @ArchiveDate;

    TRUNCATE TABLE DW_LOAD.dbo.PR_TRANS_IMPORT;

    INSERT INTO DW_LOAD.dbo.PR_TRANS_IMPORT (columnlist)
    SELECT <columnlist>
    FROM dbo.TempPR_TRANS_IMPORT;

    To answer your question on partitioning, Pinal Dave has a good partitioning tutorial at: http://blog.sqlauthority.com/2008/01/25/sql-server-2005-database-table-partitioning-tutorial-how-to-horizontal-partition-database-table/
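    For a flavour of what that tutorial covers, here's a minimal sketch (this needs Enterprise Edition on 2005, and all object names here are illustrative):

    ```sql
    -- Partition a datetime column on year boundaries.
    CREATE PARTITION FUNCTION pfPosDate (datetime)
        AS RANGE RIGHT FOR VALUES ('20090101', '20100101', '20110101');

    CREATE PARTITION SCHEME psPosDate
        AS PARTITION pfPosDate ALL TO ([PRIMARY]);

    -- A table created ON psPosDate([pos-date]) can then age out its oldest
    -- partition with a metadata-only operation instead of row-by-row deletes:
    -- ALTER TABLE dbo.BigTable SWITCH PARTITION 1 TO dbo.ArchiveTable;
    ```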

    Brandie Tarvin, MCITP Database Administrator
    LiveJournal Blog: http://brandietarvin.livejournal.com/
    On LinkedIn!, Google+, and Twitter.
    Freelance Writer: Shadowrun
    Latchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Thanks Brandie, but I didn't go with that method; I actually opted for bcp... like so:

    bcp " select * from dw_load.dbo.DW_Fleet_Load_Progress with(readuncommitted) where [pos-date] >= DATEADD(mm, -12, GETDATE()) " queryout "f:\DW_Fleet_Load_Progress.dat" -T -c

    TRUNCATE TABLE dw_load.dbo.DW_Fleet_Load_Progress

    bcp "dw_load.dbo.DW_Fleet_Load_Progress" in f:\DW_Fleet_Load_Progress.dat -T -c -q

    And when I run this again for a new (larger) table I will include:

    bcp "dw_load.dbo.DW_Fleet_Load_Progress" in f:\DW_Fleet_Load_Progress.dat -T -c -q -b 10000
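    And if I understand the earlier suggestion correctly, the TABLOCK hint is passed through bcp's -h option, so the minimally logged version of that import would look something like this (same table and file; untested on my end):

    ```bat
    rem -h "TABLOCK" takes a table lock, which is required for minimal logging;
    rem -b 10000 commits every 10,000 rows so the log space can be reused.
    bcp "dw_load.dbo.DW_Fleet_Load_Progress" in f:\DW_Fleet_Load_Progress.dat -T -c -q -b 10000 -h "TABLOCK"
    ```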

    To reference what you were saying about using the loop...when deleting 8,000,000+ records, you want a loop with transactional commits/rollbacks, and some sort of batching or else you'll blow the heck out of your log files (even in simple mode). I actually tried that before deciding to go with the loop...and ended up adding the checkpoint in later to assist with writing the pages to disk. It works fine, just painfully slow on the server it's running on.

