Problem with large delete (sql 2000)

  • Hi there,

     

    We are currently having a problem with a large delete query on a table that has 250m records and is about 80GB in size. Added to that, it also has a clustered index and a couple of other indexes.

     

    Even though the recovery mode is Simple, it takes forever and then runs out of disk space while trying to remove about 70% of the records. I guess we should probably drop all indexes except the ones necessary for the delete and then re-apply them after. I suppose deleting in steps would probably help. Any advice or experiences would be appreciated.

     

    Many thanks

    David

     

     

  • Why not break the delete into smaller batches?

    -SQLBill

  • Extract the 30% you want to keep.  Drop and recreate the table and the clustered index.  Bulk insert the extracted rows and rebuild the other indexes.  I'd be willing to bet you'll experience less downtime that way.
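
    A rough sketch of one variation of that approach (all object names, columns, and the keep condition below are placeholders; it assumes no foreign keys reference the big table, and preserving an identity column is covered in Alain's script further down):

    -- Sketch only: placeholder names throughout.
    -- 1) Copy the ~30% to keep (minimally logged under SIMPLE or BULK_LOGGED recovery).
    SELECT *
    INTO dbo.big_table_keep
    FROM dbo.big_table
    WHERE keep_condition = 1

    -- 2) Drop the old table (this also drops its indexes).
    DROP TABLE dbo.big_table

    -- 3) Rename the copy into place and rebuild the indexes, clustered first.
    EXEC sp_rename 'dbo.big_table_keep', 'big_table'
    CREATE CLUSTERED INDEX cix_big_table ON dbo.big_table (key_column)
    CREATE INDEX ix_big_table_other ON dbo.big_table (other_column)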

     

    Or break up the delete as has been suggested earlier.


    And then again, I might be wrong ...
    David Webb

  • Hi David

    I do not have time to test all this, but let's hope it gives you a start.

    All the best

    Alain

    -- ----------------------------------------------------------------------------

    -- 1) Create a new table and insert the rows/records you want to keep into it.

    -- 2) Store the next value/seed to restart at + the incremental value

    -- 3) Do a fast delete with minimal logging but keeping the table intact

    -- 4) Disable the automatic unique ID numbering

    -- 5) Re-insert the saved rows/records including the unique IDs

    -- 6) Drop the work table.

    -- 7) Re-initialise the automatic unique ID numbering

    -- 8) Re-enable the automatic unique ID numbering

    -- ----------------------------------------------------------------------------

    -- ----------------------------------------------------------------------------

    -- 1) Create a new table and insert the rows/records you want to keep into it.

    IF OBJECT_ID ( N'dbo.[new_table]',N'U') IS NOT NULL

     DROP TABLE dbo.[new_table];

    GO

    SELECT * 

    INTO dbo.[new_table]

    FROM dbo.[table_source]

    WHERE ...search_condition...

    -- Remarks:

    -- The user that executes a SELECT statement with the INTO clause

    -- must have CREATE TABLE permission in the destination database.

    -- When an identity column (IDENTITY or PRIMARY KEY) is included in the select,

    -- the corresponding column in the new table is not an identity column.

     

    -- SELECT...INTO cannot be used with COMPUTE.

    -- When a computed column is included in the select list,

    -- the corresponding column in the new table is not a computed column.

    -- The values in the new column are the values

    -- that were computed at the time SELECT...INTO was executed.

    -- ----------------------------------------------------------------------------

    -- 2) If your [table_source] contains an identity column created this way:

    -- CREATE TABLE dbo.[table_source] (column_id int IDENTITY(1,1));

    -- Store the next value/seed to restart at + the incremental value

    SELECT IDENT_INCR ('dbo.[table_source]')

     AS table_source_identity_increment,

     IDENT_CURRENT ('dbo.[table_source]') + IDENT_INCR ('dbo.[table_source]')

     AS table_source_identity_restart_value;

    -- ----------------------------------------------------------------------------

    -- 3) Do a fast delete with minimal logging but keeping the table intact

    --     including all permissions and associated index/indices,

    --  child tables, stored procedures, views, etc...

    TRUNCATE TABLE dbo.[table_source]

    -- Note: TRUNCATE TABLE will fail if [table_source] is referenced by a FOREIGN KEY constraint.

    -- ----------------------------------------------------------------------------

    -- 4) Disable the automatic unique ID numbering

    -- If your [table_source] contains an identity column created this way:

    -- CREATE TABLE dbo.[table_source] (column_id int IDENTITY(1,1));

    -- SET IDENTITY_INSERT can be used to disable the IDENTITY property

    -- of a column by enabling values to be explicitly inserted.

    -- In other words, you need to turn off the automatic identity generation

    -- so that you can re-insert the existing values from your [new_table]

    SET IDENTITY_INSERT dbo.[table_source] ON

     

    -- Remarks: At any time, only one table in a session can have

    -- -------- the IDENTITY_INSERT property set to ON.

    -- ----------------------------------------------------------------------------

    -- 5) Re-insert the saved rows/records including the unique IDs

    -- 5a) Insert selected columns omitting computed columns

    INSERT

    INTO dbo.[table_source]

    (column_id, column_1, column_2)

    SELECT 

     column_id, column_1, column_2

    FROM dbo.[new_table]

    -- do not ORDER BY column_id

    GO

    -- 5b) OR insert all columns if no computed columns

    -- INSERT

    -- INTO dbo.[table_source]

    -- SELECT *

    -- FROM dbo.[new_table]

    -- do not ORDER BY column_id

    GO

    -- ----------------------------------------------------------------------------

    -- 6) Drop the work table.

    DROP TABLE dbo.[new_table]

    GO

    -- ----------------------------------------------------------------------------

    -- 7) Re-initialise the automatic unique ID numbering

    ALTER TABLE dbo.[table_source]

    ALTER COLUMN column_id 

    IDENTITY (table_source_identity_restart_value, table_source_identity_increment)

    -- ----------------------------------------------------------------------------

    -- 8) Re-enable the automatic unique ID numbering

    SET IDENTITY_INSERT dbo.[table_source] OFF

     

    -- ----------------------------------------------------------------------------


    Best Regards,

    Alain

  • Documentation? ... ... Documentation??? ... ... DOCUMENTATION!!!!  Now THAT's what I'm talkin' 'bout!!!

    Very nicely done, Alain!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thank you all very much for the advice. And thank you Alain for the very detailed reply.

     

    Regards

    David

  • Alain, I have to go buy a hat now so I can take it off to you!

    Most Excellent!

    Regards
    Rudy Komacsar
    Senior Database Administrator
    "Ave Caesar! - Morituri te salutamus."

  • Now that you've found a solution, it's worth understanding why you had the issue.

    Your way of deleting was totally correct - the right way to do it in a relational database.  BUT

    But...since your single delete is (correctly) logically treated as a single transaction, you end up writing every row to the log, and the entire log MUST be maintained until the ENTIRE delete succeeds.  If the last row fails to delete, it needs to roll back the whole thing.

    In a perfect world, this mechanical reality wouldn't sully our theoretical relational space, and writes to disk wouldn't consume ANY time.  But they do, so if you don't NEED this consistency for a big delete in a maintenance situation, you do something to take the deletion of those records OUT of the context of a single transaction.

    If you don't care what happens if it fails halfway through - if you can just restart it and finish, or if, on failure, having all the rows put BACK would just make matters worse - then this makes sense.

    That's what each suggestion does - break the delete into smaller batches, or select the records you want to keep and drop the original table.  The deletion itself is no longer atomic.  Most of the time that would be a problem (it means other users can get an inconsistent view of the data).  But big deletes are often an exception.

    Kind of makes you want something like 'insert into' for delete syntax: a "bulk insert" syntax for delete that notes the deletion as a set, without any rollback ability.   It's a hack, but S.Q.L. is a hack anyway....

    Roger L Reid

  • I also applaud Alain's detailed comments, but I have a quibble.  This is a SQL 7, 2000 forum and the ALTER TABLE ... ALTER COLUMN IDENTITY() statement was introduced in SQL 2005.  It doesn't work in SQL 2000.
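
    For the record, a SQL 2000-compatible way to get a similar effect is to leave the column definition alone and reseed it with DBCC CHECKIDENT (the table name and the literal value below are placeholders; use the restart value noted in step 2 of the script):

    -- Sketch only: reseed the existing identity column rather than ALTERing it.
    DBCC CHECKIDENT ('dbo.table_source', RESEED, 123456789)

    -- Or let SQL Server derive the current identity value from the highest column_id now in the table:
    DBCC CHECKIDENT ('dbo.table_source', RESEED)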

     

  • Since we are not quibbling about details ... there are a few post-deletion steps that need to be added to maintain the present database performance levels after a deletion of this magnitude (a quick sketch follows the list) ...

    • UPDATE STATISTICS
    • sp_recompile dbo.table_name
    • sp_refreshview dbo.view_name  -- for each view that references the table unless SCHEMABINDING is used
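
    For example (table and view names are placeholders):

    -- Rebuild the optimizer's statistics after removing ~70% of the rows.
    UPDATE STATISTICS dbo.table_name WITH FULLSCAN

    -- Mark cached plans that reference the table for recompilation.
    EXEC sp_recompile 'dbo.table_name'

    -- Refresh each non-schemabound view that references the table.
    EXEC sp_refreshview 'dbo.view_name'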

    Regards
    Rudy Komacsar
    Senior Database Administrator
    "Ave Caesar! - Morituri te salutamus."

  •  

    We do something like this:

     

    SET NOCOUNT ON -- turns off network chatter

    -- this will batch the deletes into 5000 rows per transaction

    -- change ROWCOUNT to the number of rows you wish to delete in each batch

    SET ROWCOUNT 5000;

    WHILE 1 = 1

    BEGIN

      DELETE FROM ESN_Date_Range WHERE In_Time_Stamp >= @cut_off -- @cut_off: cut-off value, declared and set earlier in the full script

      IF @@rowcount < 5000 BREAK;

    END

    SET ROWCOUNT 0;

  • That'll definitely work although it doesn't do much for the amount of time it will take with all the indexes the original poster portrayed.  Not sure it'll do much for the transaction log either.  A SELECT INTO of good rows can be made to minimize logging (recovery mode needs to change unless you just happen to have a scratchpad database set to SIMPLE).  TRUNCATE and DROP don't take much, either.
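
    For a database in FULL recovery, that switch might look something like this (database name and backup path are placeholders):

    -- Sketch only: make the SELECT INTO minimally logged, then restore full logging.
    ALTER DATABASE MyDatabase SET RECOVERY BULK_LOGGED

    -- ... run the SELECT ... INTO dbo.new_table ... here ...

    ALTER DATABASE MyDatabase SET RECOVERY FULL
    -- Take a log backup afterwards so point-in-time recovery is available going forward.
    BACKUP LOG MyDatabase TO DISK = 'D:\Backups\MyDatabase_log.bak'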

    --Jeff Moden



  • Deleting 5000 rows at a time, no matter how long it takes, is a vast improvement over failing and rolling back because the log completely fills the disk.  The database is already in simple recovery mode so the transaction log won't grow out of control.

    But since you're deleting 70% of the table, I would use BCP QUERYOUT on the data to be kept, drop nonclustered indexes, truncate the table, use BCP IN (with TABLOCK hint) to reload it, and recreate the indexes.
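
    Something along these lines (all names are placeholders; the bcp commands run from a command prompt and are shown as comments):

    -- 1) Export the rows to keep in native format (command prompt):
    --      bcp "SELECT * FROM MyDB.dbo.big_table WHERE keep_flag = 1" queryout keep.dat -n -S myserver -T

    -- 2) Drop the nonclustered indexes and empty the table:
    DROP INDEX big_table.ix_big_table_other
    TRUNCATE TABLE dbo.big_table

    -- 3) Reload with the TABLOCK hint so the bulk insert is minimally logged (command prompt):
    --      bcp MyDB.dbo.big_table in keep.dat -n -S myserver -T -h "TABLOCK"

    -- 4) Recreate the nonclustered indexes:
    CREATE INDEX ix_big_table_other ON dbo.big_table (other_column)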

  • We use the rowcount technique to delete about 30% of a date range 'state' table every night prior to rebuilding it with the day's inventory transactions.  The fully loaded table is approximately 14 million rows, so the delete is 4 to 5 million rows.  The tran log stays at a manageable size, and on a quad-processor server (8 GB RAM) the delete takes place in under 15 minutes.

    I would agree with Scott Coleman that if you are removing 70% of the records, you are probably better off with a truncate and a rebuild.

     

  • We also use the looping delete technique to get rid of approx. 33% of our data in high volume OLTP tables.  The advantage not mentioned is avoidance of lock escalation, thereby allowing concurrent inserts/updates to the table while the deletes are taking place.  Of course we don't have as many indexes, so it's not quite the same.  Still, especially with Simple recovery, the looping deletes avoid most problems and provide continued access to the records not being deleted.


    Have Fun!
    Ronzo
