Problem with large delete (sql 2000)

  • Hi there,

     

    We are currently having a problem with a large delete query on a table that has 250m records and is about 80GB in size. Added to that, it also has a clustered index and a couple of other indexes.

     

    Even though the recovery mode is Simple, it takes forever and then runs out of disk space while trying to remove about 70% of the records. I guess we should probably drop all indexes except the ones necessary for the delete and then re-apply them after. I suppose deleting in steps would probably help. Any advice or experiences would be appreciated.

     

    Many thanks

    David

     

     

  • Why not break the delete into smaller batches?

    -SQLBill

  • Extract the 30% you want to keep.  Drop and recreate the table and the clustered index.  Bulk insert the extracted rows and rebuild the other indexes.  I'd be willing to bet you'll experience less downtime that way.
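
    A rough sketch of one variation of that approach (all object names, columns, and the keep condition below are placeholders; it assumes no foreign keys reference the big table, and preserving an identity column is covered in Alain's script further down):

    -- Sketch only: placeholder names throughout.
    -- 1) Copy the ~30% to keep (minimally logged under SIMPLE or BULK_LOGGED recovery).
    SELECT *
    INTO dbo.big_table_keep
    FROM dbo.big_table
    WHERE keep_condition = 1

    -- 2) Drop the old table (this also drops its indexes).
    DROP TABLE dbo.big_table

    -- 3) Rename the copy into place and rebuild the indexes, clustered first.
    EXEC sp_rename 'dbo.big_table_keep', 'big_table'
    CREATE CLUSTERED INDEX cix_big_table ON dbo.big_table (key_column)
    CREATE INDEX ix_big_table_other ON dbo.big_table (other_column)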

     

    Or break up the delete as has been suggested earlier.


    And then again, I might be wrong ...
    David Webb

  • Hi David

    I do not have time to test all this, but let's hope it gives you a start.

    All the best

    Alain

    -- ----------------------------------------------------------------------------

    -- 1) Create a new table and insert the rows/records you want to keep into it.

    -- 2) Store the next value/seed to restart at + the incremental value

    -- 3) Do a fast delete with minimal logging but keeping the table intact

    -- 4) Disable the automatic unique ID numbering

    -- 5) Re-insert the saved rows/records including the unique IDs

    -- 6) Drop the work table.

    -- 7) Re-initialise the automatic unique ID numbering

    -- 8) Re-enable the automatic unique ID numbering

    -- ----------------------------------------------------------------------------

    -- ----------------------------------------------------------------------------

    -- 1) Create a new table and insert the rows/records you want to keep into it.

    IF OBJECT_ID ( N'dbo.[new_table]',N'U') IS NOT NULL

     DROP TABLE dbo.[new_table];

    GO

    SELECT * 

    INTO dbo.[new_table]

    FROM dbo.[table_source]

    WHERE ...search_condition...

    -- Remarks:

    -- The user that executes a SELECT statement with the INTO clause

    -- must have CREATE TABLE permission in the destination database.

    -- When an identity column (IDENTITY or PRIMARY KEY) is included in the select,

    -- the corresponding column in the new table is not an identity column.

     

    -- SELECT...INTO cannot be used with COMPUTE.

    -- When a computed column is included in the select list,

    -- the corresponding column in the new table is not a computed column.

    -- The values in the new column are the values

    -- that were computed at the time SELECT...INTO was executed.

    -- ----------------------------------------------------------------------------

    -- 2) If your [table_source] contains an identity column created this way:

    -- CREATE TABLE dbo.[table_source] (column_id int IDENTITY(1,1));

    -- Store the next value/seed to restart at + the incremental value

    SELECT IDENT_INCR ('dbo.[table_source]')

     AS table_source_identity_increment,

     IDENT_CURRENT ('dbo.[table_source]') + IDENT_INCR ('dbo.[table_source]')

     AS table_source_identity_restart_value;

    -- ----------------------------------------------------------------------------

    -- 3) Do a fast delete with minimal logging but keeping the table intact

    --     including all permissions and associated index/indices,

    --  child tables, stored procedures, views, etc...

    TRUNCATE TABLE dbo.[table_source]

    -- Note: TRUNCATE TABLE will fail if [table_source] is referenced by a FOREIGN KEY constraint.

    -- ----------------------------------------------------------------------------

    -- 4) Disable the automatic unique ID numbering

    -- If your [table_source] contains an identity column created this way:

    -- CREATE TABLE dbo.[table_source] (column_id int IDENTITY(1,1));

    -- SET IDENTITY_INSERT can be used to disable the IDENTITY property

    -- of a column by enabling values to be explicitly inserted.

    -- In other words, you need to turn off the automatic identity generation

    -- so that you can re-insert the existing values from your [new_table]

    SET IDENTITY_INSERT dbo.[table_source] ON

     

    -- Remarks: At any time, only one table in a session can have

    -- -------- the IDENTITY_INSERT property set to ON.

    -- ----------------------------------------------------------------------------

    -- 5) Re-insert the saved rows/records including the unique IDs

    -- 5a) Insert selected columns omitting computed columns

    INSERT

    INTO dbo.[table_source]

    (column_id, column_1, column_2)

    SELECT 

     column_id, column_1, column_2

    FROM dbo.[new_table]

    -- do not ORDER BY column_id

    GO

    -- 5b) OR insert all columns if no computed columns

    -- INSERT

    -- INTO dbo.[table_source]

    -- SELECT *

    -- FROM dbo.[new_table]

    -- do not ORDER BY column_id

    GO

    -- ----------------------------------------------------------------------------

    -- 6) Drop the work table.

    DROP TABLE dbo.[new_table]

    GO

    -- ----------------------------------------------------------------------------

    -- 7) Re-initialise the automatic unique ID numbering

    ALTER TABLE dbo.[table_source]

    ALTER COLUMN column_id 

    IDENTITY (table_source_identity_restart_value, table_source_identity_increment)

    -- ----------------------------------------------------------------------------

    -- 8) Re-enable the automatic unique ID numbering

    SET IDENTITY_INSERT dbo.[table_source] OFF

     

    -- ----------------------------------------------------------------------------


    Best Regards,

    Alain

  • Documentation? ... ... Documentation??? ... ... DOCUMENTATION!!!!  Now THAT's what I'm talkin' 'bout!!!

    Very nicely done, Alain!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thank you all very much for the advice. And thank you Alain for the very detailed reply.

     

    Regards

    David

  • Alain, I have to go buy a hat now so I can take it off to you!

    Most Excellent!

    Regards
    Rudy Komacsar
    Senior Database Administrator
    "Ave Caesar! - Morituri te salutamus."

  • Now that you've found a solution, it's worth understanding why you had the issue.

    Your way of deleting was totally correct - the right way to do it in a relational database.  BUT

    But...since your single delete is (correctly) logically treated as a single transaction, you end up writing every row to the log, and the entire log MUST be maintained until the ENTIRE delete succeeds.  If the last row fails to delete, it needs to roll back the whole thing.

    In a perfect world, this mechanical reality wouldn't sully our theoretical relational space, and writes to disk wouldn't consume ANY time.  But they do, so if you don't NEED this consistency for a big delete in a maintenance situation, you do something to take the deletion of those records OUT of the context of a single transaction.

    If you don't care what happens if it fails halfway through - if you can just restart it and finish, or if, on failure, having all the rows put BACK would just make matters worse - then this makes sense.

    That's what each suggestion does - break the delete into smaller batches, or select the records you want to keep and drop the original table.  The deletion itself is no longer atomic.  Most of the time that would be a problem (it means other users can get an inconsistent view of the data).  But big deletes are often an exception.

    Kind of makes you want something like 'insert into' for delete syntax: a "bulk insert" syntax for delete that notes the deletion as a set, without any rollback ability.   It's a hack, but S.Q.L. is a hack anyway....

    Roger L Reid

  • I also applaud Alain's detailed comments, but I have a quibble.  This is a SQL 7, 2000 forum and the ALTER TABLE ... ALTER COLUMN IDENTITY() statement was introduced in SQL 2005.  It doesn't work in SQL 2000.
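
    For the record, a SQL 2000-compatible way to get a similar effect is to leave the column definition alone and reseed it with DBCC CHECKIDENT (the table name and the literal value below are placeholders; use the restart value noted in step 2 of the script):

    -- Sketch only: reseed the existing identity column rather than ALTERing it.
    DBCC CHECKIDENT ('dbo.table_source', RESEED, 123456789)

    -- Or let SQL Server derive the current identity value from the highest column_id now in the table:
    DBCC CHECKIDENT ('dbo.table_source', RESEED)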

     

  • Since we are not quibbling about details ... there are a few post-deletion steps that need to be added to maintain the present database performance levels after a deletion of this magnitude (a quick sketch follows the list) ...

    • UPDATE STATISTICS
    • sp_recompile dbo.table_name
    • sp_refreshview dbo.view_name  -- for each view that references the table unless SCHEMABINDING is used
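
    For example (table and view names are placeholders):

    -- Rebuild the optimizer's statistics after removing ~70% of the rows.
    UPDATE STATISTICS dbo.table_name WITH FULLSCAN

    -- Mark cached plans that reference the table for recompilation.
    EXEC sp_recompile 'dbo.table_name'

    -- Refresh each non-schemabound view that references the table.
    EXEC sp_refreshview 'dbo.view_name'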

    Regards
    Rudy Komacsar
    Senior Database Administrator
    "Ave Caesar! - Morituri te salutamus."

  •  

    We do something like this:

     

    SET NOCOUNT ON -- turns off network chatter

    -- this will batch the deletes into 5000 rows per transaction

    -- change ROWCOUNT to the number of rows you wish to delete in each batch

    SET ROWCOUNT 5000;

    WHILE 1 = 1

    BEGIN

      DELETE FROM ESN_Date_Range WHERE In_Time_Stamp >= @cut_off -- @cut_off: cut-off value, declared and set earlier in the full script

      IF @@rowcount < 5000 BREAK;

    END

    SET ROWCOUNT 0;

  • That'll definitely work although it doesn't do much for the amount of time it will take with all the indexes the original poster portrayed.  Not sure it'll do much for the transaction log either.  A SELECT INTO of good rows can be made to minimize logging (recovery mode needs to change unless you just happen to have a scratchpad database set to SIMPLE).  TRUNCATE and DROP don't take much, either.
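
    For a database in FULL recovery, that switch might look something like this (database name and backup path are placeholders):

    -- Sketch only: make the SELECT INTO minimally logged, then restore full logging.
    ALTER DATABASE MyDatabase SET RECOVERY BULK_LOGGED

    -- ... run the SELECT ... INTO dbo.new_table ... here ...

    ALTER DATABASE MyDatabase SET RECOVERY FULL
    -- Take a log backup afterwards so point-in-time recovery is available going forward.
    BACKUP LOG MyDatabase TO DISK = 'D:\Backups\MyDatabase_log.bak'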

    --Jeff Moden



  • Deleting 5000 rows at a time, no matter how long it takes, is a vast improvement over failing and rolling back because the log completely fills the disk.  The database is already in simple recovery mode so the transaction log won't grow out of control.

    But since you're deleting 70% of the table, I would use BCP QUERYOUT on the data to be kept, drop nonclustered indexes, truncate the table, use BCP IN (with TABLOCK hint) to reload it, and recreate the indexes.
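
    Something along these lines (all names are placeholders; the bcp commands run from a command prompt and are shown as comments):

    -- 1) Export the rows to keep in native format (command prompt):
    --      bcp "SELECT * FROM MyDB.dbo.big_table WHERE keep_flag = 1" queryout keep.dat -n -S myserver -T

    -- 2) Drop the nonclustered indexes and empty the table:
    DROP INDEX big_table.ix_big_table_other
    TRUNCATE TABLE dbo.big_table

    -- 3) Reload with the TABLOCK hint so the bulk insert is minimally logged (command prompt):
    --      bcp MyDB.dbo.big_table in keep.dat -n -S myserver -T -h "TABLOCK"

    -- 4) Recreate the nonclustered indexes:
    CREATE INDEX ix_big_table_other ON dbo.big_table (other_column)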

  • We use the rowcount technique to delete about 30% of a date range 'state' table every night prior to rebuilding it with the day's inventory transactions.  The fully loaded table is approximately 14 million rows, so the delete is 4 to 5 million rows.  The tran log stays at a manageable size, and on a quad-processor server (8 GB RAM) the delete takes place in under 15 minutes.

    I would agree with Scott Coleman that if you are removing 70% of the records, you are probably better off with a truncate and a rebuild.

     

  • We also use the looping delete technique to get rid of approx. 33% of our data in high volume OLTP tables.  The advantage not mentioned is avoidance of lock escalation, thereby allowing concurrent inserts/updates to the table while the deletes are taking place.  Of course we don't have as many indexes, so it's not quite the same.  Still, especially with Simple recovery, the looping deletes avoid most problems and provide continued access to the records not being deleted.


    Have Fun!
    Ronzo
