Deleting records from a large table

Question

Deleting records from a large table

skaggs.andrew

SSCrazy

Points: 2474
More actions
May 8, 2014 at 7:54 am

#291127

I wanted to find out the fastest way to delete records from a large table. I currently have an index (non-clustered) built on the table using the column that I am also using in my where clause from the delete TSQL.
First question is what type of index should I be using clustered or non-clustered?
Here is the TSQL that I am using to delete from this table:
SELECT 1
WHILE @@ROWCOUNT > 0
BEGIN
DELETE TOP (10000)
FROM [Table Name]
WHERE [Column Name] is null
END
This table has roughly 30 mill records and with my deletion process, I am removing just short of 4 mill records.
Right now it's taking 25-30 mins to delete this. Any suggestions on how I can speed this process up?

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

gbritton1 SSCertifiable Points: 6522 More actions · Answer 1

Have you checked the execution plan to see it contains scans?

You could disable or drop the indices then rebuild them after the delete operation. You could add a NC index on the column in the where clause.

skaggs.andrew SSCrazy Points: 2474 More actions · Answer 2

When I tested without an index, it took longer to delete the records. I created an index with the column I have in my where clause, but its still really long to process.

skaggs.andrew SSCrazy Points: 2474 More actions · Answer 3

Also, I am processing this TSQL in an execute SQL task (SSIS) and it is the first step in my sequence container.

gbritton1 SSCertifiable Points: 6522 More actions · Answer 4

skaggs.andrew (5/8/2014)
When I tested without an index, it took longer to delete the records. I created an index with the column I have in my where clause, but its still really long to process.

That makes sense. without an index on the predicate column, sql has to read the whole table. with an index, it only has to read 4 million rows (still a lot!). If there are no other indexes (no PK?) than this might be the best you can do.

Have you tried a query that just selects the records to be deleted? If so, how does that perform?

skaggs.andrew SSCrazy Points: 2474 More actions · Answer 5

Yea, that is why i don't know which index type I should be using for this example, clustered or non. If I just run a select statement on the table with the same where clause, the records return back to me in roughly 6-7 mins if I remember correctly from yesterday.

Jeff Moden SSC Guru Points: 1003888 More actions · Answer 6

Is there a column that could be used as a PK on the table?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

skaggs.andrew SSCrazy Points: 2474 More actions · Answer 7

I don't work a whole lot with table design, so excuse me if I sound dumb. I don't have one single column that can act as the PK. In order for me to get an unique record, I have to draw by 3 columns. Should I be including those 3 columns (as my PK) as well as the column that is used in my where clause in the index?

To try and make this a little less vague, here are the columns that I am referring.

Student_ID, Response_ID, Question_ID would give me a distinct record.

School_Yr is the column that I am using in my where clause. Currently, I have one index on this table (non-clustered) using this column.

This is the loop I am using for deleting:

SELECT 1

WHILE @@ROWCOUNT > 0

BEGIN

DELETE TOP (10000)

FROM [Table Name]

WHERE School_Yr is null

END

skaggs.andrew SSCrazy Points: 2474 More actions · Answer 8

I changed my index to a clustered index with just the School_Yr column and reran my TSQL. It completed in 10 mins.