Deleting records from a large table

  • I wanted to find out the fastest way to delete records from a large table. I currently have an index (non-clustered) built on the table using the column that I am also using in my where clause from the delete TSQL.

    First question is what type of index should I be using clustered or non-clustered?

    Here is the TSQL that I am using to delete from this table:

    SELECT 1

    WHILE @@ROWCOUNT > 0

    BEGIN

    DELETE TOP (10000)

    FROM [Table Name]

    WHERE [Column Name] is null

    END

    This table has roughly 30 mill records and with my deletion process, I am removing just short of 4 mill records.

    Right now it's taking 25-30 mins to delete this. Any suggestions on how I can speed this process up?

  • Have you checked the execution plan to see it contains scans?

    You could disable or drop the indices then rebuild them after the delete operation. You could add a NC index on the column in the where clause.

  • When I tested without an index, it took longer to delete the records. I created an index with the column I have in my where clause, but its still really long to process.

  • Also, I am processing this TSQL in an execute SQL task (SSIS) and it is the first step in my sequence container.

  • skaggs.andrew (5/8/2014)


    When I tested without an index, it took longer to delete the records. I created an index with the column I have in my where clause, but its still really long to process.

    That makes sense. without an index on the predicate column, sql has to read the whole table. with an index, it only has to read 4 million rows (still a lot!). If there are no other indexes (no PK?) than this might be the best you can do.

    Have you tried a query that just selects the records to be deleted? If so, how does that perform?

  • Yea, that is why i don't know which index type I should be using for this example, clustered or non. If I just run a select statement on the table with the same where clause, the records return back to me in roughly 6-7 mins if I remember correctly from yesterday.

  • Is there a column that could be used as a PK on the table?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I don't work a whole lot with table design, so excuse me if I sound dumb. I don't have one single column that can act as the PK. In order for me to get an unique record, I have to draw by 3 columns. Should I be including those 3 columns (as my PK) as well as the column that is used in my where clause in the index?

    To try and make this a little less vague, here are the columns that I am referring.

    Student_ID, Response_ID, Question_ID would give me a distinct record.

    School_Yr is the column that I am using in my where clause. Currently, I have one index on this table (non-clustered) using this column.

    This is the loop I am using for deleting:

    SELECT 1

    WHILE @@ROWCOUNT > 0

    BEGIN

    DELETE TOP (10000)

    FROM [Table Name]

    WHERE School_Yr is null

    END

  • I changed my index to a clustered index with just the School_Yr column and reran my TSQL. It completed in 10 mins.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply