Deleting records from a huge table

  • I have a huge table in my database. I need to delete records from this table. There are no indexes on this table. If I create an index on one of the columns the deletion process will run faster but the index creation will take a long time.

    I need to know the best method to delete rows from this table.

    Thanx in Advance !

    Sanz
  • If you have Enterprise Edition, maybe partitioning the table will work for you; otherwise you probably want to go with a crawler delete to avoid blowing up the log and to reduce contention (a rough sketch of the partition idea is at the end of this post).

    If you are deleting the majority of the table, it could be faster to insert the data you want to keep into another table, drop the original, then rename the new table.

    We really need to know the table schema, a bit of example data and the criteria you wish to delete by.
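
    If the Enterprise-Edition partitioning route mentioned above sounds interesting, here is a very rough sketch of the idea; every name below is hypothetical, and the table would first have to be rebuilt on the partition scheme:

    -- Partition on the column you delete by; whole partitions can then be switched
    -- out and dropped as a metadata-only operation instead of row-by-row deletes.
    CREATE PARTITION FUNCTION pf_StartDate (datetime)
        AS RANGE RIGHT FOR VALUES ('19800101');

    CREATE PARTITION SCHEME ps_StartDate
        AS PARTITION pf_StartDate ALL TO ([PRIMARY]);

    -- ...rebuild dbo.BigTable on ps_StartDate(Account_Start_Date), then:
    -- dbo.BigTable_Old must have an identical structure on the same filegroup.
    ALTER TABLE dbo.BigTable SWITCH PARTITION 1 TO dbo.BigTable_Old;
    DROP TABLE dbo.BigTable_Old;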

    ---------------------------------------------------------------------

  • Sanz (2/8/2010)


    I have a huge table in my database. I need to delete records from this table. There are no indexes on this table. If I create an index on one of the columns the deletion process will run faster but the index creation will take a long time.

    I need to know the best method to delete rows from this table.

    Thanx in Advance !

    How many records, approximately? How huge is the table you are talking about? Delete the data in chunks.

  • The table is 135 GB in size. There are 5 columns, all of datatype int or varchar(15).

    I would keep George's suggestion (inserting into a new table) as an option, as I will be deleting nearly 50% of the data.

    Any better ideas?

    Sanz
  • Copying to another table would require another 135 GB on the disk... do we have that much space?

    You could try BCP with queryout to export the data that you eventually want to keep to a file (smaller in size, and faster as well).

    Then you truncate the table.

    Then you BCP the file from step 1 back in (a rough sketch of these steps follows below).

    just some thoughts....
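
    Assuming purely hypothetical server, table, and file names, the three steps might look roughly like this (step 1 runs from a command prompt; BULK INSERT is the T-SQL counterpart of "bcp in"):

    -- Step 1 (command prompt): export only the rows you want to keep.
    --   bcp "SELECT * FROM MyDb.dbo.BigTable WHERE Account_Start_Date >= '19800101'" queryout C:\temp\keep.dat -n -S MyServer -T
    --   (-n keeps native datatypes; -c would write character data and grow the file)
    -- Step 2: empty the table (fast and minimally logged).
    TRUNCATE TABLE dbo.BigTable;
    -- Step 3: reload the kept rows from the file.
    BULK INSERT dbo.BigTable
    FROM 'C:\temp\keep.dat'
    WITH (DATAFILETYPE = 'native', TABLOCK);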

    -------------------------------------------------
    -Amit
    Give a man a fish and he'll ask for a lemon. Teach a man to fish and he won't get paged on weekends!! :w00t: - desperately trying to fish

  • 1. Assuming your database is in the FULL recovery model, perform a complete backup.

    2. Use SET ROWCOUNT to limit each batch to a reasonable size (start low, watch your log file size and the time to process the batch, and display the estimated execution plan and review it before starting your first batch).

    3. At the end of each batch of deletes, back up your log file.

    4. Be sure to save the full backup and all log backups; you do not want to break the chain in case you need to perform a restore.

    And test, test, and test again before using this on your production DB (a rough sketch of the loop is below).
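
    A minimal sketch of steps 2 and 3, assuming hypothetical database, table, backup path, and delete criteria:

    SET ROWCOUNT 10000;                      -- step 2: start small; watch log size and batch duration

    DECLARE @rows int;
    SET @rows = 1;

    WHILE @rows > 0
    BEGIN
        DELETE FROM dbo.BigTable
        WHERE Account_Start_Date < '19800101';   -- your delete criteria

        SET @rows = @@ROWCOUNT;

        -- step 3: back up the log after each batch so the space can be reused and the chain stays unbroken
        BACKUP LOG MyDb TO DISK = 'D:\Backups\MyDb_log.trn';
    END

    SET ROWCOUNT 0;                          -- always reset when finished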

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you - before posting a question please read
    Before posting a performance problem please read

  • I would create the index on the columns in your delete statement's where clause.

    It won't take that long if the disk is fairly fast, and your delete will be far, far faster.

    If you have enterprise edition you can create the index with online=on.

    If you do go that route, make sure to calculate the space the index will consume when you add it.
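
    For example, assuming the delete criterion is on a single date column (all names below are hypothetical), the online build might look like:

    -- ONLINE = ON requires Enterprise Edition; SORT_IN_TEMPDB keeps the sort work out of the user database.
    CREATE NONCLUSTERED INDEX IX_BigTable_Account_Start_Date
        ON dbo.BigTable (Account_Start_Date)
        WITH (ONLINE = ON, SORT_IN_TEMPDB = ON);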

  • Amit Singh (2/8/2010)


    Copying to another table would require another 135 GB on the disk... do we have that much space?

    You could try BCP with queryout to export the data that you eventually want to keep to a file (smaller in size, and faster as well).

    Then you truncate the table.

    Then you BCP the file from step 1 back in.

    just some thoughts....

    That would actually take more disk space because the data would materialize as characters instead of their native datatypes.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sanz (2/8/2010)


    I have a huge table in my database. I need to delete records from this table. There are no indexes on this table. If I create an index on one of the columns the deletion process will run faster but the index creation will take a long time.

    I need to know the best method to delete rows from this table.

    Thanx in Advance !

    Do you have a primary key on this table? I'm asking because a lot of folks don't know that a Primary Key also creates an index.
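
    For instance, a primary key constraint like the (hypothetical) one below automatically builds a unique index, clustered by default:

    ALTER TABLE dbo.BigTable
        ADD CONSTRAINT PK_BigTable PRIMARY KEY CLUSTERED (Id);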

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff, the funniest thing is that this table does not have any indexes on it... nor does it have a primary key... Having no indexes makes the queries run slowly...

    This is an archive database...

    Sanz
  • 1) Create the smallest index necessary to cover your WHERE clause. If you have lots of fields in the WHERE clause, pick a small column or two that provide the greatest chance of isolating the rows that need to be deleted.

    2) Do the delete in batches small enough to guarantee an index seek to get the records to delete. This can be a fairly big number for very large tables (50K, 100K, bigger?). Batching helps in several ways: a) it avoids a table lock that blocks all access to the table, b) it avoids blowing up the tlog if you delete a lot of records, and c) it avoids blowing up tempdb and flushing RAM pages with a huge scan. It also allows for error handling/rollbacks after each batch, interim tlog backups if you aren't in SIMPLE mode (or BACKUP LOG WITH TRUNCATE_ONLY in FULL recovery mode), etc. A sketch of such a loop is below.
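
    A minimal sketch of such a seek-driven batch loop (table, column, and batch size are hypothetical; DELETE TOP(n) is the SQL 2005+ alternative to SET ROWCOUNT for limiting a batch):

    DECLARE @batch int;
    SET @batch = 50000;

    WHILE 1 = 1
    BEGIN
        DELETE TOP (@batch)
        FROM dbo.BigTable
        WHERE Account_Start_Date < '19800101';   -- should be an index seek on the narrow index from step 1

        IF @@ROWCOUNT = 0 BREAK;                 -- nothing left to delete

        -- optional: interim BACKUP LOG here (or CHECKPOINT in SIMPLE recovery) between batches
    END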

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Here is what we do with our huge tables.

    SELECT *
    INTO Table2
    FROM Table1
    WHERE YourStatement (something like Account_Start_Date < '01/01/1980')

    DROP TABLE Table1

    EXEC sp_rename 'Table2', 'Table1'

    Runs much faster than DELETE because there is no logging involved.


    Jacob Milter

  • mar1jxm (2/11/2010)


    Here is what we do with our huge tables.

    SELECT *
    INTO Table2
    FROM Table1
    WHERE YourStatement (something like Account_Start_Date < '01/01/1980')

    DROP TABLE Table1

    EXEC sp_rename 'Table2', 'Table1'

    Runs much faster than DELETE because there is no logging involved.

    1) there are numerous requirements for minimal logging, and some of them are usually not available on production databases

    2) many systems don't have enough space to do that for their largest tables

    3) that select will block ALL DML activity to the table while it is running, which could be a long time.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (2/11/2010)


    3) that select will block ALL DML activity to the table while it is running, which could be a long time.

    Kevin... a couple of questions on the above...

    1. Which SELECT are you talking about when you say "that select" in the above? Jacob Milter's current SELECT or the SELECT that would be necessary to minimize logging?

    2. Which table are you talking about when you say "the table" in the above? The table in the FROM clause or the table in the INTO clause or both?

    I just want to be clear on this because there are a lot of left-over myths from the way SELECT/INTO used to operate before SQL Server 6.5 SP1.

    To answer my own questions from above...

    If you're talking about Jacob Milter's current SELECT and it's a script run from SSMS and not a stored procedure, then only the destination table will be blocked from DML... the source table will not be blocked. If you're talking about a SELECT that uses the required TABLOCK hint to get minimal logging, then the source table will also be blocked from DML no matter what.

    Then again, those observations may change a bit when you do the same thing in a stored procedure with some conditional logic. I'm trying to work up an example of what happens then.
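
    If the minimally logged variant being referred to is the INSERT ... SELECT form (rather than SELECT ... INTO), a rough sketch would be the following; the TABLOCK hint goes on the target table, SIMPLE or BULK_LOGGED recovery is also required, and all names are hypothetical:

    -- Bulk-load form that can be minimally logged into a heap target when TABLOCK is specified.
    INSERT INTO dbo.Table2 WITH (TABLOCK)
    SELECT *
    FROM dbo.Table1
    WHERE Account_Start_Date >= '19800101';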

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
