Deleting data in a large table

  • I have a large table with 17M+ rows for our package record audits. I've decided to clear out anything older than 3 months, which would remove all but 1.2M records. The table doesn't currently have any indexes. What's the recommended way to clear the records from the table? Should I consider adding an index to make clearing it faster?

  • If you're going to remove 25% or more of the data from a HEAP or a table with a Clustered Index present, the best thing to do all the way around is to create an empty copy of the table with only the Clustered Index (or nothing, if you want it to remain a HEAP) and do a minimally logged insert of only the rows you want to keep into the new table. If that works, then rename the tables so the original table gets _Old appended to its object name and the new table takes the original name.

    Then, add the Non-Clustered indexes and any keys back to the new table.

    If you have scratch and tmp schemas, you can even change which schema the Primary Key constraint is on so that you don't have to violate any naming conventions there (Constraints must be uniquely named in a given schema just like other objects must be.  Non-clustered indexes are an exception because they're slaves to the table or heap).

    A lot of people have different names for this type of thing... I just refer to it as a "Swap'n'Drop".
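
    A minimal sketch of that approach, assuming a heap named dbo.PackageAudit with AuditID, AuditDate, and AuditData columns (all placeholder names), and assuming the database is in the SIMPLE or BULK_LOGGED recovery model so the TABLOCK'd insert can be minimally logged:

        -- 1. Build an empty copy of the table with only the Clustered Index.
        CREATE TABLE dbo.PackageAudit_New
        (
            AuditID   INT          NOT NULL,
            AuditDate DATETIME     NOT NULL,
            AuditData VARCHAR(500)     NULL
        );
        CREATE CLUSTERED INDEX CIX_PackageAudit_New
            ON dbo.PackageAudit_New (AuditDate, AuditID);

        -- 2. Minimally logged copy of only the rows you want to keep
        --    (requires SIMPLE or BULK_LOGGED recovery and the TABLOCK hint).
        INSERT INTO dbo.PackageAudit_New WITH (TABLOCK)
               (AuditID, AuditDate, AuditData)
        SELECT  AuditID, AuditDate, AuditData
          FROM  dbo.PackageAudit
         WHERE  AuditDate >= DATEADD(MONTH, -3, GETDATE());

        -- 3. Swap the names: the original gets the _Old suffix,
        --    the new copy takes the original name.
        EXEC sp_rename 'dbo.PackageAudit',     'PackageAudit_Old';
        EXEC sp_rename 'dbo.PackageAudit_New', 'PackageAudit';

        -- 4. Recreate the Non-Clustered indexes and keys on the new table,
        --    then DROP dbo.PackageAudit_Old once you've verified the copy.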

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I also read about using 'SWITCH' to map the data to the backup table, then insert the records back into the original table.   Once the data dump is done, I'll still have over 1M records in the original table.  Would you suggest an index to help with performance of future daily table cleanup?   We rarely ever query the data.

  • Jackie Lowery wrote:

    I also read about using 'SWITCH' to map the data to the backup table, then insert the records back into the original table. Once the data dump is done, I'll still have over 1M records in the original table. Would you suggest an index to help with performance of future daily table cleanup? We rarely ever query the data.

    SWITCH only works on either a "whole table" or a partition of a Partitioned Table, and you'd have to make an empty copy of the table to do the SWITCH as well. I'm thinking that's not an option here (and so didn't originally recommend it): adding the Clustered Index needed to partition the table would require enough space to hold an additional copy of the table until the Clustered Index build completes, and then you'd be left with the huge amount of empty space that the original table (heap) would leave behind.
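
    For reference only (since it's not being recommended for this case), a whole-table SWITCH looks like the following; the names are placeholders, and the target must be an empty table with an identical structure on the same filegroup:

        -- Metadata-only move of every row into an empty, identically structured table.
        ALTER TABLE dbo.PackageAudit SWITCH TO dbo.PackageAudit_Backup;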

    So, take a log file backup just before you switch to the BULK_LOGGED recovery model and, when you're done with the "Swap'n'Drop", switch back to the FULL recovery model and take another log file backup. That limits the window during which point-in-time restores aren't possible because a minimally logged (bulk) operation occurred.
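
    A rough outline of that sequence, with a placeholder database name and backup path:

        -- Log backup just before changing the recovery model.
        BACKUP LOG PackageDB TO DISK = N'X:\Backups\PackageDB_BeforeSwap.trn';

        ALTER DATABASE PackageDB SET RECOVERY BULK_LOGGED;

        -- ... do the minimally logged "Swap'n'Drop" here ...

        ALTER DATABASE PackageDB SET RECOVERY FULL;

        -- Log backup to close the window where point-in-time restore isn't possible.
        BACKUP LOG PackageDB TO DISK = N'X:\Backups\PackageDB_AfterSwap.trn';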

     

    --Jeff Moden



  • As for the old table, you were going to delete all but a million rows anyway. When you're sure the copy worked correctly, just drop the old table.

    Also, decide the maximum amount of time you want to keep data in this table. Add an index on the "Date_Created" or other temporal column (the same "key" you used to decide the cutoff for the million rows you're keeping). That should be a Clustered Index for this history table... try to make it unique as a Primary Key, as well. It'll also help with the rare lookups.
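
    As a sketch (again with placeholder names), a unique clustered Primary Key built on the temporal column plus an ID column as a tie-breaker might look like this:

        -- AuditDate alone may not be unique, so include the ID column in the key.
        ALTER TABLE dbo.PackageAudit
            ADD CONSTRAINT PK_PackageAudit
            PRIMARY KEY CLUSTERED (AuditDate, AuditID);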

    Then do nightly "rolloffs" from the logical and, hopefully, physical beginning of the table to delete any rows that have exceeded the desired "keep it until" age.
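
    One possible shape for such a nightly rolloff, deleting in batches so the transaction log stays manageable (the 3-month retention period and the names are assumptions):

        -- Delete anything older than the retention cutoff, 10,000 rows at a time.
        DECLARE @Cutoff DATETIME = DATEADD(MONTH, -3, GETDATE());

        WHILE 1 = 1
        BEGIN
            DELETE TOP (10000)
              FROM dbo.PackageAudit
             WHERE AuditDate < @Cutoff;

            IF @@ROWCOUNT = 0 BREAK;   -- nothing older than the cutoff is left
        END;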

     

    --Jeff Moden



  • Thanks a lot.  I can make a plan from that.
