Deleting from multiple very large tables

  • EasyBoy - Monday, February 5, 2018 11:38 AM

    We have a situation where we would like to delete from multiple tables (60-70 of them) and keep only the data for the current financial year and the two years before it. The tables are all very large (millions of rows).
    Based on a date in one table, we need to find the corresponding ID values in all 60-70 tables.
    ID is the PK in all of the tables. In addition, the data are replicated.
    What would be the best way to delete the records from all the tables without affecting replication?
    Thanks in advance.
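    For illustration (all of the object names below are made up, and an April 1 financial-year start is assumed), the kind of per-table delete we have in mind is:

    -- Illustrative only: find the IDs that fall outside the retention window
    -- in the driving table, then delete the matching rows from a child table.
    -- dbo.DateTable, TranDate, and dbo.Child01 are hypothetical names.
    DECLARE @Cutoff date = DATEFROMPARTS(YEAR(GETDATE()) - 2, 4, 1);

    DELETE c
    FROM dbo.Child01 AS c
    WHERE c.ID IN (SELECT d.ID
                   FROM dbo.DateTable AS d
                   WHERE d.TranDate < @Cutoff);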

  • Lynn Pettis - Monday, February 5, 2018 12:37 PM

    Well, replication throws a wrench into what I would consider, unless it is something you could tear down and rebuild.

    Regarding the deletion of data, my first question is: what is the ratio of data being kept to data being deleted? That alone will have an impact on how you approach the task. A rough per-table count like the one below would answer it.
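    Something along these lines (dbo.DateTable and TranDate are placeholders for wherever your driving date lives, and the April 1 financial-year start is an assumption):

    -- Rough keep/delete ratio, driven by the table that holds the date.
    DECLARE @Cutoff date = DATEFROMPARTS(YEAR(GETDATE()) - 2, 4, 1);

    SELECT SUM(CASE WHEN d.TranDate >= @Cutoff THEN 1 ELSE 0 END) AS RowsToKeep,
           SUM(CASE WHEN d.TranDate <  @Cutoff THEN 1 ELSE 0 END) AS RowsToDelete
    FROM dbo.DateTable AS d;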

  • Arsh - Tuesday, February 6, 2018 1:21 PM

    Agree with the question from Lynn. The ratio of the data volume to keep TO the data volume to remove is crucial. It can tip the balance towards other options, like extracting the required data (if it's significantly less than the data to remove) into a staging database and then dropping the original database altogether. A rough sketch of that approach follows.
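    Per table, something like this (StagingDB, dbo.Child01, dbo.DateTable, and TranDate are hypothetical names; SELECT ... INTO is minimally logged under the SIMPLE or BULK_LOGGED recovery model):

    -- Sketch only: copy the rows you want to keep into a staging database.
    DECLARE @Cutoff date = DATEFROMPARTS(YEAR(GETDATE()) - 2, 4, 1);

    SELECT c.*
    INTO   StagingDB.dbo.Child01
    FROM   dbo.Child01 AS c
           INNER JOIN dbo.DateTable AS d
                   ON d.ID = c.ID
    WHERE  d.TranDate >= @Cutoff;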

  • Jeff Moden

    I've found that, especially for large tables, even if only 30-50% of the table is going to be deleted, it's usually much more effective to use the copy-swap-and-drop method. Done correctly, it can still be twice (or more) as fast, and it's a whole lot easier on the transaction log if you make an excursion to the BULK LOGGED recovery model and take advantage of "minimal logging" with the clustered index already in place. A bare-bones sketch of the pattern for one table is below.

    IMHO, the jury is still out on the use of TF 610 for the other indexes. It's usually faster just to rebuild the nonclustered indexes while in the BULK LOGGED recovery model.
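    All of the object names here are hypothetical, and (per Lynn's point) replication would have to be torn down first and rebuilt afterwards, since you can't swap out a published table underneath it:

    -- Sketch of copy-swap-and-drop for one table; not a drop-in script.
    ALTER DATABASE MyDb SET RECOVERY BULK_LOGGED;   -- enable minimal logging

    -- Create the new table with the clustered index already in place.
    CREATE TABLE dbo.Child01_New
    (
        ID int NOT NULL CONSTRAINT PK_Child01_New PRIMARY KEY CLUSTERED
        -- , other columns ...
    );

    -- TABLOCK on the empty clustered target allows minimally logged inserts.
    INSERT INTO dbo.Child01_New WITH (TABLOCK) (ID)
    SELECT c.ID
    FROM   dbo.Child01 AS c
           INNER JOIN dbo.DateTable AS d
                   ON d.ID = c.ID
    WHERE  d.TranDate >= '20160401';                -- illustrative cutoff

    -- Rebuild the nonclustered indexes here, then swap the names and drop.
    EXEC sp_rename 'dbo.Child01', 'Child01_Old';
    EXEC sp_rename 'dbo.Child01_New', 'Child01';
    DROP TABLE dbo.Child01_Old;

    ALTER DATABASE MyDb SET RECOVERY FULL;          -- take a log backup afterwards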

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
