I need to delete 180 million records from a table

  • I have a very large table in one of my databases in Azure SQL and I have been asked to delete the old data which I have NEVER done before. This table has 5 years of data, but we only need to keep 4 or 5 months of data. I did create the Archive DB from this existing database so we have a backup. Now I just tested this script and it took me 12 minutes to delete 500 records. I am not sure how fast it will be if I drop the index on the ID column. Is there a different method I can use to delete approximately 180 million records?

    DECLARE @ChunkSize INT = 100;
    DECLARE @MinID INT = 1000, @MaxID INT = 1500;
    DECLARE @RowsDeleted INT;

    SET @RowsDeleted = 1;

    WHILE @RowsDeleted > 0
    BEGIN
        DELETE TOP (@ChunkSize)
        FROM YourTableName
        WHERE ID BETWEEN @MinID AND @MaxID;

        SET @RowsDeleted = @@ROWCOUNT;
    END
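For the actual 180-million-row purge, a common variant of this batching pattern is a sketch like the following (table and column names are placeholders, and it assumes ID is the clustered index key and that everything below a cutoff ID is old data): use a larger batch, delete in clustered-index order with an open-ended range, and pause between batches so log backups/truncation can keep up.

```sql
-- Hypothetical names: adjust dbo.YourTableName / ID / @CutoffID to your schema.
DECLARE @ChunkSize INT = 10000;      -- larger batches amortize per-statement overhead
DECLARE @CutoffID  INT = 123456789;  -- first ID to KEEP (placeholder value)
DECLARE @RowsDeleted INT = 1;

WHILE @RowsDeleted > 0
BEGIN
    DELETE TOP (@ChunkSize)
    FROM dbo.YourTableName
    WHERE ID < @CutoffID;            -- open-ended range, walked in clustered-index order

    SET @RowsDeleted = @@ROWCOUNT;

    WAITFOR DELAY '00:00:01';        -- brief pause between batches to ease log pressure
END
```

Tune @ChunkSize against your own log growth and blocking; there is no universally right number.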

    "He who learns for the sake of haughtiness, dies ignorant. He who learns only to talk, rather than to act, dies a hypocrite. He who learns for the mere sake of debating, dies irreligious. He who learns only to accumulate wealth, dies an atheist. And he who learns for the sake of action, dies a mystic."

  • A few other things will affect the performance of the delete:

    does it have triggers?

    does it have links to other tables (and is cascade delete enabled)? e.g. is it part of a parent/child foreign key relationship?

    are there other indexes on the table?

    In the majority of cases, and if there aren't child dependencies on the table, for a high volume of data such as this one, where you are deleting over 75% of the table's content, the best way (assuming you can have a small downtime window) is to do the following within a single explicit transaction:

      1. rename the original table
      2. insert the records that are to be kept into a new table with the same name as the original
      3. drop the renamed table
      4. add the required indexes to the new table
      5. commit
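    The steps above might look like this in T-SQL (a hedged sketch only: the table, column, and index names are placeholders, and the cutoff ID is invented for illustration):

```sql
-- Placeholder names throughout; run in a maintenance window.
DECLARE @CutoffID INT = 123456789;   -- earliest ID you need to keep (placeholder)

BEGIN TRANSACTION;

-- 1. rename the original table out of the way
EXEC sp_rename 'dbo.YourTableName', 'YourTableName_Old';

-- 2. copy only the rows to keep into a new table with the original name
SELECT *
INTO dbo.YourTableName
FROM dbo.YourTableName_Old
WHERE ID >= @CutoffID;

-- 3. drop the renamed table
DROP TABLE dbo.YourTableName_Old;

-- 4. re-create the required indexes (SELECT INTO produces a heap)
CREATE UNIQUE CLUSTERED INDEX PK_YourTableName ON dbo.YourTableName (ID);
-- ...plus the nonclustered indexes the original table had...

-- 5. commit
COMMIT TRANSACTION;
```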

    An easy way to get the SQL for this is to open the table in the SSMS table designer, move a column up one position, and generate the change script.

    Then, in the change script, move the column back to its correct position in the CREATE TABLE statement, and add your filtering criteria for the "records to keep": based on your example it would be WHERE ID >= yyy, where yyy is the earliest ID you need to keep.

  • How many rows are you keeping?

    It may be faster and easier to insert the retained rows into a new table, rename or drop the old table, and rename the new table to the old table name. Or do it via partition switching/truncation if you want to get fancy, and build in support for more efficient management in the future.

  • Whoof! 500 rows took 12 minutes? Honestly, that should be something that occurs in, at most, a few seconds.

    Indexes may help (they can also add overhead to deletes as Frederico has already pointed out). Look at the execution plan for the delete statement. Also look at blocking through the process. What other processes are running? Lots to chew on here.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I did suggest creating a new table, loading only the Feb 2024 data into it, and then renaming the new and old tables. No response yet from the user.

    Also, this table has 4 indexes.

    1 clustered index on the ID column

    3 nonclustered indexes on different columns

    No triggers.



  • Is the ID column the clustered index? If not, then you should change the delete so it is deleting in clustered index order. But, as others have already stated, create a new table and copy the rows you want to keep into the new table, then rename the tables. Once you have validated everything is working as expected, you can just drop the old table.

    Looking to the future - investigate partitioning the table and enabling page compression.  For future management - instead of deleting rows you can just truncate the oldest partition and then merge it with the next partition.  This is much faster than attempting to delete millions of rows.
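    As a rough illustration of that partition-based purge (assumptions, not the poster's actual design: a monthly RANGE RIGHT partition function named pfMonthly on a date column, with partition 1 holding the oldest month), the recurring cleanup reduces to:

```sql
-- Truncate only the oldest partition (TRUNCATE ... WITH (PARTITIONS) needs SQL Server 2016+ / Azure SQL)
TRUNCATE TABLE dbo.YourTableName WITH (PARTITIONS (1));

-- Remove the now-empty boundary so the emptied partition merges with its neighbor
ALTER PARTITION FUNCTION pfMonthly()
    MERGE RANGE ('2024-02-01');   -- placeholder boundary value
```

    Both statements are metadata operations, which is why this is so much faster than deleting rows.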

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Grant Fritchey wrote:

    Whoof! 500 rows took 12 minutes? Honestly, that should be something that occurs in, at most, a few seconds.

    Indexes may help (they can also add overhead to deletes as Frederico has already pointed out). Look at the execution plan for the delete statement. Also look at blocking through the process. What other processes are running? Lots to chew on here.

    We have a table where it used to take about 6 minutes to delete 1 row. The problem was that there are about 30 tables with FKs pointing at the table. The fix was unfortunate... we had to add a nonclustered index to each child column that was in an FK pointing at the table.

    Positive lesson in proper normalization of a table instead of letting it grow column-wise. The one I'm talking about has more than 100 columns. It's just stupid but would take forever to properly normalize. If MS had proper BEFORE triggers instead of those bloody "INSTEAD OF" triggers, it would be an easy thing to normalize the table and put a view on top of the resulting tables.
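    One way to hunt for child foreign-key columns with no supporting index (a sketch using the standard catalog views; it only checks for an index whose leading key is the FK column, which is the common heuristic, not a complete analysis):

```sql
-- Referencing (child) FK columns with no index leading on that column
SELECT fk.name                            AS foreign_key,
       OBJECT_NAME(fkc.parent_object_id)  AS child_table,
       c.name                             AS fk_column
FROM sys.foreign_keys fk
JOIN sys.foreign_key_columns fkc
  ON fkc.constraint_object_id = fk.object_id
JOIN sys.columns c
  ON c.object_id = fkc.parent_object_id
 AND c.column_id = fkc.parent_column_id
WHERE NOT EXISTS (
    SELECT 1
    FROM sys.index_columns ic
    WHERE ic.object_id  = fkc.parent_object_id
      AND ic.column_id  = fkc.parent_column_id
      AND ic.key_ordinal = 1    -- leading key column only
);
```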

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Blocking issues?

    Is someone else also using that table at the same time?

    Double-check for cascading deletes and triggers!
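    A quick way to double-check both (standard catalog views; the table name is a placeholder):

```sql
-- Any triggers on the table?
SELECT name, is_disabled
FROM sys.triggers
WHERE parent_id = OBJECT_ID('dbo.YourTableName');

-- Any foreign keys referencing it, and do they cascade on delete?
SELECT fk.name,
       OBJECT_NAME(fk.parent_object_id) AS referencing_table,
       fk.delete_referential_action_desc  -- CASCADE, NO_ACTION, etc.
FROM sys.foreign_keys fk
WHERE fk.referenced_object_id = OBJECT_ID('dbo.YourTableName');
```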

    Johan

    Learn to play, play to learn !

    Don't drive faster than your guardian angel can fly ...
    but keeping both feet on the ground won't get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me
