large table cleanup using Delete

  • Or - is this a cleanup effort to remove all of the NULL values and then set the column to NOT NULL?

    Yes.

    Thanks.

     

  • Bruin wrote:

    RowCnt              NullCnt 949718305     391584820

    That means that you're trying to delete about 41% of your table.

    DELETEs are fully logged even in the simple recovery model.  They also have to delete from all indexes on the table.  When it's all done and said with, you going to need to rebuild any indexes (which ALWAYS includes the Clustered Index, at the very least).  Any FKs pointing at the table will need to be checked as the DELETEs occur.  And then there's the NOT NULL constraint you want to add and that means yet another full scan of the table to verify that none of the rows present after the deletes contain any deletes.

    As Michael John stated early in this thread and considering all of the above, it the table can be "offline" for a bit, it sounds to me that copying the good data to another table and then doing the "swap'n'drop" things will work much more quickly and less impact all around depending on the number of FKs you'd have to drop and repoint.  To be sure, I wouldn't drop anything from the original table in this process.  I agree it should be renamed though.

    Once you're sure that all of the good data has been transferred correctly and the FKs (if any... are you sure that any actually exist?) have been correctly repointed, just drop the original renamed table.

    All of this will be much faster if you do it in a minimally logged fashion.  You'll need to slip into the BULK LOGGED Recovery Model (unless the database is in the SIMPLE Recovery Model) to do it and that will, unfortunately, be a very bad thing to do if you one of many forms of replication or AG in place because none of that can tolerate anything but the FULL Recovery Model.

    Of course, there's also possible  "Paralysis by Analysis" here.  If you're comfortable with doing limited DELETEs in a loop and you understand how to do that and not the "Swap'n'Drop" method (especially using Minimal Logging) or you need to keep the table online for the whole time, go for it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I'm going for the Delete method...

    Thanks for comments...

  • How about a Batch delete with batches to delete.



    DECLARE @batch INT
    SET @batch = 10000 /* SIZE OF BATCH */
    DECLARE @cnt INT
    SET @cnt = 0 /* START OF COUNTER */
    DECLARE @cntmax INT
    SET @cntmax = 10 /* NUMBER OF BATCHES */
    WHILE @cnt < @cntmax
    BEGIN
    SET @cnt = @cnt + 1
    DELETE TOP(@batch) FROM YOUR_TABLE WHERE... /* ACTUAL DELETE QUERY - ADJUST AS NEEDED*/
    END
  • Why would you do that?  That would just mean rerunning this code over and over until all rows matching the criteria have been deleted.  What purpose would that serve?

     

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Bruin wrote:

    How about a Batch delete with batches to delete.

    DECLARE @batch INT
    SET @batch = 10000 /* SIZE OF BATCH */ DECLARE @cnt INT
    SET @cnt = 0 /* START OF COUNTER */DECLARE @cntmax INT
    SET @cntmax = 10 /* NUMBER OF BATCHES */WHILE @cnt < @cntmax
    BEGIN
    SET @cnt = @cnt + 1
    DELETE TOP(@batch) FROM YOUR_TABLE WHERE... /* ACTUAL DELETE QUERY - ADJUST AS NEEDED*/END

    Here's the general layout that I use for such things. (not at my work machine so doing this from memory).

    --===== Local Variables
    DECLARE @BatchSize INT = 10000
    ,@Now CHAR(23)
    ,@RowCount INT
    ;
    --===== Ensure the operator knows that the run is starting.
    SELECT @Now = CONVERT(CHAR(23),GETDATE(),121);
    RAISERROR('***** DELETES STARTED AT %s *****',0,0,@Now) WITH NOWAIT
    ;
    WHILE 1=1 --Loop until other conditions occur.
    BEGIN
    --===== Do a DELETE according to the desired batch size.
    DELETE TOP (@BatchSize)
    FROM dbo.YourTableNameHere
    WHERE SomeColumn SomeRelationship SomeCondition
    ;
    --===== Report what just happened so people have some indication as to whether the
    -- code is stilly running or not.
    SELECT @RowCount = @@ROWCOUNT
    ,@Now = CONVERT(CHAR(23),GETDATE(),121);
    RAISERROR('%u Rows Deleted at %s.',0,0,@RowCount,@Now) WITH NOWAIT
    ;
    --===== If we deleted fewer rows than the batch size, then that was the last batch
    -- and there's nothing left to do so we can exit the loop now. The way we've
    -- done this saves on the final unnecessary scan to check if the number of
    -- rows to be deletediss zero. It's can be a real time saver on larger tables.
    -- Otherwise, the loop repeats.
    IF @RowCount < @BatchSize BREAK
    ;
    END
    ;
    --===== Ensure the operator knows that the run is complete.
    SELECT @Now = CONVERT(CHAR(23),GETDATE(),121);
    RAISERROR('***** DELETES COMPLETED AT %s *****',0,0,@Now) WITH NOWAIT
    ;

    Just in case it's not obvious, you're going to need to change some things in the DELETE statement in that.

    You could also add some code to let you know how many batches have been done and how many  there are to go.  I do that at work but wanted you to have some of the fun. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks !!!!

Viewing 7 posts - 16 through 21 (of 21 total)

You must be logged in to reply to this topic. Login to reply