Large Table Delete 105+ Million Rows...

  • Well, I have gotten into the middle of a pleasant situation...we have a MOM db server whose administrator has misconfigured or outright broken it, and as a result the database has ballooned up to a 58 gig monster when it should only be around 15. The drives are full, and he has finally decided to shut down most of the alerts and perf mon stuff until we can get the size under control...

    He has requested that a couple of sets of tables (122 tables altogether) have data purged out of them...the ones that hold the alerts and performance monitor data. (Events01...61 & SampledNumericDataAll_01...61 are the table groupings.)

    Well, between the two table groupings and the purge parameters it totals about 105 million rows...so I was going to do the following:

    1. Put the database into simple recovery mode

    2. Run the following query:

    SET ROWCOUNT 10000

    WHILE 1 = 1
    BEGIN
        --Run only one statement at a time until completion
        ---------------------------------------------------
        --DELETE TABLE_A WHERE DATE <= GETDATE() - 30
        --DELETE TABLE_B WHERE DATE <= GETDATE() - 30

        IF @@ROWCOUNT = 0
            BREAK
    END

    SET ROWCOUNT 0

    I can't make use of an index for this, as the only index on each table is the one on its keys...and creating an index just for the delete would take too long and would probably run out of space in the process.

    I have never done a delete this large before and wanted to know if there are other things I should be looking out for...

    Thanks,

    Lee

  • --here's documentation I've prepared for my group--

    (btw: no need to change the database mode from Full to Simple)

    Large transactions cause the transaction log to grow.

    When you clean out a large table, the delete statement can create a very long-running transaction.

    Sometimes it's necessary to prevent this.

    So we divide one big delete into several little ones.

    A table in the Manager database requires 900,000 rows to be deleted. The example below shows how to run a script to delete records and reduce table locking.

    To prevent large transactions, we commit every 1,000 records. We know this is slower, but it keeps the T-log small and maximizes concurrency.

    -- Delete records with commit every 1000 deletes
    -- Display start and finish times with duration
    -- sp_rowcount

    Declare @num int

    -- In this example we are deleting 1,000 rows at a time.
    Set @num = (Select Count(*) from transactions
                where datediff(hh, dt_tm_created, getdate()) > 72 and OrderId = 0)

    While @num <> 0
    Begin
        begin transaction

        set rowcount 1000

        Delete from transactions
        where datediff(hh, dt_tm_created, getdate()) > 72 and OrderId = 0

        set @num = @@rowcount

        commit transaction
    End

    -- Reset so subsequent statements in the session are not limited
    set rowcount 0

  • Janet - unless you change to simple recovery - your TLOG is going to grow no matter how small or large the transaction. It's going to keep growing until you perform a BACKUP LOG transaction. That's what Full recovery does (keeps all transactions logged until backup).

    I'm not sure how you're coming to that conclusion.
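    A minimal sketch of what that looks like in practice (the database name and backup path here are just placeholders, not from this thread):

    [Code]
    -- In FULL recovery, only a log backup frees up space inside the log
    -- for reuse; the file itself does not shrink.
    BACKUP LOG MyMOMDB
        TO DISK = 'D:\Backups\MyMOMDB_log.trn'

    -- Check how much of the log is currently in use:
    DBCC SQLPERF(LOGSPACE)
    [/Code]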

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt Miller (10/30/2007)


    Janet - unless you change to simple recovery - your TLOG is going to grow no matter how small or large the transaction. It's going to keep growing until you perform a BACKUP LOG transaction. That's what Full recovery does (keeps all transactions logged until backup). I'm not sure how you're coming to that conclusion.

    This is why I chose simple mode and also limited the ROWCOUNT to 10000...

    Well, I ran into my first roadblock 😀 You cannot do a delete through a view that contains a UNION ALL...

    So it looks like it will have to be table by table.
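    (A rough sketch of how the table-by-table loop could be scripted instead of editing 122 statements by hand - the two-digit naming follows the Events01...61 pattern mentioned above, but the date column name here is only a guess:)

    [Code]
    DECLARE @i int, @tbl sysname, @sql nvarchar(4000)
    SET @i = 1
    WHILE @i <= 61
    BEGIN
        -- Events01 ... Events61
        SET @tbl = 'Events' + RIGHT('0' + CAST(@i AS varchar(2)), 2)

        -- Batched delete for this table, 10000 rows at a time
        SET @sql = N'SET ROWCOUNT 10000
                     WHILE 1 = 1
                     BEGIN
                         DELETE ' + @tbl + N' WHERE TimeStored <= GETDATE() - 30
                         IF @@ROWCOUNT = 0 BREAK
                     END
                     SET ROWCOUNT 0'
        EXEC (@sql)

        SET @i = @i + 1
    END
    [/Code]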

  • Lee - process looks fine. You will still need to deal with reorganizing the data so that you don't have page fragments everywhere, and then shrink the database.
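    A rough sketch of that cleanup (SQL 2000-era commands; the table and logical file names are placeholders):

    [Code]
    -- Rebuild the indexes on each purged table to remove fragmentation
    DBCC DBREINDEX ('Events01')

    -- Then shrink the data file back down (target size in MB)
    DBCC SHRINKFILE (MOMDB_Data, 20000)
    [/Code]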

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt Miller (10/30/2007)


    Lee - process looks fine. You will still need to deal with reorganizing the data so that you don't have page fragments everywhere, and then shrink the database.

    Cool...I figured as much, since 1/3 of the data is going to be deleted from the table groupings...

    There is a weekly re-index job that I can execute after this completes...3 months from now 😀

  • About full vs. simple: I've never worried about changing this. If my database is in production it is set to Full mode, and if it is in Full mode, I am doing transaction log backups (simply because it's prod and in Full mode).

    If I change my database to Simple and leave the database in use while I am performing my maintenance (I prefer leaving production in use), then I can't take log backups. For this reason, I leave my database in Full mode, leave production up, and leave my transaction log backups running.

    So, no, I would not change a production database to Simple. And, with the number of records you are deleting, you may not want to take production off-line while the work is being performed.

  • Janet Keith (10/30/2007)


    About full vs. simple: I've never worried about changing this. If my database is in production it is set to Full mode, and if it is in Full mode, I am doing transaction log backups (simply because it's prod and in Full mode).

    If I change my database to Simple and leave the database in use while I am performing my maintenance (I prefer leaving production in use), then I can't take log backups. For this reason, I leave my database in Full mode, leave production up, and leave my transaction log backups running.

    So, no, I would not change a production database to Simple. And, with the number of records you are deleting, you may not want to take production off-line while the work is being performed.

    I agree with you with regard to how you handle production...however, this case is a bit different: we were asked to do local backups on the database server using a backup agent to tape...vs. backing up to a backup server (like we normally do in our shop). This was done due to space concerns with the MOM system and the sizing of the databases (Production & Reporting)...

    So...since they allowed the drive to fill up, I can't do a local backup...and if I leave the database in full mode during the deletions I will blow up the log because it won't be able to grow (not that I would want to log over 105 million deletions)...

    The end users have done this to themselves, unfortunately, and it is not the first time...they had way too many alerts and performance monitoring indicators turned on and filled up the drive even after I repeatedly warned against this...

    Honestly, I probably could have truncated the tables and everything would have been fine, but this is the politically safer approach IMO...

  • Well, I would go with the approach of putting the database into simple recovery and then performing the deletes. This way the log won't grow to huge proportions. Once the deletes are done you can put the database back into full recovery mode.

    But make sure that you do not fire all the deletes at the same time; spread them out well.
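    A quick sketch of that sequence (the database name is a placeholder):

    [Code]
    ALTER DATABASE MyMOMDB SET RECOVERY SIMPLE

    -- ... run the batched deletes here ...

    ALTER DATABASE MyMOMDB SET RECOVERY FULL

    -- After switching back to FULL, take a full (or differential) backup
    -- to restart the log backup chain:
    BACKUP DATABASE MyMOMDB
        TO DISK = 'D:\Backups\MyMOMDB_full.bak'
    [/Code]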

  • How many rows are you keeping relative to the number you're deleting?

    What I often suggest for deletes that affect most of the table is to first copy the rows you want to keep to a new table. Then drop any foreign keys you have and truncate the table.

    Since truncate just deallocates the pages, it's very fast, and since it's minimally logged, you don't have problems with the tran log growing.

    With the table empty, you can replace the foreign keys and copy the data back from the table you stored it in.
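    A minimal sketch of that pattern for a single table (table and column names are placeholders; watch out for IDENTITY columns, which need SET IDENTITY_INSERT when copying back):

    [Code]
    -- 1. Copy the rows you want to keep
    SELECT *
    INTO   Events01_keep
    FROM   Events01
    WHERE  TimeStored > GETDATE() - 30

    -- 2. Drop any foreign keys referencing Events01, then:
    TRUNCATE TABLE Events01

    -- 3. Put the kept rows back and clean up
    INSERT INTO Events01
    SELECT * FROM Events01_keep

    DROP TABLE Events01_keep
    -- 4. Re-create the foreign keys
    [/Code]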

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • p.s. If you don't have space in the DB, you can also BCP the data that you want to keep out to a network drive/removable drive, then BCP it back in once you've done the truncate.
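    For example, something like this from the command line (server, share and table names are placeholders; -n is native format, -T a trusted connection):

    [Code]
    rem Export the rows you want to keep (here assumed to be in a _keep table)
    bcp MOMDB.dbo.Events01_keep out \\backupsrv\purge\Events01.dat -n -T -S MOMSERVER

    rem After the truncate, load them back in batches of 10000
    bcp MOMDB.dbo.Events01 in \\backupsrv\purge\Events01.dat -n -T -S MOMSERVER -b 10000
    [/Code]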

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (10/31/2007)


    How many rows are you keeping relative to the number you're deleting?

    What I often suggest for deletes that affect most of the table is to first copy the rows you want to keep to a new table. Then drop any foreign keys you have and truncate the table.

    Since truncate just deallocates the pages, it's very fast, and since it's minimally logged, you don't have problems with the tran log growing.

    With the table empty, you can replace the foreign keys and copy the data back from the table you stored it in.

    The whole situation is a soup sandwich...the remaining rowcount in the tables will be 200+ million, and I can't imagine how long it would take to BCP the data out to a network drive since it is spread across 61 tables...

  • Then go with your loop.

    Just add

    WAITFOR DELAY '00:00:02'

    to give the server a chance to pay attention to other processes.

    You may also include something like this:

    [Code]
    -- Before entering the loop
    CREATE TABLE #LogSpace (
        Database_Name sysname NOT NULL,
        LogSize int NOT NULL,
        LogSpaceUsed FLOAT NOT NULL,
        Status tinyint
    )

    ......

    -- Inside of the loop
    INSERT INTO #LogSpace
    EXEC('DBCC sqlperf(logspace)')
    [/Code]

    When the LogSize value you just retrieved exceeds the limit you defined, do a BACKUP LOG and delete everything from #LogSpace.

    Yes, it will take some time, but eventually you'll get there.
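    The check itself might look something like this inside the loop (the threshold, database name and path are placeholders):

    [Code]
    IF EXISTS (SELECT 1 FROM #LogSpace
               WHERE Database_Name = DB_NAME()
                 AND LogSize > 2000)       -- e.g. log file larger than ~2 GB
    BEGIN
        BACKUP LOG MyMOMDB TO DISK = 'D:\Backups\MyMOMDB_log.trn'
        DELETE FROM #LogSpace
    END
    [/Code]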

    _____________
    Code for TallyGenerator

  • A clustered index on the DATE columns of both tables would help A LOT.

    If that's impossible, try this (not on production, if possible :))

    [Code]
    DELETE A
    FROM TABLE_A A
    INNER JOIN (select TOP 10000 PK_Col
                from TABLE_A
                where DATE <= GETDATE() - 30
                order by DATE) B ON A.PK_Col = B.PK_Col

    DELETE A
    FROM TABLE_B A
    INNER JOIN (select TOP 10000 PK_Col
                from TABLE_B
                where DATE <= GETDATE() - 30
                order by DATE) B ON A.PK_Col = B.PK_Col
    [/Code]

    _____________
    Code for TallyGenerator

  • Hi, thanks guys for the brilliant examples you have given. I need help using these commands with DB2. I need to delete 12,000 rows at a time from a table, then sleep for some time, and then start the deletion process again. I am trying to do it through a stored procedure.

    I could not find the equivalent commands for WAITFOR DELAY and SET ROWCOUNT.

    Please help me get through this.

    Thanks a lot

    Monalisa
