DELETING HUGE No. of ROWS

  • Dears,

    I have to delete all rows prior to 1st May 2014 from HUGETABLE, which has around 700 million rows. This table has no index; however, it has an ID column.

    To avoid log growth, I wrote the code below. My assumption was that it would delete 1 million rows at a time, and that if I cancelled it at any point only the last transaction would be rolled back. Instead, when I cancel it, it starts rolling back all of the deleted rows back into the table.

    Any idea how I can achieve this without running out of log file space?

    The next thing I have in mind is to schedule a job that deletes around a million rows every 10 minutes and truncates the log at the end of the job.

    I appreciate the experts' feedback.

    begin try drop table #counter end try begin catch end catch

    create table #counter (id int)

    -- seed the loop: is there anything at all to delete?
    insert into #counter
    select top 1 ID from HUGETABLE with (nolock) where AuditDateTime < '2014-05-01'

    while (select count(1) from #counter) > 0
    begin

        begin transaction deletemol

        -- collect the next batch of IDs (1 million per batch, as intended)
        truncate table #counter
        insert into #counter
        select top 1000000 ID from HUGETABLE with (nolock) where AuditDateTime < '2014-05-01'

        if exists (select top 1 1 from #counter)
        begin
            delete from HUGETABLE where ID in (select ID from #counter)
        end

        -- re-seed the loop condition: any rows left before the cutoff?
        truncate table #counter
        insert into #counter
        select top 1 ID from HUGETABLE with (nolock) where AuditDateTime < '2014-05-01'

        select GETDATE()   -- progress marker

        commit transaction deletemol

        DBCC SHRINKFILE (N'DB_log', 0, TRUNCATEONLY)

    end
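
    For comparison, a simpler batched-delete pattern skips the #counter table entirely and relies on each DELETE being its own autocommit transaction, so (barring implicit transactions) a cancel should only undo the batch in flight. This is only a sketch: it assumes the cutoff column is AuditDateTime, and without an index on that column every batch still scans the table.

    DECLARE @BatchSize int = 1000000;   -- rows per batch; tune to what the log can absorb
    DECLARE @Rows int = 1;

    WHILE @Rows > 0
    BEGIN
        -- Each DELETE runs and commits on its own,
        -- so a cancel only undoes the current batch.
        DELETE TOP (@BatchSize)
        FROM dbo.HUGETABLE
        WHERE AuditDateTime < '20140501';

        SET @Rows = @@ROWCOUNT;

        -- In SIMPLE recovery a CHECKPOINT lets the log space be reused;
        -- in FULL recovery, take log backups between batches instead.
        CHECKPOINT;
    END;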

  • funooni (8/8/2014)


    Dears,

    I have to delete all rows prior to 1st May 2014 from HUGETABLE, which has around 700 million rows. This table has no index; however, it has an ID column.

    To avoid log growth, I wrote the code below. My assumption was that it would delete 1 million rows at a time, and that if I cancelled it at any point only the last transaction would be rolled back. Instead, when I cancel it, it starts rolling back all of the deleted rows back into the table.

    Any idea how I can achieve this without running out of log file space?

    The next thing I have in mind is to schedule a job that deletes around a million rows every 10 minutes and truncates the log at the end of the job.

    I appreciate the experts' feedback.

    begin try drop table #counter end try begin catch end catch

    create table #counter (id int)

    -- seed the loop: is there anything at all to delete?
    insert into #counter
    select top 1 ID from HUGETABLE with (nolock) where AuditDateTime < '2014-05-01'

    while (select count(1) from #counter) > 0
    begin

        begin transaction deletemol

        -- collect the next batch of IDs (1 million per batch, as intended)
        truncate table #counter
        insert into #counter
        select top 1000000 ID from HUGETABLE with (nolock) where AuditDateTime < '2014-05-01'

        if exists (select top 1 1 from #counter)
        begin
            delete from HUGETABLE where ID in (select ID from #counter)
        end

        -- re-seed the loop condition: any rows left before the cutoff?
        truncate table #counter
        insert into #counter
        select top 1 ID from HUGETABLE with (nolock) where AuditDateTime < '2014-05-01'

        select GETDATE()   -- progress marker

        commit transaction deletemol

        DBCC SHRINKFILE (N'DB_log', 0, TRUNCATEONLY)

    end

    Looks like you need to delete 700 million rows of data; how many rows of data are you keeping?

  • Around 150 million rows will be kept, i.e. the last three months of data.

    I am thinking about copying those rows into another table, then truncating/dropping the original table and renaming the new one.

    The problem is that this table is continuously being inserted into, so there is a chance of data loss.

  • When I cancelled the running query, shouldn't SQL Server roll back only the rows of the last transaction? Instead it is rolling back rows that were deleted and committed in earlier iterations.

  • There are other issues with your query, but let's not worry about that just yet. Another question: what edition of SQL Server are you running?

    Also, is the intent to keep 3 months of data in the table going forward?

    If so, is that 3 months plus the current month?

  • If you cancel a query in SSMS, then the entire query's actions get rolled back, although I don't know what would happen if there were separate batches within what was executed. Just using a WHILE loop isn't going to get you past that problem.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • I have to get to work but here's a 60,000 ft view of what I would do...

    1. DO A FULL BACKUP!!!!!!!!!!!!

    2. MAKE SURE POINT-IN-TIME BACKUPS ARE BEING TAKEN!!!

    3. Create an empty identical table.

    4. Do a cascaded-rename (takes about 63ms) to rename the old table to something else and rename the new table to what the old table was named. This new table will continue in the place of the old table.

    5. Insert the desired old rows from the old table into the new table using a WHILE loop that does it in 3 million row batches (see the sketch after this list; most systems will reach a tipping point shortly after that, so I wouldn't go much higher if at all).

    6. When you're absolutely sure that everything worked correctly, DROP the old table.
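
    A rough sketch of steps 4 and 5 under a few assumptions (the table lives in dbo, the cutoff column is AuditDateTime, and an identical empty table dbo.HUGETABLE_new has already been created) might look like this:

    -- Step 4: swap the tables inside one short transaction so the rename is atomic.
    BEGIN TRANSACTION;
        EXEC sp_rename 'dbo.HUGETABLE', 'HUGETABLE_old';
        EXEC sp_rename 'dbo.HUGETABLE_new', 'HUGETABLE';
    COMMIT TRANSACTION;

    -- Step 5: copy the rows to keep back into the live table in date slices,
    -- so there is no need to track which IDs have already been copied.
    DECLARE @From datetime = '20140501';
    DECLARE @End  datetime = (SELECT MAX(AuditDateTime) FROM dbo.HUGETABLE_old);

    WHILE @From <= @End
    BEGIN
        INSERT INTO dbo.HUGETABLE (ID, AuditDateTime /*, remaining columns */)
        SELECT ID, AuditDateTime /*, remaining columns */
        FROM dbo.HUGETABLE_old
        WHERE AuditDateTime >= @From
          AND AuditDateTime < DATEADD(DAY, 1, @From);   -- roughly 1.5 million rows per day here

        SET @From = DATEADD(DAY, 1, @From);
    END;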

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ah, so that's why it did not go the way I thought: SSMS is rolling back the entire query.

  • Jeff Moden (8/8/2014)


    I have to get to work but here's a 60,000 ft view of what I would do...

    1. DO A FULL BACKUP!!!!!!!!!!!!

    2. MAKE SURE POINT-IN-TIME BACKUPS ARE BEING TAKEN!!!

    3. Create an empty identical table.

    4. Do a cascaded-rename (takes about 63ms) to rename the old table to something else and rename the new table to what the old table was named. This new table will continue in the place of the old table.

    5. Insert the desired old rows from the old table into the new table using a WHILE loop that does it in 3 million row batches (most systems will reach a tipping point shortly after that, so I wouldn't go much higher if at all).

    6. When you're absolutely sure that everything worked correctly, DROP the old table.

    Thanks Jeff

    The backup is done and point-in-time backups are being taken.

    I will follow your suggestion. I believe the 63 ms would mean only a very slight downtime for any trigger or end-user application.

    By the way, the table is populated by a trigger on another table, so the ultimate end user might not notice anything.

    Any further suggestions about the downtime? I mean, what should I be expecting during that 63 ms?

    Thanks again.

  • Lynn Pettis (8/8/2014)


    There are other issues with your query, but let's not worry about that just yet. Another question: what edition of SQL Server are you running?

    Also, is the intent to keep 3 months of data in the table going forward?

    If so, is that 3 months plus the current month?

    Yes, it's the current month and the last three months, i.e. May, June, July and August.

    I am using SQL Server 2008 R2

  • funooni (8/8/2014)


    Lynn Pettis (8/8/2014)


    There are other issues with your query, but let's not worry about that just yet. Another question: what edition of SQL Server are you running?

    Also, is the intent to keep 3 months of data in the table going forward?

    If so, is that 3 months plus the current month?

    Yes, it's the current month and the last three months, i.e. May, June, July and August.

    I am using SQL Server 2008 R2

    Edition, not version. Are you running Standard Edition or Enterprise Edition?

  • Lynn Pettis (8/8/2014)


    funooni (8/8/2014)


    Lynn Pettis (8/8/2014)


    There are other issues with your query, but let's not worry about that just yet. Another question: what edition of SQL Server are you running?

    Also, is the intent to keep 3 months of data in the table going forward?

    If so, is that 3 months plus the current month?

    Yes, it's the current month and the last three months, i.e. May, June, July and August.

    I am using SQL Server 2008 R2

    Edition, not version. Are you running Standard Edition or Enterprise Edition?

    Oh, it's Enterprise Edition.

    By the way, the previous rollback is still going on and I expect it to take another two hours, based on my calculation of the remaining rows.

    Right now I have run out of tempdb log file space 🙁

  • funooni (8/8/2014)


    Lynn Pettis (8/8/2014)


    funooni (8/8/2014)


    Lynn Pettis (8/8/2014)


    There are other issues with your query, but let's not worry about that just yet. Another question: what edition of SQL Server are you running?

    Also, is the intent to keep 3 months of data in the table going forward?

    If so, is that 3 months plus the current month?

    Yes, it's the current month and the last three months, i.e. May, June, July and August.

    I am using SQL Server 2008 R2

    Edition, not version. Are you running Standard Edition or Enterprise Edition?

    Oh, it's Enterprise Edition.

    By the way, the previous rollback is still going on and I expect it to take another two hours, based on my calculation of the remaining rows.

    Right now I have run out of tempdb log file space 🙁

    Since you are using Enterprise Edition, I would look at using partitioning to facilitate data management. This will require establishing a clustered index on the table, and looking at the code you posted I would say on the AuditDateTime column. By using partitioning you will be able to use a sliding window to easily delete (actually switch out) a one-month partition of data and start a new one. This will make managing the data much easier.

    I would recommend that you start by reading about partitioning in Books Online and set up a sandbox database where you can practice setting up the table and management of the partitions.
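
    A minimal sketch of that sliding window, with illustrative names and monthly boundaries (Books Online covers the full procedure, including the staging-table requirements for SWITCH):

    -- Monthly partition function and scheme on AuditDateTime (boundary values are examples).
    CREATE PARTITION FUNCTION pfAuditMonth (datetime)
        AS RANGE RIGHT FOR VALUES ('20140501', '20140601', '20140701', '20140801');

    CREATE PARTITION SCHEME psAuditMonth
        AS PARTITION pfAuditMonth ALL TO ([PRIMARY]);

    -- The clustered index on AuditDateTime is then built on psAuditMonth(AuditDateTime).
    -- Sliding window each month: switch the oldest partition out to an empty, identically
    -- structured table on the same filegroup, then throw that data away.
    ALTER TABLE dbo.HUGETABLE SWITCH PARTITION 2 TO dbo.HUGETABLE_switch_out;   -- partition 2 holds May 2014 here
    TRUNCATE TABLE dbo.HUGETABLE_switch_out;

    -- Retire the emptied boundary and add the next month's boundary.
    ALTER PARTITION FUNCTION pfAuditMonth() MERGE RANGE ('20140501');
    ALTER PARTITION SCHEME psAuditMonth NEXT USED [PRIMARY];
    ALTER PARTITION FUNCTION pfAuditMonth() SPLIT RANGE ('20140901');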

  • funooni (8/8/2014)


    Jeff Moden (8/8/2014)


    I have to get to work but here's a 60,000 ft view of what I would do...

    1. DO A FULL BACKUP!!!!!!!!!!!!

    2. MAKE SURE POINT-IN-TIME BACKUPS ARE BEING TAKEN!!!

    3. Create an empty identical table.

    4. Do a cascaded-rename (takes about 63ms) to rename the old table to something else and rename the new table to what the old table was named. This new table will continue in the place of the old table.

    5. Insert the desired old rows from the old table into the new table using a WHILE loop that does it in 3 million row batches (most systems will reach a tipping point shortly after that, so I wouldn't go much higher if at all).

    6. When you're absolutely sure that everything worked correctly, DROP the old table.

    Thanks Jeff

    The backup is done and point-in-time backups are being taken.

    I will follow your suggestion. I believe the 63 ms would mean only a very slight downtime for any trigger or end-user application.

    By the way, the table is populated by a trigger on another table, so the ultimate end user might not notice anything.

    Any further suggestions about the downtime? I mean, what should I be expecting during that 63 ms?

    Thanks again.

    I'd recommend you post ALL the code you write for this before you even think of running it, once you have it coded.

    As to what you should be expecting during that 63 ms: you're the one that's going to have to figure that out. I have no clue as to how often the table is hit.

    As a bit of a sidebar... I typically have staging tables for things like this and a job that runs once a minute to transfer the new rows from the staging table to the final table. That way, when I need to do something to the "huge" table, I just turn off the transfer job. The new rows continue to accumulate in the staging table while I'm working (usually just a couple of minutes) and when I'm done, I just turn the job back on and everything is good.
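
    A minimal sketch of that transfer step, with assumed table and column names (dbo.HUGETABLE_staging does not exist in this thread; the OUTPUT ... INTO form also requires the target to have no enabled triggers or foreign keys):

    -- Scheduled job step, run once a minute: drain the staging table into the big table.
    -- Deleting with OUTPUT ... INTO moves the rows in a single atomic statement.
    DELETE FROM dbo.HUGETABLE_staging
    OUTPUT deleted.ID, deleted.AuditDateTime /*, remaining columns */
    INTO dbo.HUGETABLE (ID, AuditDateTime /*, remaining columns */);

    Turning the job off before maintenance simply lets rows queue up in dbo.HUGETABLE_staging; turning it back on drains the backlog.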

    For future deletions from this table, you might want to look into partitioning by date. Save that for after you're done with this "delete".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
