How to delete in small chunks

  • Hi All,

    We have a large log table that holds the last 10 years of data, and it has some LOB columns as well. When we run a delete against such a big table, we hit a 9002 "transaction log full" error. So we want to come up with a purge script that deletes in small chunks. Below is a small demo script that generates sample data from 2015 through 2021.

    Requirement: We want to retain the latest 1 year of data (i.e. 2021) and delete the other years' data, removing 30 days' worth in each iteration. In other words, we want to delete only one month of data at a time.

    I need some help on the logic part.

    demo code:

    IF OBJECT_ID('dbo.test', 'U') IS NOT NULL
        DROP TABLE dbo.test;

    CREATE TABLE dbo.test
    (
        id   int IDENTITY(1,1),
        doj  datetime,
        dojj date
    );
    GO

    DECLARE @date_start DATE;
    DECLARE @date_end   DATE;
    DECLARE @loop_date  DATE;

    SET @date_start = '2015-01-01';
    SET @date_end   = '2021-12-12';
    SET @loop_date  = @date_start;

    -- Insert one row per day from @date_start through @date_end
    WHILE @loop_date <= @date_end
    BEGIN
        INSERT INTO dbo.test (doj, dojj) VALUES (@loop_date, @loop_date);
        SET @loop_date = DATEADD(DAY, 1, @loop_date);
    END;
    GO

    SELECT * FROM dbo.test;
    GO

    SELECT MIN(doj) AS min_dt, MAX(doj) AS max_dt FROM dbo.test;
    GO

  • Hi

    Did you think of partitioning your table into, say, 10 separate years? That way you wouldn't have to run the delete against the whole 10 years of data.
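
    Just as a sketch of the idea - this assumes partitioning on the doj column, and the names and boundary dates here are only illustrative, not from your system:

    -- Illustrative only: one partition per year on doj
    CREATE PARTITION FUNCTION pfLogYear (datetime)
        AS RANGE RIGHT FOR VALUES ('2013-01-01','2014-01-01','2015-01-01','2016-01-01','2017-01-01',
                                   '2018-01-01','2019-01-01','2020-01-01','2021-01-01');

    CREATE PARTITION SCHEME psLogYear
        AS PARTITION pfLogYear ALL TO ([PRIMARY]);

    -- The log table (or its clustered index) would then be built on psLogYear(doj)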

    Kind regards

    Fred


  • It's logging information. We don't need that log history to be maintained.

     

  • Something like this?

    Declare @minDate datetime
    , @maxDate datetime;

    Set @minDate = (Select min(t.doj) From dbo.Test t); -- Get earliest date from table
    Set @minDate = dateadd(month, datediff(month, 0, @minDate) + 1, 0); -- Get the first of the month after @minDate
    Set @maxDate = dateadd(year, datediff(year, 0, getdate()), 0); -- Get the first of current year

    While @minDate <= @maxDate   -- include the last month before @maxDate
    Begin

    Delete From dbo.Test
    Where doj < @minDate;        -- removes one more month each pass

    Set @minDate = dateadd(month, 1, @minDate);

    Checkpoint;                  -- allows log space reuse in SIMPLE recovery

    Waitfor Delay '00:00:01';

    End;

    Or - you could delete based on a specific batch size:

    Declare @rowsAffected int = 1
    , @batchSize int = 2000000;

    While @rowsAffected > 0
    Begin

    Delete Top (@batchSize)
    From dbo.Test
    Where doj < dateadd(year, datediff(year, 0, getdate()), 0);

    Set @rowsAffected = @@rowcount;
    Checkpoint;

    Waitfor Delay '00:00:01';

    End;

    I would adjust this to keep at least 1 full year of log data, instead of deleting everything prior to the first of the current year. A rolling 12 months of log data would be better in my opinion. If you want to do that, change @maxDate - or the filter in the batched version - to: dateadd(month, datediff(month, 0, getdate()) - 12, 0);
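
    For example, the batched version with a rolling 12-month cutoff would look something like this (a sketch only, against the same dbo.Test demo table):

    Declare @rowsAffected int = 1
    , @batchSize int = 2000000
    , @cutoff datetime = dateadd(month, datediff(month, 0, getdate()) - 12, 0); -- first of the month, 12 months back

    While @rowsAffected > 0
    Begin

    Delete Top (@batchSize)
    From dbo.Test
    Where doj < @cutoff;

    Set @rowsAffected = @@rowcount;
    Checkpoint;

    Waitfor Delay '00:00:01';

    End;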

     

     

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • vsamantha35 wrote:

    Requirement: We want to retain the latest 1 year of data (i.e. 2021) and delete the other years' data, removing 30 days' worth in each iteration. In other words, we want to delete only one month of data at a time.

    Deletes aren't the right tool for this.  You're talking about deleting 90% of the table.

    Use minimal logging to transfer the year's worth of data you want to keep to a new table in a new database. If you want to make it easy to drop a month at a time going forward, partition the table so you can use SWITCH to quickly (almost instantly) move a month out of the table and then simply drop the table you switched it into.

    When you're done with the move, create a synonym to the new, much shorter table in the original database.
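
    A rough sketch of both pieces - the table and database names are made up for illustration, and SWITCH assumes an empty staging table with an identical structure on the same filegroup:

    -- Move one month (partition 2 here) out of the big partitioned table - a metadata-only operation
    ALTER TABLE dbo.BigLog SWITCH PARTITION 2 TO dbo.BigLog_Purge;
    DROP TABLE dbo.BigLog_Purge;   -- then just drop the switched-out month

    -- In the original database, after dropping or renaming the old table,
    -- point the old name at the new, much smaller table in the new database
    CREATE SYNONYM dbo.BigLog FOR NewLogDb.dbo.BigLog;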

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff,

    Thanks for the options provided. One thing I wanted to know: will both methods take care of transaction log reuse? Meaning, rather than the log file growing out of proportion, would these solutions reuse the same transaction log space without it having to grow?

    Thanks,

    Sam

  • Thanks, Jeff Moden. I will try the same in a sub-prod environment. That's a very good idea.

  • vsamantha35 wrote:

    Hi Jeff,

    Thanks for the options provided. One thing I wanted to know: will both methods take care of transaction log reuse? Meaning, rather than the log file growing out of proportion, would these solutions reuse the same transaction log space without it having to grow?

    Thanks,

    Sam

    Correct.  If you set up a separate database for the table, start it off in the SIMPLE Recovery Model, and do things in a "Minimally Logged" fashion, the log file will stay remarkably tiny.

    You can do the same if you create the new table in the same database and, if you had been in the FULL Recovery Model, you can shift to the BULK_LOGGED Recovery Model or the SIMPLE Recovery Model (you'll need to do either a DIFF or FULL backup after shifting back to the FULL Recovery Model to re-establish the log backup chain when you're done) but, if you follow the requirements for "Minimal Logging", the impact on the log file will be remarkably small compared to the amount of data you're working with.

    Here's the link to the MS document for the requirements for "Minimal Logging".

    https://docs.microsoft.com/en-us/sql/relational-databases/import-export/prerequisites-for-minimal-logging-in-bulk-import?view=sql-server-ver15

    Basically...

    1. You have to get out of the FULL Recovery Model (which means that you'll need to turn off replication, mirroring, or anything else that requires the FULL Recovery Model and set it up again when you're done.  This is NOT required for simple "clustered" instances, which rock!).
    2. The table can have a Clustered Index on it.  It will slow things down a bit but will usually take a lot less time than inserting the data into a heap and then adding a Clustered Index on such a large table.  Converting to a partitioned table and then doing this is also very worthwhile, BUT partitioning has a shedload of stuff that you REALLY need to study before you try it.
    3. The table MUST NOT HAVE OTHER INDEXES on it. ONLY the Clustered Index.  MS says it can work and people claim that things like Trace Flag 610 (on older instances) work, but the documentation also states in a footnote that there's no way to force it and, to wit, I've NEVER seen it work as claimed when other indexes are present.  Make sure you don't have any FKs in the table either.  Set all that stuff up AFTER you've loaded the data.
    4. In the same vein as #3 above, "Minimal Logging" only really works on the FIRST INSERT!  That means that you should do the entire data load as a single action.
    5. You said you were going to test this in a non-prod environment... that's the right stuff!  It will allow you to test that you've done things like "Minimal Logging" correctly, as well.

    Oh... one final mention... if you use either row or page compression in this evolution, it's going to take about 3 to 4 times longer and it further reduces the chances of "Minimal Logging".

    And, yeah... I've done this a whole lot in the past.  With a bit of planning and testing to make sure you've followed the correct steps, it's nasty fast and uses very few resources compared to any other method I've found (so far) anywhere that even as little as 20% of a table needed to be deleted.  Doing that many DELETEs absolutely sucks.  You DO need some temporary space to work in.
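
    A minimal sketch of what that load could look like - the database, table, and column names here are made up (your real table has LOB columns and more), and all of the prerequisites in the linked doc still have to be met:

    -- Separate working database in SIMPLE recovery so the log stays small
    CREATE DATABASE LogArchive;
    ALTER DATABASE LogArchive SET RECOVERY SIMPLE;
    GO

    -- Destination table: clustered index only, no other indexes or FKs yet
    CREATE TABLE LogArchive.dbo.LogKeep
    (
        id   int      NOT NULL,
        doj  datetime NOT NULL,
        dojj date     NULL,
        CONSTRAINT PK_LogKeep PRIMARY KEY CLUSTERED (id)
    );
    GO

    -- One single INSERT with TABLOCK so it can qualify for minimal logging
    -- (minimal logging into an empty table with a clustered index needs SQL Server 2016+)
    INSERT INTO LogArchive.dbo.LogKeep WITH (TABLOCK) (id, doj, dojj)
    SELECT id, doj, dojj
    FROM OriginalDb.dbo.test
    WHERE doj >= DATEADD(month, DATEDIFF(month, 0, GETDATE()) - 12, 0); -- example retention filter: rolling 12 months
    GO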

    --Jeff Moden



  • Here is an example from my training material. I have not changed the column names, so you will have to learn a few Danish words - Tid: Time, Navn: Name, and Maaned: Month. I use a partitioned table and TRUNCATE TABLE. Hope you can use it.
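
    For readers on SQL Server 2016 or later, the core of the per-partition TRUNCATE idea looks roughly like this (a sketch only; the table name and partition numbers are illustrative and not taken from the attached scripts):

    -- Empty whole months at once by truncating only their partitions
    TRUNCATE TABLE dbo.LogTable WITH (PARTITIONS (2 TO 13));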

  • With attachment!

  • Hi Carsten,

    Many thanks for sharing the scripts.

    Sam

  • Thanks, Jeff, for sharing your thoughts. I had never thought of it that way. Perhaps I need to do some testing with a proper example to get my hands dirty with it. Thank you.
