How to speed up purge process

Question

Post reply

How to speed up purge process

Ken Davis

SSCarpal Tunnel

Points: 4329
More actions
January 30, 2015 at 12:20 pm

#295044

Hello,
I have a logging database with two parent/child RI-related tables. The speed of constant inserts is very high. I wrote the purge routine below to keep six months of data. It runs 24x7 but barely stays ahead of the inserts and is starting to fall behind as volume increases. I need a purge routine that works faster than the incoming inserts.
High level of what I'm doing -
Load the top 100 key IDs, for deletable data, into a table variable.
Within a transaction, delete from the two tables where ID is in my table variable.
Empty my table variable, load it up again, delete again.
I tried bigger batches than 100 but 100 causes the least impact to the application.
Any advice on how to speed this up is appreciated!
SET NOCOUNT ON;
DECLARE @OldestDate DATETIME;
DECLARE @OldestDatetoKeep DATETIME;
DECLARE @SixMonthsAgo DATETIME;
DECLARE @IDsToDelete TABLE (ID nvarchar(32));
DECLARE @RowsDeleted INT;
SELECT @SixMonthsAgo = DATEADD(month,-6,getdate());
SELECT @OldestDate = MIN(TransactionDtTime) FROM TabletAuditLog.dbo.Transactions (nolock);
PRINT 'Oldest date = ' + convert(varchar(20),@OldestDate);
SELECT @OldestDatetoKeep = DATEADD(day,+1,@OldestDate);
IF @OldestDatetoKeep > @SixMonthsAgo
SET @OldestDatetoKeep = @SixMonthsAgo;
PRINT 'Oldest date to keep = ' + convert(varchar(20),@OldestDatetoKeep);
SET @RowsDeleted = 1;
IF @OldestDate < @OldestDatetoKeep
WHILE @OldestDate < @OldestDatetoKeep
BEGIN
INSERT INTO @IDsToDelete
SELECT TOP 100 ID
FROM TabletAuditLog.dbo.Transactions (nolock)
WHERE TransactionDtTime < @OldestDatetoKeep
ORDER BY TransactionDtTime;
WHILE @RowsDeleted > 0
BEGIN
BEGIN TRANSACTION
DELETE FROM TabletAuditLog.dbo.TransactionDetails
WHERE ID in (SELECT * FROM @IDsToDelete);
DELETE FROM TabletAuditLog.dbo.Transactions
WHERE ID in (SELECT * FROM @IDsToDelete);
SET @RowsDeleted = @@ROWCOUNT;
COMMIT TRANSACTION;
END
SET @RowsDeleted = 1;
DELETE FROM @IDsToDelete;
SELECT @OldestDate = MIN(TransactionDtTime) FROM TabletAuditLog.[dbo].[Transactions] (nolock);
PRINT 'Oldest date now = ' + convert(varchar(30),@OldestDate);
END

Viewing 7 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

ScottPletcher SSC Guru Points: 101223 More actions · Answer 1

Is the TabletAuditLog.dbo.TransactionDetails table clustered on ID first?

Just to confirm, the "ID" in the TD table is the same value as the ID in the T table, right?

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Ken Davis SSCarpal Tunnel Points: 4329 More actions · Answer 2

Scott, the Transactions table is the parent and ID is its primary key and clustered index. ID is the foreign key in TransactionDetails pointing to Transactions so it is the same value. You are bringing two problems to light. 1) ID is a GUID and I know a GUID is a terrible clustered index key. I am working on changing that. 2) TransactionDetails is a heap without a clustered index.

So I know this design is not ideal. But I would like to get that purge running faster until I can improve the design.

thanks

ScottPletcher SSC Guru Points: 101223 More actions · Answer 3

Ken Davis (1/30/2015)
Scott, the Transactions table is the parent and ID is its primary key and clustered index. ID is the foreign key in TransactionDetails pointing to Transactions so it is the same value. You are bringing two problems to light. 1) ID is a GUID and I know a GUID is a terrible clustered index key. I am working on changing that. 2) TransactionDetails is a heap without a clustered index.
So I know this design is not ideal. But I would like to get that purge running faster until I can improve the design.
thanks

Normally you'd be much better off clustering TD on ID, but with it being a GUID that's not gonna help in this case.

Verify that the query plan for the DELETE on TD shows it using the nonclus index.

What is the avg ratio of TD to T rows? That is, 100 T rows = on avg how many TD rows?

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Ken Davis SSCarpal Tunnel Points: 4329 More actions · Answer 4

The process is making pretty good use of indexes. It is only seeks and it's a seek on the nonclus index on ID for the TD table. I will try to attach the query plan.

The ratio is about 2 TD rows to 1 T row.

ScottPletcher SSC Guru Points: 101223 More actions · Answer 5

Yeah, not bad, although it does have to do a full index scan for trans date on the TD table.

You can do minor tweaks on the list itself, changing it to a keyed, unique temp table and TRUNCATEing it instead of DELETEing from it.

CREATE TABLE #IDsToDelete (

ID uniqueidentifier PRIMARY KEY

)

...

TRUNCATE TABLE #IDsToDelete

...

When you can, you might want to cluster these tables on tran date time (assuming it (almost) never changes once the row is inserted). You can continue to use a nonclus index on ID, which has more freespace and is rebuilt much more often.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Ken Davis SSCarpal Tunnel Points: 4329 More actions · Answer 6

Thanks for those suggestions. Believe it or not those two changes sped up the process considerably. It is back to being faster than the inserts and it caught up overnight so I'm back to six months of data.

Thanks again!