How to speed up purge process

  • Hello,

    I have a logging database with two parent/child RI-related tables. The speed of constant inserts is very high. I wrote the purge routine below to keep six months of data. It runs 24x7 but barely stays ahead of the inserts and is starting to fall behind as volume increases. I need a purge routine that works faster than the incoming inserts.

    High level of what I'm doing -

    Load the top 100 key IDs, for deletable data, into a table variable.

    Within a transaction, delete from the two tables where ID is in my table variable.

    Empty my table variable, load it up again, delete again.

    I tried bigger batches than 100 but 100 causes the least impact to the application.

    Any advice on how to speed this up is appreciated!

    SET NOCOUNT ON;

    DECLARE @OldestDate DATETIME;

    DECLARE @OldestDatetoKeep DATETIME;

    DECLARE @SixMonthsAgo DATETIME;

    DECLARE @IDsToDelete TABLE (ID nvarchar(32));

    DECLARE @RowsDeleted INT;

    SELECT @SixMonthsAgo = DATEADD(month,-6,getdate());

    SELECT @OldestDate = MIN(TransactionDtTime) FROM TabletAuditLog.dbo.Transactions (nolock);

    PRINT 'Oldest date = ' + convert(varchar(20),@OldestDate);

    SELECT @OldestDatetoKeep = DATEADD(day,+1,@OldestDate);

    IF @OldestDatetoKeep > @SixMonthsAgo

    SET @OldestDatetoKeep = @SixMonthsAgo;

    PRINT 'Oldest date to keep = ' + convert(varchar(20),@OldestDatetoKeep);

    SET @RowsDeleted = 1;

    IF @OldestDate < @OldestDatetoKeep

    WHILE @OldestDate < @OldestDatetoKeep

    BEGIN

    INSERT INTO @IDsToDelete

    SELECT TOP 100 ID

    FROM TabletAuditLog.dbo.Transactions (nolock)

    WHERE TransactionDtTime < @OldestDatetoKeep

    ORDER BY TransactionDtTime;

    WHILE @RowsDeleted > 0

    BEGIN

    BEGIN TRANSACTION

    DELETE FROM TabletAuditLog.dbo.TransactionDetails

    WHERE ID in (SELECT * FROM @IDsToDelete);

    DELETE FROM TabletAuditLog.dbo.Transactions

    WHERE ID in (SELECT * FROM @IDsToDelete);

    SET @RowsDeleted = @@ROWCOUNT;

    COMMIT TRANSACTION;

    END

    SET @RowsDeleted = 1;

    DELETE FROM @IDsToDelete;

    SELECT @OldestDate = MIN(TransactionDtTime) FROM TabletAuditLog.[dbo].[Transactions] (nolock);

    PRINT 'Oldest date now = ' + convert(varchar(30),@OldestDate);

    END

  • Is the TabletAuditLog.dbo.TransactionDetails table clustered on ID first?

    Just to confirm, the "ID" in the TD table is the same value as the ID in the T table, right?

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Scott, the Transactions table is the parent and ID is its primary key and clustered index. ID is the foreign key in TransactionDetails pointing to Transactions so it is the same value. You are bringing two problems to light. 1) ID is a GUID and I know a GUID is a terrible clustered index key. I am working on changing that. 2) TransactionDetails is a heap without a clustered index.

    So I know this design is not ideal. But I would like to get that purge running faster until I can improve the design.

    thanks

  • Ken Davis (1/30/2015)


    Scott, the Transactions table is the parent and ID is its primary key and clustered index. ID is the foreign key in TransactionDetails pointing to Transactions so it is the same value. You are bringing two problems to light. 1) ID is a GUID and I know a GUID is a terrible clustered index key. I am working on changing that. 2) TransactionDetails is a heap without a clustered index.

    So I know this design is not ideal. But I would like to get that purge running faster until I can improve the design.

    thanks

    Normally you'd be much better off clustering TD on ID, but with it being a GUID that's not gonna help in this case.

    Verify that the query plan for the DELETE on TD shows it using the nonclus index.

    What is the avg ratio of TD to T rows? That is, 100 T rows = on avg how many TD rows?

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • The process is making pretty good use of indexes. It is only seeks and it's a seek on the nonclus index on ID for the TD table. I will try to attach the query plan.

    The ratio is about 2 TD rows to 1 T row.

  • Yeah, not bad, although it does have to do a full index scan for trans date on the TD table.

    You can do minor tweaks on the list itself, changing it to a keyed, unique temp table and TRUNCATEing it instead of DELETEing from it.

    CREATE TABLE #IDsToDelete (

    ID uniqueidentifier PRIMARY KEY

    )

    ...

    TRUNCATE TABLE #IDsToDelete

    ...

    When you can, you might want to cluster these tables on tran date time (assuming it (almost) never changes once the row is inserted). You can continue to use a nonclus index on ID, which has more freespace and is rebuilt much more often.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Thanks for those suggestions. Believe it or not those two changes sped up the process considerably. It is back to being faster than the inserts and it caught up overnight so I'm back to six months of data.

    Thanks again!

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply