Cleanup Sysmail -items from msdb

  • Bruin

    Ten Centuries

    Points: 1011

     

    Could you provide full example that includes the checkpoint and Wait?

    Thanks.

  • Jeffrey Williams

    SSC Guru

    Points: 88590

    I did not include the progress table...you can add that in if needed:

        Use msdb;
    Go

    Set Nocount On;

    Declare @maxItemId int = 0
    , @rowsAffected int = 1
    , @batchSize int = (Select bd.BatchSize From dbo.BatchDelete Where bd.TableName = 'sysmailitems');

    --==== Set limit
    Select @maxItemId = max(sm.mailitem_id)
    From dbo.sysmail_mailitems sm
    Where sm.send_request_date < dateadd(day, -90, getdate());

    --==== Perform the batch delete
    While @rowsAffected > 0
    Begin

    Delete Top(@batchSize)
    From dbo.sysmail_mailitems
    Where mailitem_id < @maxItemId;

    Set @rowsAffected = @@rowcount;

    Checkpoint;

    --==== Wait for 1 second
    Waitfor DELAY '00:00:01';
    End;

    This sets the limit based on the number of days to keep - which could also be included in your BatchDelete configuration table.  If you add an index to the table based on the column send_request_date - you could change this to:

        Use msdb;
    Go

    Declare @daysBack int
    , @batchSize int
    , @rowsAffected int = 1;

    --==== Set limits
    Select @daysBack = bd.DaysBack
    , @batchSize = bd.BatchSize
    From dbo.BatchDelete bd
    Where bd.TableName = 'sysmailitems';

    --==== Perform the batch delete
    While @rowsAffected > 0
    Begin

    Delete Top(@batchSize)
    From dbo.sysmail_mailitems
    Where send_request_date < dateadd(day, @daysBack, getdate());

    Set @rowsAffected = @@rowcount;

    Checkpoint;

    --==== Wait for 1 second
    Waitfor DELAY '00:00:01';
    End;

     

     

    Jeffrey Williams
    Problems are opportunities brilliantly disguised as insurmountable obstacles.

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Jeff Moden

    SSC Guru

    Points: 996812

    I'm thinking that, although the mail items are being deleted, I'm not sure that any attachments will be and I'm pretty sure that the mail log isn't being reduced by any of the code so far.  Of course, I could be wrong.

    There's also a kind of built-in batch control here if you use the tools that are inherent to SQL Server.  That would be a "day" worth of deletes and you can pull it off, with logging if you setup the job to write to a table or file, like this...

    --===== Local variables and constants
    DECLARE @CutoffDate DATETIME = CONVERT(DATE,DATEADD(mm,-18,GETDATE())) --Change "-18" to how far back you want to stop
    ,@WorkingDate DATETIME
    ,@DisplayDate CHAR(10) -- This is just so we can list the date using RAISERROR.
    ;
    --===== Determine the earliest date that we need to delete mail from.
    SELECT @WorkingDate = CONVERT(DATE,MIN(sent_date))
    ,@DisplayDate = CONVERT(CHAR(10),@WorkingDate,23)
    FROM msdb.dbo.sysmail_allitems
    ;
    --===== For each date between the minimum date we found and the cutoff date,
    -- delete the mail items, attachments (part of the mail items), and
    -- the related log entries.
    WHILE @WorkingDate <= @CutoffDate
    BEGIN
    --======= Tell the operator what is going on. This can be logged to a file by the job.
    RAISERROR ('Working on %s..',0,0,@DisplayDate) WITH NOWAIT
    ;
    --======= Do the deletes for the day using what''s already built into SQL Server.
    EXECUTE msdb.dbo.sysmail_delete_mailitems_sp @sent_before = @WorkingDate;
    EXECUTE msdb.dbo.sysmail_delete_log_sp @logged_before = @WorkingDate;

    --======= Delay for 1 second so that other things have a chance to execute.
    WAITFOR DELAY '00:00:01'
    ;
    --======= Add 1 to working date and update the display value for it.
    SELECT @WorkingDate += 1
    ,@DisplayDate = CONVERT(CHAR(10),@WorkingDate,23)
    ;
    END
    ;

    Note that this one is setup to only keep the last 18 Months of email.  You can manually change that where indicated.  I wouldn't make this a parameter for a proc because if someone makes a mistake, then POOF!, everything is gone.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "If "pre-optimization" is the root of all evil, then what does the resulting no optimization lead to?"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Bruin

    Ten Centuries

    Points: 1011

    If you have multiple jobs running from the sql agent during the delete cleanup, do you think I'll run into locking?

     

    Thanks.

  • Jeff Moden

    SSC Guru

    Points: 996812

    Maybe, but it shouldn't be for long unless you have a huge number of emails per day.  Of course, you would have had the same blocking (short and sweet) with the deletions in batches.

    I say "maybe" because we're deleting earlier emails and hopefully we'll only see page locking and not full table locking and hopeffully the deletes won't be mutually exclusive to the new inserts/updates unless the two stored procedures have some form of wild locking requirements built in.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "If "pre-optimization" is the root of all evil, then what does the resulting no optimization lead to?"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeffrey Williams

    SSC Guru

    Points: 88590

    Jeff Moden wrote:

    Maybe, but it shouldn't be for long unless you have a huge number of emails per day.  Of course, you would have had the same blocking (short and sweet) with the deletions in batches.

    I say "maybe" because we're deleting earlier emails and hopefully we'll only see page locking and not full table locking and hopeffully the deletes won't be mutually exclusive to the new inserts/updates unless the two stored procedures have some form of wild locking requirements built in.

    The problem here is that there is no index on the table based on the date.  This will require a clustered index scan of the table for every iteration.  Adding the appropriate indexes to the table would help with the delete - but will require additional space available.

    Deleting the log may or may not be necessary...that depends on whether or not the logging level has been changed from the default.  If it was not changed then the only items logged are the startup/shutdown and any errors or warnings - which is minimal.  It doesn't hurt to also include this...but it probably isn't a problem.

    As for deleting attachments - there doesn't appear to be any procedures available to handle the maintenance of attachments but that isn't needed since the table is defined with a FK relationship to sysmail_mailitems and set to cascade.  In other words, deleting from sysmail_mailitems will cascade the delete to sysmail_attachments.

    The advantage to using the procedure to perform the delete is that will also log the delete in sysmail_log with the number of items deleted.  This eliminates the requirement of using a separate logging table to identify how many rows have been deleted.  The downside is again - no indexes on the columns used in the procedure to perform the delete.

    One other note - instead of creating your own logging table you can use the procedure sysmail_logmailevent_sp which will log the event in the sysmail_log table.  Here is what is used in the delete procedure:

     DECLARE @localmessage nvarchar(255)
    SET @localmessage = FORMATMESSAGE(14665, SUSER_SNAME(), @@ROWCOUNT)
    exec msdb.dbo.sysmail_logmailevent_sp @event_type=1, @description=@localmessage

    If you use the procedure to delete and don't have any indexes it could take quite a bit longer than expected and could escalate the locks and cause additional blocking.  It might not...but the only way to be sure is to test.

    Jeffrey Williams
    Problems are opportunities brilliantly disguised as insurmountable obstacles.

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Bruin

    Ten Centuries

    Points: 1011

    The sysmail_attachments is empty, it's only the the sysmail_mailitems where the problem exists(Size\Records\Space). I choose to stay away from dates and use the PK(mailitem_id) for the deletes. I ran a query by month from oldest to newest to gather the mailitem_id(min\max) for my deletes. The recovery model is currently "Simple" for MSDB. With all that said would that change how

    you would proceed doing the cleanup?

    Thanks.

  • Jeffrey Williams

    SSC Guru

    Points: 88590

    Have you tested using the procedure with the oldest available date?  If so - how long did it take to delete the data?

    It is also possible to add an index to the table to improve the performance of the deletes by date - which would allow for using the stored procedure.

    But if you insist on using the mailitem_id - then the approach I put out will work.  I would recommend adding a call to the procedure to log the delete (instead of creating your own table) and include the cleanup of the log.

    You want a small enough batch size so the delete executes quickly - but large enough to get through the total number of items to be deleted in a reasonable time.  If the batch size is too large, the delete will block other processes - and potentially cause them to fail waiting for access.  If the batch size is too small - it can take a very long time to go through all items.

    Jeffrey Williams
    Problems are opportunities brilliantly disguised as insurmountable obstacles.

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Jeff Moden

    SSC Guru

    Points: 996812

    To be honest, I think that all'y'all 😀 are making this a whole lot more complex than needed.  The PK on the table is an identity column.  That means that it will be in very close to the same order as the sent date.  If you do the deletes using that sent date like I had in the code I posted, you'll be deleting in virtually the same order as the PK which means that you'll be deleting from the logical "top" of the table while the logical "bottom" of the table continues to take in new rows in an unincumbered fashion.  You'll eventually need to get dates involved anyway to figure out when to stop deleting.  If you want to simply guarantee such operation, use the sent date to find the PKs and store them in a separate table and then use that as the delete-controller.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "If "pre-optimization" is the root of all evil, then what does the resulting no optimization lead to?"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeffrey Williams

    SSC Guru

    Points: 88590

    Jeff Moden wrote:

    To be honest, I think that all'y'all 😀 are making this a whole lot more complex than needed.  The PK on the table is an identity column.  That means that it will be in very close to the same order as the sent date.  If you do the deletes using that sent date like I had in the code I posted, you'll be deleting in virtually the same order as the PK which means that you'll be deleting from the logical "top" of the table while the logical "bottom" of the table continues to take in new rows in an unincumbered fashion.  You'll eventually need to get dates involved anyway to figure out when to stop deleting.  If you want to simply guarantee such operation, use the sent date to find the PKs and store them in a separate table and then use that as the delete-controller.

    I agree here - the problem is the OP insists on deleting by the mailitem_id only.  It would be nice to see how long it takes for a single day to be deleted - to determine if that is acceptable.  If it is not within an acceptable range then either add an index or revert to deleting by the ID.

    The concern I have with deleting by date is that SQL Server *may* escalate to a table lock during the clustered index scan.  If SQL Server does not escalate and only uses row/page locks - it should not have an impact on other processes.

    Jeffrey Williams
    Problems are opportunities brilliantly disguised as insurmountable obstacles.

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Bruin

    Ten Centuries

    Points: 1011

     

    This is what I'm doing "If you want to simply guarantee such operation, use the sent date to find the PKs and store them in a separate table and then use that as the delete-controller." I just didn't store it in a table.

     

    I'm going to sample in the am.

     

    Thanks.

  • Jeff Moden

    SSC Guru

    Points: 996812

    Excellent.  Thanks for the feedback, Bruin.  I think I can speak for the others in saying that we're very interested in how your test turns out.  My production system only had 1.5 GB across 2 years on it and it took 31 seconds to delete 1 full year in one shot.  It will likely do better using the PK.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "If "pre-optimization" is the root of all evil, then what does the resulting no optimization lead to?"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Bruin

    Ten Centuries

    Points: 1011

    I got delayed on this, but wanted to give an update:

    I'm trying to put together my test environment to begin the deletions.

    Thanks.

     

     

  • Jeff Moden

    SSC Guru

    Points: 996812

    Awesome.  Thanks for the feedback, Bruin.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "If "pre-optimization" is the root of all evil, then what does the resulting no optimization lead to?"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Bruin

    Ten Centuries

    Points: 1011

    I have been running the Batch Deletes using the MailID, and deleting 10,000 at a pass. It deleted 3/4 million records

    in 8 minutes, and 1.5 million in 10 minutes on my test box. The test box only has 8 gig of memory(2cpus) but is very lightly loaded.

    The Prod box has 32gig of memory and 4 cpus.

    Thanks for ALL response and help much apprictiacted.

Viewing 15 posts - 16 through 30 (of 30 total)

You must be logged in to reply to this topic. Login to reply