Hidden Tricks To SQL Server Table Cleanup

  • Comments posted to this topic are about the item Hidden Tricks To SQL Server Table Cleanup

  • I think the creation of a separate view is unnecessary when you can have the view inline:

    DECLARE @Rowcount int;
    SET @Rowcount = -1; -- initialise

    WHILE @Rowcount <> 0
    BEGIN
        ;WITH CTE AS
        (
            SELECT TOP (2000) *
            FROM EventTracking.dbo.Event
            WHERE EventCloseDate < GETDATE() - 6 -- remove rows older than 6 days
            ORDER BY EventId
        )
        DELETE CTE;

        SET @Rowcount = @@ROWCOUNT;
    END

    Alternatively:

    DECLARE @Rowcount int;
    SET @Rowcount = -1; -- initialise

    WHILE @Rowcount <> 0
    BEGIN
        DELETE EventTracking.dbo.Event
        WHERE EventId IN (SELECT TOP (2000) EventId
                          FROM EventTracking.dbo.Event
                          WHERE EventCloseDate < GETDATE() - 6 -- remove rows older than 6 days
                          ORDER BY EventId);

        SET @Rowcount = @@ROWCOUNT;
    END

  • Most people would never go beyond the bounds of a BIGINT with an IDENTITY(1,1) but for those of us who do, remember that reseeding identity values is incompatible with this method.

    Mind you, at those sorts of scales you probably have Enterprise Edition and can use partition switching to do even faster deletes with minimal I/O impact!

    I take it that EventCloseDate being a nullable field means you don't know in advance what that close date will be?

    If you did know the EventCloseDate upfront, then I'd look at putting the clustered key on it. Either that, or persist the EventStartDate and put the clustered key over EventStartDate and Duration. Obviously this would need evaluating and testing, as events arriving out of sequence will cause some degree of fragmentation. Whether such fragmentation would be significant or an issue for you is something only you can determine. All standard "It Depends" stuff.
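    For anyone curious, the partition-switch approach mentioned above looks roughly like this. This is a hypothetical sketch: it assumes dbo.Event is already partitioned on EventCloseDate by a partition function pf_EventByDate, and that dbo.Event_Staging is an empty table with identical structure and indexes on the same filegroup. All object names here are illustrative, not from the article.

    ```sql
    -- Find which partition holds rows older than the 6-day cutoff
    SELECT $PARTITION.pf_EventByDate(DATEADD(DAY, -6, GETDATE())) AS PartitionNumber;

    -- Metadata-only move of the whole old partition out of the live table
    -- (use the partition number returned by the query above)
    ALTER TABLE dbo.Event SWITCH PARTITION 3 TO dbo.Event_Staging;

    -- The old rows now live in the staging table; truncating it is minimally logged
    TRUNCATE TABLE dbo.Event_Staging;
    ```

    Because SWITCH is a metadata operation, no rows are physically copied, which is why it beats any batched DELETE at scale.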

  • Alternatively, we can use this too; it takes about an hour to delete a million records:

    set rowcount 50000
    DELETE FROM EventTracking.dbo.Event

    while @@rowcount > 0
    begin
        set rowcount 50000
        DELETE FROM EventTracking.dbo.Event
    end

    set rowcount 0

    ______________________________________________________________Every Problem has a Solution; Every Solution has a Problem: 🙂

  • Agree with the last poster. Basically, DELETE TOP 1000000 FROM EVENT doesn't work, but DELETE TOP (1000000) FROM EVENT does. SET ROWCOUNT works too, but TOP has the benefit that you can use a variable. So one of the statements in the article is misleading, unless you're using SQL Server 2000, where DELETE TOP (1000000) won't work.

    This tip is still useful for me, though, as I have groups of tables I have to delete in chunks, keeping the order of the last ID in the same sequence across the sets of tables.

    This DBA says - "It depends".
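    To illustrate the variable point made above, here is a minimal sketch (table and column names are borrowed from the article's example and are illustrative):

    ```sql
    DECLARE @BatchSize int = 5000;

    -- TOP with parentheses accepts a variable (SQL Server 2005 and later);
    -- SET ROWCOUNT and the old unparenthesised TOP n form do not.
    DELETE TOP (@BatchSize)
    FROM EventTracking.dbo.Event
    WHERE EventCloseDate < DATEADD(DAY, -6, GETDATE());
    ```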

  • The code in the article doesn't check that EventCloseDate is older than 6 days when it does the delete. Is it that EventCloseDate is always present and monotonically increasing with EventId?

    It also won't always delete all the rows, for example, if there are 5000 rows that need deleting it will only delete the first 4000.
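    One way to guard against both problems is to keep the date predicate on the DELETE itself and loop until nothing qualifies, rather than trusting a precomputed loop count. A sketch, using the article's example names:

    ```sql
    DECLARE @Rowcount int = -1;

    WHILE @Rowcount <> 0
    BEGIN
        -- The predicate travels with the DELETE, so the loop simply runs
        -- until no qualifying rows remain; no row-count arithmetic needed.
        DELETE TOP (2000)
        FROM EventTracking.dbo.Event
        WHERE EventCloseDate < DATEADD(DAY, -6, GETDATE());

        SET @Rowcount = @@ROWCOUNT;
    END
    ```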

  • Itzik Ben-Gan teaches

    WHILE 1 = 1
    BEGIN
        DELETE TOP (5000) FROM table WHERE .....
        IF @@ROWCOUNT < 5000 BREAK
    END

    and you can add a WAITFOR to delay between iterations

    Adjust the 5000 to suit your table.
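    The WAITFOR suggestion slots in like this (a sketch; the table name, predicate, and one-second delay are arbitrary examples, not part of the original pattern):

    ```sql
    WHILE 1 = 1
    BEGIN
        DELETE TOP (5000)
        FROM dbo.Event
        WHERE EventCloseDate < DATEADD(DAY, -6, GETDATE());

        IF @@ROWCOUNT < 5000 BREAK;

        -- Pause between batches so blocked readers and writers can get in
        WAITFOR DELAY '00:00:01';
    END
    ```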

  • Why keep counts at all?

    Let all the logic live in the view; then, while any entry remains in the view, delete:

    ALTER PROCEDURE [dbo].[usp_VTrickleDeleteEvent]
    AS
    SET NOCOUNT ON
    SET QUOTED_IDENTIFIER OFF
    BEGIN
        WHILE EXISTS (SELECT 1 FROM VEventCleanup)
        BEGIN
            BEGIN TRAN RemoveEvent
                DELETE FROM VEventCleanup -- delete from view with order by
            COMMIT TRAN -- COMMIT to release locks
        END
    END
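    For completeness, the view this relies on might look like the following, based on the article's example (a sketch; note that TOP with ORDER BY in a view controls which rows qualify for the delete, it does not guarantee a physical delete order):

    ```sql
    CREATE VIEW dbo.VEventCleanup
    AS
    SELECT TOP (2000) *
    FROM EventTracking.dbo.Event
    WHERE EventCloseDate < DATEADD(DAY, -6, GETDATE())  -- rows older than 6 days
    ORDER BY EventId;  -- ORDER BY is required when the view uses TOP
    ```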

  • What about the transaction log? I have worked in an environment where disk space was hard to come by. I had a similar need, but I also had to add CHECKPOINTs in the code, because the tran log was filling up the disk space.

    You might be asking, "Why not just add more disk?" You would have to understand the company I worked for. BTW, I don't work there anymore :).
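    In the simple recovery model, a CHECKPOINT between batches lets the inactive log space be reused instead of growing the file; in full recovery you would rely on frequent log backups instead. A minimal sketch of the batched delete with the checkpoint added (names follow the article's example):

    ```sql
    DECLARE @Rowcount int = -1;

    WHILE @Rowcount <> 0
    BEGIN
        DELETE TOP (2000)
        FROM EventTracking.dbo.Event
        WHERE EventCloseDate < DATEADD(DAY, -6, GETDATE());

        SET @Rowcount = @@ROWCOUNT;

        -- Simple recovery model: checkpoint allows log truncation per batch
        CHECKPOINT;
    END
    ```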

  • This is excellent. We also use the DELETE TOP (###) syntax. Works great.

    BTW, it is 'stored procedure' not 'store procedure.' Sorry for picking the nit.

  • Adding a view would have helped my little process that deletes pollution from an audit trigger. A view would have simplified my efforts!

    My twist for others to consider is adding a pause to let other, possibly locked-out, processes jump in:

    WHILE EXISTS (SELECT TOP 1 RECORDID
                  FROM [Audit].[dbo].SCHOOL
                  WHERE ChangeBy IN ('<snip>'
                                    ,'NT AUTHORITY\SYSTEM')
                 )
    BEGIN
        DELETE [Audit].[dbo].SCHOOL
        FROM [Audit].[dbo].SCHOOL x
        INNER JOIN (SELECT TOP 10000 RECORDID
                    FROM [Audit].[dbo].SCHOOL
                    WHERE ChangeBy IN ('<snip>'
                                      ,'NT AUTHORITY\SYSTEM')
                   ) y
            ON x.RECORDID = y.RECORDID

        WAITFOR DELAY '00:00:01'
    END

  • Here is one technique I use when there is a need to perform bulk deletes on a large table. It is not an online operation, but it should be able to complete the job on a multi-GB table within 10 minutes. Not only is this one of the fastest methods, but it minimizes tempdb usage, minimizes transaction logging, and the end result leaves the table with organized indexes.

    #1 BCP a select of the records you want to keep into a native-format file (the -n option).

    #2 Truncate the table.

    #3 Drop the indexes.

    #4 BCP from the file back into the table. Specify the -E option if you want to insert the original identity values.

    #5 Re-create the indexes.
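    bcp itself runs from the command line, but the same keep-truncate-reload idea can be sketched in pure T-SQL. This is a simplified analogue of the steps above, not the author's exact method; the holding table name is illustrative, and minimal logging for the reload also depends on the recovery model.

    ```sql
    -- 1. Copy the keepers to a holding table (SELECT INTO is minimally logged)
    SELECT *
    INTO dbo.Event_Keep
    FROM EventTracking.dbo.Event
    WHERE EventCloseDate >= DATEADD(DAY, -6, GETDATE());

    -- 2. Empty the big table cheaply
    TRUNCATE TABLE EventTracking.dbo.Event;

    -- 3. Reload the keepers; TABLOCK helps enable minimal logging.
    --    If Event has an identity column, SET IDENTITY_INSERT ... ON first
    --    and list the columns explicitly.
    INSERT INTO EventTracking.dbo.Event WITH (TABLOCK)
    SELECT * FROM dbo.Event_Keep;

    DROP TABLE dbo.Event_Keep;
    ```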

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • OK, I am not sure I am on board with the example as written. If you know you are going to be cleaning up the database regularly, I would personally cluster on the EventCloseDate field to ensure I can use it for fast removal, and make the primary key (the identity column) non-clustered. This is for multiple reasons.

    1) I don't see anything at all saying EventID 2 couldn't end months after EventIDs 3-2000. And since your TOP doesn't actually take that into account, you could in theory delete something still valid under your rules.

    2) If your purpose is to remove values from a table where a particular column is your target, then as a rule that column must be referenced in your query; otherwise you may miss values or get more values than expected.

    3) You are using CEILING in your query, so if the calculation is 880.000000001 you will make 881 loops. In the 881st loop you will wipe out data which does have a date value >= your 6-day-old cutoff, because there is no sanity check. I suggest using FLOOR; you will have some values left hanging, but in the business world it is better to trim too little than too much. Refer back to 1.

    4) I would use your method after altering the indexes to ensure I can use EntryDate as my qualifier to trim exactly what I need.

    BTW, the title is a bit misleading; breaking processes into smaller transactions has long been known as the best way to reduce contention.
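    The re-clustering suggested above might be sketched like this. The constraint and index names are assumptions for illustration, and both operations rebuild the table, so evaluate the downtime and fragmentation impact first.

    ```sql
    -- Make the primary key non-clustered so the date column can take the clustered slot
    ALTER TABLE EventTracking.dbo.Event
        DROP CONSTRAINT PK_Event;  -- assumed constraint name

    ALTER TABLE EventTracking.dbo.Event
        ADD CONSTRAINT PK_Event PRIMARY KEY NONCLUSTERED (EventId);

    -- Cluster on the purge column so range deletes touch contiguous pages
    CREATE CLUSTERED INDEX CIX_Event_EventCloseDate
        ON EventTracking.dbo.Event (EventCloseDate);
    ```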

  • karthik babu (3/6/2013)


    Alternatively, we can use this too.. it takes about an hour to delete a million records

    set rowcount 50000
    DELETE FROM EventTracking.dbo.Event

    while @@rowcount > 0
    begin
        set rowcount 50000
        DELETE FROM EventTracking.dbo.Event
    end

    set rowcount 0

    Although I'm fond of that technique as well, it's deprecated syntax: http://msdn.microsoft.com/en-us/library/ms143729.aspx.

    - Jeff

  • My concern about the application of this technique is a reliance on the primary key being date specific.

    In short, it selects the top N rows where the date < 6 days ago ordered by primary key.

    As soon as a row is added with a date >= 6 days ago, this query will ignore any records following that row which should also have been deleted.

    You can still use the batch method, which is really the key to the whole performance issue, and very timely. The difference is that you need to step through the whole table.
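    Stepping through the whole table by key ranges, as suggested, might look like the following sketch (the batch size and names are illustrative; each window is bounded by EventId, while the date predicate decides what actually gets deleted inside it):

    ```sql
    DECLARE @LastId int = 0, @MaxId int;

    SELECT @MaxId = MAX(EventId) FROM EventTracking.dbo.Event;

    WHILE @LastId < @MaxId
    BEGIN
        -- Walk the key space in fixed windows so no qualifying row is skipped,
        -- regardless of how dates are interleaved with the identity values
        DELETE FROM EventTracking.dbo.Event
        WHERE EventId > @LastId
          AND EventId <= @LastId + 2000
          AND EventCloseDate < DATEADD(DAY, -6, GETDATE());

        SET @LastId = @LastId + 2000;
    END
    ```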
