Hidden Tricks To SQL Server Table Cleanup

  • Add an index on the date column and loop over that field; it will be quicker than using a view.
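    A minimal sketch of that idea, assuming a hypothetical dbo.EventLog table with an EventDate column (batch size and cutoff are arbitrary):

    CREATE INDEX IX_EventLog_EventDate ON dbo.EventLog (EventDate);

    DECLARE @cutoff date = DATEADD(MONTH, -6, GETDATE());
    WHILE 1 = 1
    BEGIN
        -- each batch seeks on the new index instead of scanning the table
        DELETE TOP (5000) FROM dbo.EventLog WHERE EventDate < @cutoff;
        IF @@ROWCOUNT = 0 BREAK;
    END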

  • without "while"...

    DELETE TOP (xxx) FROM yyy WHERE zzzz

    GO 1000 :-)
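    For example (table and column names hypothetical); in SSMS or sqlcmd, GO <n> reruns the preceding batch that many times:

    -- each execution deletes up to 5000 rows; GO 1000 repeats the batch 1000 times
    DELETE TOP (5000) FROM dbo.EventLog WHERE EventDate < '20150101';
    GO 1000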

  • radekkokes (9/4/2015)


    without "while"...

    DELETE TOP (xxx) FROM yyy WHERE zzzz

    GO 1000 :-)

    The problem is that 1000 might be too many or too few.

  • At one of the places I worked a few years back, we had a similar situation.

    Although we had an update/upgrade window from 12AM to 6AM once or twice per month, this was not entirely off-the-grid time. It was simply understood that we had to designate a two-hour window from 12AM to 2AM as off time (meaning the system might be/would be unreachable during that period).

    After that it would be slow with intermittent downtime, as in it might become unreachable for 5-10 minutes now and then.

    We also had live tables that could not be taken down at all but had to be cleaned.

    So we set up a job running at a 15-minute interval that would select several thousand record IDs at a time with a (NOLOCK) hint and put them into a cache table. Then we had a scheduled job that at first ran every hour and deleted 100 records at a time, only at night when traffic to the table was much lighter.

    After the initial cleanup (it took about a week to clean up almost 50 million records out of 80 million) we switched the job to run every 2 hours from 12AM to 8AM. The table was clean and small after that. We only needed active records in there, as the database was backed up every 3 hours. Add to that regular DB maintenance, shrinking the log and data files whenever possible, and we had the DB running lean and clean.

    The technique in this article is OK to use as an ad-hoc thing, but not on a regular basis and especially not on a live table. You might run into big trouble one day.
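    A rough sketch of that two-job design, with all names hypothetical (it assumes a dbo.EventLog table, a dbo.CleanupCache staging table, and an EventId key):

    -- Job 1 (every 15 minutes): stage candidate IDs without blocking readers.
    -- NOLOCK risks dirty reads, so treat the cache as candidates, not truth.
    INSERT INTO dbo.CleanupCache (EventId)
    SELECT TOP (5000) e.EventId
    FROM dbo.EventLog AS e WITH (NOLOCK)
    WHERE e.EventDate < '20100101'
      AND NOT EXISTS (SELECT 1 FROM dbo.CleanupCache c WHERE c.EventId = e.EventId);

    -- Job 2 (hourly, at night): delete a small batch of staged rows.
    DELETE TOP (100) e
    FROM dbo.EventLog AS e
    JOIN dbo.CleanupCache AS c ON c.EventId = e.EventId;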

  • Vlad-207446 (9/4/2015)


    At one of the places I worked a few years back, we had a similar situation.

    Although we had an update/upgrade window from 12AM to 6AM once or twice per month, this was not entirely off-the-grid time. It was simply understood that we had to designate a two-hour window from 12AM to 2AM as off time (meaning the system might be/would be unreachable during that period).

    After that it would be slow with intermittent downtime, as in it might become unreachable for 5-10 minutes now and then.

    We also had live tables that could not be taken down at all but had to be cleaned.

    So we set up a job running at a 15-minute interval that would select several thousand record IDs at a time with a (NOLOCK) hint and put them into a cache table. Then we had a scheduled job that at first ran every hour and deleted 100 records at a time, only at night when traffic to the table was much lighter.

    After the initial cleanup (it took about a week to clean up almost 50 million records out of 80 million) we switched the job to run every 2 hours from 12AM to 8AM. The table was clean and small after that. We only needed active records in there, as the database was backed up every 3 hours. Add to that regular DB maintenance, shrinking the log and data files whenever possible, and we had the DB running lean and clean.

    The technique in this article is OK to use as an ad-hoc thing, but not on a regular basis and especially not on a live table. You might run into big trouble one day.

    Actually, deleting rows in small batches in a loop, accessing the rows via an indexed column, is about the best way you can delete rows from a table; if the batch size is small enough it won't affect current users, and it can run through until all the required rows are deleted.

  • Jonathan AC Roberts >> I never said otherwise.

    It's just that sometimes it is not optimal to run this kind of process manually on a LIVE database and assume that is the best option you have. When dealing with this range of records, and we are talking about millions of records here and a LIVE DB to boot, I am quite sure nobody expects you to make it lean and clean NOW. There is always a time period in which it has to be done, and if this is a 24/7/365 kind of environment, it means it has to be done on a regular basis after the initial cleaning. Hence it might be more prudent to come up with a process that will do all of that and keep doing it long after the initial cleanup is done, and do it reliably. That's all.

    At the job in my example we went through many techniques, including one just like the one in the article, and for our case the one we picked was the most optimal and workable. As a matter of fact it is still there even after a five-year run, a bit streamlined and optimized but still in use. Also, since we developed and maintained the front end for the customer, we changed the application so that it does not do the deletes inline but simply inserts the record ID into the job-monitored table, to be deleted on schedule. All SELECTs were adjusted to filter active records on IDs in the pending-deletes table as well.
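    A sketch of that queue-and-filter pattern (all names hypothetical): the application enqueues IDs instead of deleting, reads skip queued rows, and a scheduled job drains the queue in small batches.

    CREATE TABLE dbo.PendingDeletes (EventId int PRIMARY KEY);

    -- Application path: queue the row instead of deleting it inline.
    DECLARE @EventId int = 42;  -- example value
    INSERT INTO dbo.PendingDeletes (EventId) VALUES (@EventId);

    -- Reads filter out queued rows:
    SELECT e.*
    FROM dbo.EventLog AS e
    WHERE NOT EXISTS (SELECT 1 FROM dbo.PendingDeletes p WHERE p.EventId = e.EventId);

    -- Scheduled job: take a small batch off the queue, then delete the matching rows.
    DECLARE @batch TABLE (EventId int PRIMARY KEY);
    DELETE TOP (100) FROM dbo.PendingDeletes
    OUTPUT deleted.EventId INTO @batch;

    DELETE e
    FROM dbo.EventLog AS e
    JOIN @batch AS b ON b.EventId = e.EventId;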

  • P Jones (3/6/2013)


    Itzik Ben-Gan teaches

    WHILE 1=1
    BEGIN
        DELETE TOP (5000) FROM table WHERE .....
        IF @@ROWCOUNT < 5000 BREAK
    END

    and you can add a WAITFOR to delay between iterations

    Adjust the 5000 to suit your table.

    +1

    Gerald Britton, Pluralsight courses
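    The quoted loop with the suggested WAITFOR added, as a sketch (table and column names hypothetical):

    WHILE 1 = 1
    BEGIN
        DELETE TOP (5000) FROM dbo.EventLog WHERE EventDate < '20130101';
        IF @@ROWCOUNT < 5000 BREAK;
        WAITFOR DELAY '00:00:02';  -- breathing room between batches
    END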

  • jims-723592 (3/6/2013)


    Why keep counts at all?

    Let all the logic live in the view; then, while there is any entry in the view, remove it:

    ALTER PROCEDURE [dbo].[usp_VTrickleDeleteEvent]
    AS
    SET NOCOUNT ON
    SET QUOTED_IDENTIFIER OFF
    BEGIN
        WHILE EXISTS (SELECT 1 FROM VEventCleanup)
        BEGIN
            BEGIN TRAN RemoveEvent
            DELETE FROM VEventCleanup -- delete from view with ORDER BY
            COMMIT TRAN -- COMMIT to release locks
        END
    END

    +1

    Gerald Britton, Pluralsight courses
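    A sketch of the kind of cleanup view this proc assumes (names hypothetical). The TOP caps each DELETE at one small batch; per the CREATE VIEW documentation quoted later in this thread, the ORDER BY here only determines which rows TOP picks, not the order of any results:

    CREATE VIEW dbo.VEventCleanup
    AS
    SELECT TOP (500) *
    FROM dbo.EventLog
    WHERE EventDate < '20120101'
    ORDER BY EventDate;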

  • David Walker-278941 (3/6/2013)


    But!!!!!

    ORDER BY IS IGNORED IN A VIEW, starting with SQL 2005, even with TOP 100 PERCENT specified, starting with (I believe) SQL 2008. See Books Online and many online posts. How does that affect this article?

    (From one blog entry: when you save a VIEW definition, the ORDER BY is ignored on the result set. That is per the ANSI SQL standard.)

    That's what I understand. If that's wrong, please let me know. Thanks.

    That's wrong. Here's the quote:

    The ORDER BY clause is used only to determine the rows that are returned by the TOP or OFFSET clause in the view definition. The ORDER BY clause does not guarantee ordered results when the view is queried, unless ORDER BY is also specified in the query itself.

    Ref: CREATE VIEW (Transact-SQL)

    Gerald Britton, Pluralsight courses

  • Edward.Polley 76944 (3/6/2013)


    I am a big fan of the two-temp-table approach and wrote about that in a previous article published at SQL Central. The difference here is pulling the rows in primary key sequence using an ORDER BY, so I had to switch to this loop-through-the-view approach.

    That doesn't work the way you think:

    Important:

    The ORDER BY clause is used only to determine the rows that are returned by the TOP or OFFSET clause in the view definition. The ORDER BY clause does not guarantee ordered results when the view is queried, unless ORDER BY is also specified in the query itself.

    Gerald Britton, Pluralsight courses

  • Edward.Polley 76944 (3/6/2013)


    Try this.

    CREATE VIEW dbo.test AS SELECT TOP (100) * FROM table ORDER BY primary_key_column. Look at the results: they will be ordered.

    Important

    The ORDER BY clause is used only to determine the rows that are returned by the TOP or OFFSET clause in the view definition. The ORDER BY clause does not guarantee ordered results when the view is queried, unless ORDER BY is also specified in the query itself.

    Gerald Britton, Pluralsight courses
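    To see the distinction the documentation is drawing, assuming the dbo.test view above, compare (a sketch):

    SELECT * FROM dbo.test;                              -- order of results NOT guaranteed
    SELECT * FROM dbo.test ORDER BY primary_key_column;  -- order guaranteed

    The view's ORDER BY only chooses which 100 rows TOP returns; any apparent ordering from the first query is a coincidence of the chosen plan.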

  • I have been using this method for a while now, and I include a WAITFOR DELAY '00:00:02' between each iteration of the delete loop. This ensures that if there are any waiting processes, they can take the resource during the wait period and do what they need to do.

    If you are not concerned about contention, disable the indexes on the table; the delete will happen a lot quicker!
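    A sketch of that tactic (index and table names hypothetical). Only nonclustered indexes should be disabled: a disabled nonclustered index is simply not maintained during the delete but must be rebuilt before it is usable again, and disabling the clustered index makes the whole table inaccessible.

    ALTER INDEX IX_EventLog_Status ON dbo.EventLog DISABLE;
    -- ...run the batched delete here...
    ALTER INDEX IX_EventLog_Status ON dbo.EventLog REBUILD;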

  • g.britton (9/4/2015)


    P Jones (3/6/2013)


    Itzik Ben-Gan teaches

    WHILE 1=1
    BEGIN
        DELETE TOP (5000) FROM table WHERE .....
        IF @@ROWCOUNT < 5000 BREAK
    END

    and you can add a WAITFOR to delay between iterations

    Adjust the 5000 to suit your table.

    +1

    +1

  • There is a theoretical possibility that this loop never ends if we don't predetermine the number of iterations in advance.
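    One way to guard against that, sketched against the hypothetical dbo.EventLog table (the cap of 10000 is arbitrary):

    DECLARE @i int = 0;
    WHILE @i < 10000  -- hard upper bound, in case matching rows keep arriving
    BEGIN
        DELETE TOP (5000) FROM dbo.EventLog WHERE EventDate < '20150101';
        IF @@ROWCOUNT < 5000 BREAK;  -- normal exit: nothing (or little) left to delete
        SET @i = @i + 1;
    END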

  • carbogast (3/6/2013)


    I did something very similar, and it is working great. The commit after each small batch is the real secret. I also put a WAITFOR DELAY '00:00:02' into the loop, which seems to help keep the purge stored procedure from encroaching on overall server performance.

    I don't think there is any benefit to adding WAITFOR DELAY, as any waiting processes will be able to jump in as soon as each transaction finishes. A better option would be to reduce the batch size if there is any problem with contention.
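    A sketch of acting on that advice by shrinking the batch when a delete runs long (names and thresholds are arbitrary; duration is only a rough proxy for contention):

    DECLARE @batch int = 5000, @t0 datetime2;
    WHILE 1 = 1
    BEGIN
        SET @t0 = SYSDATETIME();
        DELETE TOP (@batch) FROM dbo.EventLog WHERE EventDate < '20130101';
        IF @@ROWCOUNT = 0 BREAK;
        IF DATEDIFF(MILLISECOND, @t0, SYSDATETIME()) > 1000 AND @batch > 500
            SET @batch = @batch / 2;  -- halve the batch when deletes run long
    END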

