Mass Delete question

  • On SQL2012 we have an Error log table with 6.7 million records going back to 2013. I am told only records from the past 30 days or so are needed.

    I've been considering an agent job to do the deletion in chunks to reduce the performance and log impact, but then I had a different idea...

    1) Use SELECT INTO to create a temp table containing only records from the past 30 days (the ones we want to keep). There are only 65,000 of those.

    2) Truncate the original table.

    3) Copy the contents of the temp table back to the original table.

    This would be much faster. Any gotchas with this plan?
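
    Roughly, the plan would look like this (dbo.ErrorLog and LogDate are stand-ins for the real table and datetime column, not the actual names):

        -- 1) Stash the rows we want to keep in tempdb.
        SELECT *
        INTO   #Keep
        FROM   dbo.ErrorLog
        WHERE  LogDate >= DATEADD(DAY, -30, GETDATE());

        -- 2) Deallocate everything in the original table (minimally logged).
        TRUNCATE TABLE dbo.ErrorLog;

        -- 3) Put the kept rows back. If the table has an IDENTITY column,
        --    this needs SET IDENTITY_INSERT ON and an explicit column list.
        INSERT INTO dbo.ErrorLog
        SELECT * FROM #Keep;

        DROP TABLE #Keep;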

  • I've used that plan (or variations on it) many times.

    Just keep in mind that the table may have locks, etc., on it that will cause this to bomb.

    If possible, set the database to single-user mode during this operation.
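
    Something like this, if you can afford to kick everyone off (the database name is a placeholder):

        ALTER DATABASE ErrorLogDb SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
        -- ... do the save/truncate/reload here ...
        ALTER DATABASE ErrorLogDb SET MULTI_USER;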

    The other thing is that sometimes it's better to use a persisted table than a temp table. What happens if the server goes down after you've truncated the source table but before you restore from the temp table? Temp tables don't live through a server bounce. You can kind of manage that with transaction control, but it might be easier to use a real table instead of a temp table, then drop the real table when you're done.

    Another version is: Select Into a persisted ("real") table, validate the data, drop the current table, rename the new table to the old name.
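
    A rough sketch of that version (object names are placeholders). Keep in mind that SELECT INTO copies none of the indexes, constraints, triggers, or permissions, so script those out beforehand:

        SELECT *
        INTO   dbo.ErrorLog_New
        FROM   dbo.ErrorLog
        WHERE  LogDate >= DATEADD(DAY, -30, GETDATE());

        -- Validate row counts / spot-check the data here.

        DROP TABLE dbo.ErrorLog;
        EXEC sp_rename 'dbo.ErrorLog_New', 'ErrorLog';

        -- Then recreate the indexes, constraints, and permissions on the
        -- renamed table.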

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • dan-572483 (6/14/2016)

    On SQL2012 we have an Error log table with 6.7 million records going back to 2013. I am told only records from the past 30 days or so are needed.

    I've been considering an agent job to do the deletion in chunks to reduce the performance and log impact, but then I had a different idea...

    1) Use SELECT INTO to create a temp table containing only records from the past 30 days (the ones we want to keep). There are only 65,000 of those.

    2) Truncate the original table.

    3) Copy the contents of the temp table back to the original table.

    This would be much faster. Any gotchas with this plan?

    If you can do an index seek to acquire those 65K rows, then this should be a slam-dunk, even if the logging process is live.
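
    If one doesn't exist already, a narrow nonclustered index on the datetime column is what gets you that seek (column name assumed):

        CREATE NONCLUSTERED INDEX IX_ErrorLog_LogDate
            ON dbo.ErrorLog (LogDate);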

    A) Create a proper permanent holding table (which will just be discarded after completion).

    B) INSERT the proper rows into said table, and add a WITH (TABLOCKX) hint on the SELECT that grabs them. This will block all INSERTs/DML that follow. This MUST be done inside an EXPLICIT TRANSACTION. It should be VERY quick if you get an index seek plan to pull the 65K rows out of the 6.7M.

    C) TRUNCATE the logging table.

    D) INSERT the held rows back into the logging table.

    E) COMMIT TRANSACTION. At this point any blocked INSERTs/DML queued up will complete. Assuming this all completes in less than your application's timeout, no data will be lost.

    F) Drop holding table.

    If you can't guarantee the timing of this, then you may need to disable your logging stuff temporarily or redirect it to another data store.
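
    Putting A through F together, something like this (all object names are placeholders, and the SELECT INTO here collapses steps A and B into one statement):

        BEGIN TRAN;

        -- A/B) Grab the keepers and take an exclusive table lock so that
        --      concurrent INSERTs/DML queue up behind us until we commit.
        SELECT *
        INTO   dbo.ErrorLog_Hold
        FROM   dbo.ErrorLog WITH (TABLOCKX)
        WHERE  LogDate >= DATEADD(DAY, -30, GETDATE());

        -- C) Empty the logging table (minimally logged).
        TRUNCATE TABLE dbo.ErrorLog;

        -- D) Put the 65K held rows back (explicit column list plus
        --    IDENTITY_INSERT if there's an identity column).
        INSERT INTO dbo.ErrorLog
        SELECT * FROM dbo.ErrorLog_Hold;

        -- E) Release the locks; any queued INSERTs now complete.
        COMMIT TRAN;

        -- F) Discard the holding table.
        DROP TABLE dbo.ErrorLog_Hold;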

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • I agree: definitely save, truncate, and (re)load.

    Btw, you should very likely change the clustered index on the table to the insert datetime if it's currently based on an identity column. If you prefer, you can add the identity column after the datetime to guarantee a unique key.
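
    Something along these lines, assuming the current clustered PK is on an identity column named ID and the datetime column is LogDate (names are guesses; adjust to suit):

        ALTER TABLE dbo.ErrorLog DROP CONSTRAINT PK_ErrorLog;

        CREATE UNIQUE CLUSTERED INDEX CIX_ErrorLog
            ON dbo.ErrorLog (LogDate, ID);

        -- Keep the identity enforced as a nonclustered PK if you still need it.
        ALTER TABLE dbo.ErrorLog
            ADD CONSTRAINT PK_ErrorLog PRIMARY KEY NONCLUSTERED (ID);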

    SQL DBA, SQL Server MVP (07, 08, 09)
    A socialist is someone who will give you the shirt off *someone else's* back.
