Hidden Tricks To SQL Server Table Cleanup

  • Edward.Polley 76944 (3/6/2013)


    Agreed, the count method consumes resources, so initially, against huge tables, I hard-code the count. Say the table has 50 million rows to remove: no point using COUNT, just SET @count = 1000000. However, this is production, so by the time we get down to a few million rows to remove, the table is small enough that the count penalty is small. Thanks for the comment.

    But it looks like the COUNT(*) controls when to stop deleting in your method. How can you process without the count, since the code won't know when to stop deleting?

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.
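
    A minimal sketch of the two ways of seeding the counter being discussed; the table name and retention filter are illustrative, not the article's exact code:

    -- Option (a): huge table - skip the expensive COUNT(*) and hard-code the figure
    DECLARE @count BIGINT;
    SET @count = 1000000;

    -- Option (b): smaller backlog - the count is now cheap enough to take
    SELECT @count = COUNT(*)
    FROM dbo.BigTable
    WHERE CreatedDate < DATEADD(DAY, -6, GETDATE());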

  • Please, if you're on SQL2005, watch out that you have sufficient TEMPDB space to accommodate all those deleted rows.

    When a row is deleted, it is stored in TEMPDB as, forgive the imprecise term, a "Pending Delete". Something akin to the C# Garbage Collector runs in the background, processing through TEMPDB records and "finalizing" the deletes.

    If the rate of deletion exceeds the capacity of the Garbage Collector to clean up and free the TEMPDB space, you will grow TEMPDB.

    If TEMPDB exceeds the available disk capacity - YOU DIE!

    We used Itzik's small-batch with WAITFOR to annihilate over 500M rows in 5000 row batches. Ignorant of the ramifications above, Production DIED!

    Thankfully, restarting the SQL Service got us going again. I can honestly say I've NO IDEA what the final effect of the Garbage Collector not finalizing each row was, but we've been running OK for the last 6 months, while still deleting, albeit far fewer on a daily basis, from the same table! Perhaps an Expert would like to comment...?

  • I think it would be best to recreate the table (create a similar structure with tmp_xxx prefixed to the actual table name) and then insert only the rows needed into the new table. Once the new table has all the needed rows, recreate all indexes and FKs on it, rename it, then drop the old one. It's slower to do a batch of deletes when only a small set of data needs to remain in the table...
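
    A minimal sketch of that rebuild-and-swap approach, with illustrative table, column, and index names (the article's actual schema isn't shown here):

    -- 1. Create the new table and load only the rows to keep
    SELECT *
    INTO dbo.tmp_BigTable
    FROM dbo.BigTable
    WHERE CreatedDate >= DATEADD(DAY, -6, GETDATE());

    -- 2. Recreate indexes and foreign keys on the new table
    CREATE CLUSTERED INDEX IX_tmp_BigTable_CreatedDate
        ON dbo.tmp_BigTable (CreatedDate);

    -- 3. Swap the names, then drop the old table
    EXEC sp_rename 'dbo.BigTable', 'BigTable_old';
    EXEC sp_rename 'dbo.tmp_BigTable', 'BigTable';
    DROP TABLE dbo.BigTable_old;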

  • Thanks for the comment. I like the CTE coding, but it seems not to include the small-batch concept; without that, you end up with lock contention.

  • Agree with saidie_vs: when you have more data to delete than to keep, the best way is to create a new table, load the data you want to keep into it, flip the table names, and finally re-create the FKs and indexes.

    Cheers! 🙂

    Amol Naik

  • For those DBAs who inherit a 7x24 application and can't change much in the database without business impact, this is a low-impact method to clean up tables. Wish I could do partitioning.

  • How would this affect replication? Would replication flow more smoothly than with a single large DELETE against the big table?

  • What level of lock contention do you see with this?

  • Sometimes I use this trick ....

    CREATE TABLE tempTable (... same structure as bigTable ...)

    then

    EXEC sp_rename 'bigTable', 'secondTableName';

    EXEC sp_rename 'tempTable', 'bigTable';

    You can do it in a transaction ...

    After that you have bigTable empty and ready for new inserts, and secondTableName with all the data from the old bigTable, which you can then do whatever you want with, without fear of locking the main tables ...

    Of course, this example is only for specific situations.
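
    A fuller sketch of that rename swap, wrapped in a transaction as mentioned; the names are illustrative, and the empty copy is built with SELECT TOP (0) ... INTO, which copies columns but not indexes or constraints:

    BEGIN TRANSACTION;

        -- Empty table with the same column structure as dbo.bigTable
        SELECT TOP (0) *
        INTO dbo.tempTable
        FROM dbo.bigTable;

        -- Swap the names inside the transaction
        EXEC sp_rename 'dbo.bigTable', 'secondTableName';
        EXEC sp_rename 'dbo.tempTable', 'bigTable';

    COMMIT TRANSACTION;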

  • It doesn't need to, as the delete uses the view and the view only returns 2000 rows. Since @cnt limits the total number of rows deleted to those with a date older than 6 days, that's all that can ever be deleted.
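
    A minimal sketch of the batched delete-through-a-view pattern being described; the view, table, and column names and the retention window are illustrative, not the article's exact code:

    -- View exposing one batch of rows older than 6 days
    CREATE VIEW dbo.vCleanupBatch
    AS
    SELECT TOP (2000) *
    FROM dbo.BigTable
    WHERE CreatedDate < DATEADD(DAY, -6, GETDATE());
    GO

    -- @cnt caps the work at the number of qualifying rows
    DECLARE @cnt BIGINT;
    SELECT @cnt = COUNT(*)
    FROM dbo.BigTable
    WHERE CreatedDate < DATEADD(DAY, -6, GETDATE());

    WHILE @cnt > 0
    BEGIN
        DELETE FROM dbo.vCleanupBatch;   -- removes at most 2000 rows per pass
        SET @cnt = @cnt - 2000;
        WAITFOR DELAY '00:00:01';        -- short pause between batches
    END;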

  • SAinCA (3/6/2013)


    Please, if you're on SQL2005, watch out that you have sufficient TEMPDB space to accommodate all those deleted rows.

    When a row is deleted, it is stored in TEMPDB as, forgive the imprecise term, a "Pending Delete". Something akin to the C# Garbage Collector runs in the background, processing through TEMPDB records and "finalizing" the deletes.

    If the rate of deletion exceeds the capacity of the Garbage Collector to clean up and free the TEMPDB space, you will grow TEMPDB.

    If TEMPDB exceeds the available disk capacity - YOU DIE!

    We used Itzik's small-batch with WAITFOR to annihilate over 500M rows in 5000 row batches. Ignorant of the ramifications above, Production DIED!

    Thankfully, restarting the SQL Service got us going again. I can honestly say I've NO IDEA what the final effect of the Garbage Collector not finalizing each row was, but we've been running OK for the last 6 months, while still deleting, albeit far fewer on a daily basis, from the same table! Perhaps an Expert would like to comment...?

    SQL only has to save the row in tempdb under certain, specific scenarios. That is not a normal process.

    Deletes beyond a certain size (64K I think) are processed separately later, as the relevant pages are "touched", but that doesn't involve tempdb.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • My thoughts exactly.

    SET ROWCOUNT
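
    For reference, a minimal sketch of the classic SET ROWCOUNT batch-delete pattern being alluded to; table and column names are illustrative, and note that SET ROWCOUNT with DELETE/INSERT/UPDATE is deprecated in favor of TOP (N):

    SET ROWCOUNT 2000;             -- limit each DELETE to 2000 rows

    WHILE 1 = 1
    BEGIN
        DELETE FROM dbo.BigTable
        WHERE CreatedDate < DATEADD(DAY, -6, GETDATE());

        IF @@ROWCOUNT = 0 BREAK;   -- nothing left to delete
    END;

    SET ROWCOUNT 0;                -- back to unlimited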

  • @scottpletcher

    Not sure if what you are saying is that "it didn't happen the way I said it did..."

    READ_COMMITTED_SNAPSHOT isolation is set ON for the database in question.

    In order to determine that the problem was, indeed, due to the "Version Store" in TEMPDB, I used this query:

    SELECT SUM(version_store_reserved_page_count) AS [version store pages used],
           (SUM(version_store_reserved_page_count)*1.0/128) AS [version store space in MB]
    FROM sys.dm_db_file_space_usage;

    The 240GB of free space usually present on the TEMPDB drive were consumed by the version store.

    Whether or not "normal" comes into it, simply running the 10,000-row delete loop (sorry, I mis-remembered the count) is what caused the problem. I simply wanted to caution others about this potentially catastrophic situation.

    Now we're on SQL2012 on new hardware with far more disk and running daily deletes, we have no issues 🙂

  • SAinCA (3/6/2013)


    @ScottPletcher

    Not sure if what you are saying is that "it didn't happen the way I said it did..."

    READ_COMMITTED_SNAPSHOT isolation is set ON for the database in question.

    In order to determine that the problem was, indeed, due to the "Version Store" in TEMPDB, I used this query:

    SELECT SUM(version_store_reserved_page_count) AS [version store pages used],
           (SUM(version_store_reserved_page_count)*1.0/128) AS [version store space in MB]
    FROM sys.dm_db_file_space_usage;

    The 240GB of free space usually present on the TEMPDB drive were consumed by the version store.

    Whether or not "normal" comes into it, simply running the 10,000-row delete loop (sorry, I mis-remembered the count) is what caused the problem. I simply wanted to caution others about this potentially catastrophic situation.

    Now we're on SQL2012 on new hardware with far more disk and running daily deletes, we have no issues 🙂

    I'm not saying it didn't happen. I'm saying SQL is doing it because of the snapshot isolation. Without snapshot isolation or some other specific requirement, SQL does not store deleted rows in tempdb; in other words, that's not SQL's "normal" delete process, it only does that if it has to.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • @scottpletcher

    Thanks for clarifying.

    My "normal" is to always use READ_SNAPSHOT_ISOLATION for every SQL2005 or later User DB so I can nuke WITH (NOLOCK) from legacy code.

    Guess I normally expect SQL to delete using the version store. Not quite an "it depends" but a different "normal".

    Thanks again for your input.

    Cheers!
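
    For anyone wanting to adopt that same default, a minimal sketch (the database name is illustrative; switching the option on needs exclusive access, hence the ROLLBACK IMMEDIATE):

    -- Readers use the tempdb version store instead of shared locks,
    -- making WITH (NOLOCK) hints unnecessary
    ALTER DATABASE [MyUserDb]
        SET READ_COMMITTED_SNAPSHOT ON
        WITH ROLLBACK IMMEDIATE;   -- drop other connections so the change can take effect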
