Large table cleanup with minimal locks

  • jay.dunk (3/6/2012)


    @kalman --- that won't get round the contention. If you use the method shown in the article you can throttle the deletes to avoid locking for too long at a time, while performing only one select per large batch to get the ids you want.

    I do something similar on a 24/7 db, 365 days a year, with all files on a single RAID 5 partition!?! I use the same approach but with a table variable, and I process over 1.5 million deletes per day on a rolling basis.

    The CTE will keep repeating the select for the delete and is therefore likely to cause locking for the duration of the select/delete.

    Hi, I too have a 24/7 db, but with a good file server (RAID 1+0). Yesterday I deleted more than 4,000K rows with such a CTE on a production machine (a statistics table). In the CTE I use the hint WITH (NOLOCK) after the table name in the select, so I have no lock waits (dirty reads). My DB is in the FULL recovery model. Try it; I will try to write an example tomorrow.
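
    A minimal sketch of that kind of CTE-batched delete, in the meantime (table and column names here are placeholders, and the NOLOCK hint is left out because hinting the target of a DELETE is deprecated and, as noted later in the thread, risky):

    -- Hypothetical names throughout: dbo.t_stats, stamp_utc, @cutoff.
    DECLARE @cutoff datetime = DATEADD( DAY, -30, GETDATE() );

    WHILE 1 = 1
    BEGIN
        ;WITH doomed AS
        (
            SELECT TOP ( 5000 ) *
            FROM dbo.t_stats
            WHERE stamp_utc < @cutoff
        )
        DELETE FROM doomed;        -- removes the underlying rows

        IF @@ROWCOUNT = 0 BREAK;   -- nothing left older than the cutoff
    END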

    Regards Kalman

  • ChrisTaylor (3/6/2012)


    Your statement is incorrect - if you have a reference that says otherwise please provide it.

    In this particular case a table variable will likely be best, because you WANT an index seek delete on the actual table(s). You also want to minimize the recompiles that will come from adding/truncating a temp table.

    Not quite sure about references, but if you do a quick test of inserting 1,000 records into a table variable and doing a select from it, the estimated number of rows is always 1...

    I don't think he was saying that the estimated number of rows for a table variable is always 1. I think he's saying that in this case that's a good thing. Not from a speed of delete perspective but from a concurrency perspective.

    From that same perspective, you may want to keep the number of rows limited to avoid lock escalation. The main threshold is 5,000 locks on a single index/heap but other factors, such as memory, can cause escalation at a lower level.
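
    A sketch of that table-variable pattern, sized to stay under the escalation threshold (all names are hypothetical):

    -- Capture one batch of keys into a table variable, then delete by joining
    -- on them so the delete can seek on the base table's clustered key.
    DECLARE @cutoff datetime = DATEADD( DAY, -30, GETDATE() );
    DECLARE @ids TABLE ( id int NOT NULL PRIMARY KEY );

    INSERT INTO @ids ( id )
    SELECT TOP ( 4500 ) id         -- stay under the ~5,000-lock escalation point
    FROM dbo.t_stats
    WHERE stamp_utc < @cutoff
    ORDER BY id;

    DELETE t
    FROM dbo.t_stats AS t
    JOIN @ids AS i
      ON i.id = t.id;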

  • jberg-604007 (3/6/2012)


    can't index a table variable

    But you can declare a primary key, which does have an effect on how the query is processed.

  • cfradenburg (3/6/2012)


    jberg-604007 (3/6/2012)


    can't index a table variable

    But you can declare a primary key, which does have an effect on how the query is processed.

    In the vast majority of cases I have come across at clients, a PK (or any index, for that matter) on temp objects was NOT helping the query at all, but you still pay the cost of maintaining them. You really need a GOOD reason to spend the effort putting a PK/index on a temp object.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on Google's mail service

  • cfradenburg (3/6/2012)


    ChrisTaylor (3/6/2012)


    Your statement is incorrect - if you have a reference that says otherwise please provide it.

    In this particular case a table variable will likely be best, because you WANT an index seek delete on the actual table(s). You also want to minimize the recompiles that will come from adding/truncating a temp table.

    Not quite sure about references, but if you do a quick test of inserting 1,000 records into a table variable and doing a select from it, the estimated number of rows is always 1...

    I don't think he was saying that the estimated number of rows for a table variable is always 1. I think he's saying that in this case that's a good thing. Not from a speed of delete perspective but from a concurrency perspective.

    From that same perspective, you may want to keep the number of rows limited to avoid lock escalation. The main threshold is 5,000 locks on a single index/heap but other factors, such as memory, can cause escalation at a lower level.

    Exactly right.

    Note that the locks taken for the DELETE in this case could be PAGE locks, in which case there could potentially be hundreds of thousands of rows available to the DELETE before the engine decides it needs to escalate to a TABLE lock. Also note that if the TABLE lock isn't available due to other concurrent activity, the engine will keep going and retry the lock escalation after every additional 1,250 locks (with some constraints, provisos, limitations, etc.).
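
    If you want to watch this from the outside, one rough way (run from a second connection; the session_id below is a placeholder for the deleting connection's SPID) is to count the locks that session holds:

    -- When thousands of PAGE/KEY locks collapse into a single OBJECT lock,
    -- escalation has happened.
    SELECT resource_type, request_mode, COUNT(*) AS lock_count
    FROM sys.dm_tran_locks
    WHERE request_session_id = 55   -- placeholder SPID
    GROUP BY resource_type, request_mode
    ORDER BY lock_count DESC;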

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on Google's mail service

  • TheSQLGuru (3/6/2012)


    In the vast majority of cases I have come across at clients, a PK (or any index, for that matter) on temp objects was NOT helping the query at all, but you still pay the cost of maintaining them. You really need a GOOD reason to spend the effort putting a PK/index on a temp object.

    This was at another company, so I can't look up the details anymore, but I did run into a case where adding a primary key to a table variable resulted in a noticeable improvement in the query. I haven't tried it much since, because I haven't seen anything else where it looked like it would help.

  • cfradenburg (3/6/2012)


    TheSQLGuru (3/6/2012)


    In the vast majority of cases I have come across at clients, a PK (or any index, for that matter) on temp objects was NOT helping the query at all, but you still pay the cost of maintaining them. You really need a GOOD reason to spend the effort putting a PK/index on a temp object.

    This was at another company, so I can't look up the details anymore, but I did run into a case where adding a primary key to a table variable resulted in a noticeable improvement in the query. I haven't tried it much since, because I haven't seen anything else where it looked like it would help.

    Well, I didn't say "never". 🙂 Likely your query was able to do an ordered operation or 'rely' on the PK (or its likely clustering) in some other way to produce a more efficient plan.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on Google's mail service

  • jberg-604007 (3/6/2012)


    can't index a table variable

    Not quite true. You can define a primary key while declaring the table variable; you can't add indexes to a table variable after it has been declared.
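
    For example (names hypothetical):

    DECLARE @t TABLE
    (
        id int NOT NULL PRIMARY KEY     -- allowed: inline constraint builds a unique clustered index
    );
    -- CREATE INDEX ix_t ON @t ( id );  -- not allowed: indexes can't be added after declaration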

  • TheSQLGuru (3/6/2012)


    Well, I didn't say "never". 🙂 Likely your query was able to do an ordered operation or 'rely' on the PK (or its likely clustering) in some other way to produce a more efficient plan.

    Yep. I just wanted to point out that I had seen it be beneficial, so anyone reading through this doesn't walk away with the impression that it's not worth trying at all. I do remember that the primary key put the variable in the same order as what we were comparing it to later on.

  • Simple question re lock escalation:

    Would it help or hinder if LOCK_ESCALATION were disabled for the duration of the deletes? As this is the one and only process that ever performs deletes, and the only updates on this table happen on a row-by-row basis as data are forwarded to the central server, it seems safe to do so. (Not my architecture, so please avoid commenting on the presence of RBAR :-D)

    I have this in place on a SQL Server 2008 R2 Express DB for deleting typically 72K rows per day, but in exceptional cases millions of rows, from a 9GB table. It works fine; there have been no concurrency issues that we can see, as data flow from the remote PC to the server just fine, but I'd like to know if I've made any errors similar to those in the comments already posted re locks, maxdop, et al.

    /* =============================================================================
    ** DELETE OLD t_site_raw RECORDS
    ** ========================================================================== */

    -- @DeleteDataDate, @SQLError, and @ErrorNumber are declared earlier in the
    -- actual procedure; shown here so the snippet stands alone (cutoff value is
    -- illustrative only).
    DECLARE @DeleteDataDate datetime = DATEADD( DAY, -90, GETDATE() )
    DECLARE @SQLError nvarchar(2048), @ErrorNumber int

    BEGIN TRY
        DECLARE @DeleteLoop bit = 1

        /* -------------------------------------------------------------------------
        ** Maximize concurrency on t_site_raw (2008 feature).
        ** ---------------------------------------------------------------------- */
        ALTER TABLE t_site_raw SET ( LOCK_ESCALATION = DISABLE )

        /* -------------------------------------------------------------------------
        ** Loop through the deletes in 5,000-row chunks, pausing every loop.
        ** ---------------------------------------------------------------------- */
        WHILE @DeleteLoop = 1
        BEGIN
            BEGIN TRAN

            DELETE TOP ( 5000 )
            FROM t_site_raw WITH ( ROWLOCK )
            WHERE t_stamp_utc < @DeleteDataDate

            /* ---------------------------------------------------------------------
            ** When no rows were deleted, terminate the loop.
            ** ------------------------------------------------------------------ */
            IF @@ROWCOUNT = 0
                SET @DeleteLoop = 0

            COMMIT TRAN

            /* ---------------------------------------------------------------------
            ** If there may be more to delete, wait to allow other processes to
            ** access the table.
            ** ------------------------------------------------------------------ */
            IF @DeleteLoop = 1
                WAITFOR DELAY '00:00:02'
        END
    END TRY
    BEGIN CATCH
        -- Roll back any transaction left open by the failed batch.
        IF @@TRANCOUNT > 0
            ROLLBACK TRAN

        -- Execute error retrieval routine.
        EXECUTE dbo.usp_GetSQLErrorInfo @SQLError OUTPUT, @ErrorNumber OUTPUT;
        RAISERROR( @SQLError, 0, 1 ) WITH NOWAIT
    END CATCH

    ALTER TABLE t_site_raw SET ( LOCK_ESCALATION = AUTO )

  • I haven't played with the LOCK_ESCALATION setting, but I would use it with care. It's possible for lock escalation to be triggered by memory thresholds, and if there are a lot of locks on this table that would normally be escalated but now can't be, you may run into memory issues.

    I don't know if this setting overrides escalations due to memory pressure, but if it does I would be hesitant. If you know that, outside of this query, the workload is such that lock escalation won't be seen, then go for it.
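
    For reference, the table-level setting in question (SQL Server 2008+) takes three values; TABLE is the default, AUTO escalates per partition on partitioned tables, and DISABLE prevents escalation in most cases:

    ALTER TABLE t_site_raw SET ( LOCK_ESCALATION = TABLE )      -- default behaviour
    -- ALTER TABLE t_site_raw SET ( LOCK_ESCALATION = AUTO )
    -- ALTER TABLE t_site_raw SET ( LOCK_ESCALATION = DISABLE )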

  • hitrickk (3/6/2012)


    jay.dunk (3/6/2012)


    @kalman --- that won't get round the contention. If you use the method shown in the article you can throttle the deletes to avoid locking for too long at a time, while performing only one select per large batch to get the ids you want.

    I do something similar on a 24/7 db, 365 days a year, with all files on a single RAID 5 partition!?! I use the same approach but with a table variable, and I process over 1.5 million deletes per day on a rolling basis.

    The CTE will keep repeating the select for the delete and is therefore likely to cause locking for the duration of the select/delete.

    Hi, I too have a 24/7 db, but with a good file server (RAID 1+0). Yesterday I deleted more than 4,000K rows with such a CTE on a production machine (a statistics table). In the CTE I use the hint WITH (NOLOCK) after the table name in the select, so I have no lock waits (dirty reads). My DB is in the FULL recovery model. Try it; I will try to write an example tomorrow.

    Regards Kalman

    Using NOLOCK is quite risky. It's getting used more and more these days to avoid locks, but it DOES come with issues, which I only found out about a few months ago. Take a look at this post, which puts it a lot better than I do:

    http://blogs.msdn.com/b/sqlcat/archive/2007/02/01/previously-committed-rows-might-be-missed-if-nolock-hint-is-used.aspx

  • cfradenburg (3/6/2012)


    I haven't played with the lock_escalation setting but I would use it with care. It's possible for lock escalation to be triggered due to memory thresholds and if there are a lot of locks on this table that would be escalated but can't be then you may run into memory issues.

    I don't know if this setting will override escalations due to memory pressure but if it does I would be hesitant. If you know that the workload is such that, outside of this query, that lock escalation won't be seen then go for it.

    @cfradenburg: Thanks for the quick response. The PC simply acquires data, record by record, from signalling interfaces (Modbus) and inserts the raw data into the table. A transmit SP then forwards each row to the central server. I went back to the transmit code and there's no update at all, so lock contention appears to be zero.

  • Great idea. Thanks. It does need just a little "fine tuning" to make it work as advertised on VLDBs.

    😉

