Large table cleanup with minimal locks

  • Comments posted to this topic are about the item Large table cleanup with minimal locks

  • Is the IN Clause in the WHERE predicate faster than using the WHERE EXISTS clause? I've heard that large IN Clauses tend to be slow performers. I work mostly with DB2 on iSeries though. Just starting using SQL Server for some projects.

  • would a table variable not be better than a # table?

  • Hi

    I think There is much better solution (very simple to write and should work faster) - this by using CTE . You can select TOP X rows in the CTE and then delete it , this will be done in a loop .

    If You want I can give an example .

    Sincerely

    Kalman

  • @Kalman --- that wont get round the contention, if you use the method shown in the article you can throttle the deletes to avoid locking for too long a time, whilst only perform one select per large batch to get the ids you want.

    I do something similar on a 24/7 db, 365 days a year with all files on a single raid 5 partition!?! I use the same approach but with a table variable and I process over 1.5 million deletes per day on a rolling basis.

    The cte will keep repeating teh select for the delete and therefore likely cause locking for the duration of the select/delete.

  • A few thoughts: Why are you concerned about locking of the temp tables, they're only accessible from your own session.

    Furthermore, option maxdop 8 does allow for parallelism yes, but it is generally not recommended to span NUMA nodes, which maxdop 8 may do on dual quad servers. I do think maxdop 8 can potentially cause more performance issues than maxdop 4 in some cases, and this SHOULD be configured correectly as a server option by the dba.



    Ole Kristian Velstadbråten Bangås - Virinco - Facebook - Twitter

    Concatenating Row Values in Transact-SQL[/url]

  • Well, several issues with this, but the foremost issue is with using

    where ... in (select...)

    Please don't. You already have your IDs in a table and a Primary Key (hopefully clustered) on SalesID in your application table. So just join the tables!

    DELETE

    YourDB.dbo.SalesZIP

    from

    YourDB.dbo.SalesZIP orig

    inner join #InnerTemp dead on orig.SalesID = dead.SalesID

  • hi,

    if you want to use a join in this case, be careful and define a unique primary key constraint on your temp table(heap!). Joins on heaps are potential inefficient operations.

    ciao,

    jack

  • Still nailing down best fit on t-sql, but if you dont need any additional info from the 2nd table why join especially if not one to one? Surely just checking for a match will be quicker than actioning a join? hash join against merge join in query plan i would have thought?

  • jay.dunk (3/6/2012)


    I do something similar on a 24/7 db, 365 days a year with all files on a single raid 5 partition!?! I use the same approach but with a table variable and I process over 1.5 million deletes per day on a rolling basis.

    Be careful with using a table variable with a large amount of rows because a table variable is optimized for one row, by SQL Server i.e. it assumes 1 row will be returned therefore possibly producing a sub-optimal execution plan.

    I agree with the comments regarding using the inner join as opposed to WHERE...IN (SELECT...)

    _________________________________________________________________________________SQLGeordieWeb:- Jarrin ConsultancyBlog:- www.chrisjarrintaylor.co.ukTwitter:- @SQLGeordie

  • How would this apply to a heap?

    Cheers,

    Paul

  • ChrisTaylor (3/6/2012)


    jay.dunk (3/6/2012)


    I do something similar on a 24/7 db, 365 days a year with all files on a single raid 5 partition!?! I use the same approach but with a table variable and I process over 1.5 million deletes per day on a rolling basis.

    Be careful with using a table variable with a large amount of rows because a table variable is optimized for one row, by SQL Server i.e. it assumes 1 row will be returned therefore possibly producing a sub-optimal execution plan.

    I agree with the comments regarding using the inner join as opposed to WHERE...IN (SELECT...)

    Your statement is incorrect - if you have a reference that says otherwise please provide it.

    In this particular case a table variable will likely be best, because you WANT an index seek delete on the actual table(s). You also want to minimize the recompiles that will come from adding/truncating a temp table.

    Another suboptimal thing in this article is the non-clustered indexes. They serve no purpose in the queries and impose the significant overhead of building them every loop. Just have a heap table and go about the deletes.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (3/6/2012)


    Another suboptimal thing in this article is the non-clustered indexes. They serve no purpose in the queries and impose the significant overhead of building them every loop. Just have a heap table and go about the deletes.

    Or, if the indexes are helpful, cluster the temp table so you're not doubling up.

  • Your statement is incorrect - if you have a reference that says otherwise please provide it.

    In this particular case a table variable will likely be best, because you WANT an index seek delete on the actual table(s). You also want to minimize the recompiles that will come from adding/truncating a temp table.

    Not quite sure about references but if you do a quick test of inserting 1000 records into a table variable and do a select from it, the estimated number of rows is always 1.....

    _________________________________________________________________________________SQLGeordieWeb:- Jarrin ConsultancyBlog:- www.chrisjarrintaylor.co.ukTwitter:- @SQLGeordie

  • can't index a table variable

Viewing 15 posts - 1 through 15 (of 54 total)

You must be logged in to reply to this topic. Login to reply