Slow DELETEs on a table

  • OK - here goes...

    I've been messing about with this archival process for a couple of days now. Ideally, due to the number of rows to be deleted, it would be preferable to copy the rows I need from one table to a temporary one, truncate the contents of the table and then reload back what I need. It has to be online.

    So... I have a table with 10,000,000+ rows in it (not a huge one) and there is currently no activity on it, as I am messing about with it in test. I want to perform a limited number of deletes per transaction, just in case I end up with a table lock and hurt the processing of other users.

    I received a clean-up query from the third-party vendor that required a lot of functions to compare the timestamp (stored as seconds in the table) and reformat it into a date I can read. I have reversed that, so the date I want to use is now in seconds and it becomes a simple compare in the WHERE clause of the DELETE.

    I am using:

    - ROWCOUNT to limit the rows to delete per cycle.

    - ROWLOCK & the FAST option in the DELETE statement.

    - An index created on the column upon which the searching is performed (no collision with any other key, foreign key or index).

    Yet after optimising all of this, I can still only manage about 50,000 deletes in 13 minutes. Yesterday, using rubbish code and an index not suited to the DELETE statement, I was getting rid of 1,000,000 rows every 25 minutes (and I thought that was rubbish performance as well).

    Using SQL2000 (b2187) on W2003+sp1.

    Any advice would be appreciated (I hate to think that this would be a problem that has an incredibly simple solution....)

    BTW: Here is my delete statement:

    DELETE FROM {tablename} WITH (ROWLOCK)
    WHERE {char field} <= @dtBackThen
    OPTION (FAST 2000)

    I just realised as I wrote this that it is comparing a BIGINT with a char field.... that might do it....
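
    For reference, the batching is along these lines (table and column names changed for the post; the cut-off and delay values are just placeholders):

    -- batched delete, SQL 2000 style: SET ROWCOUNT caps each pass
    DECLARE @dtBackThen BIGINT
    SET @dtBackThen = 1190000000          -- placeholder cut-off, in epoch seconds

    SET ROWCOUNT 2000                     -- limit each DELETE to a small batch

    WHILE 1 = 1
    BEGIN
        DELETE FROM dbo.MyTable WITH (ROWLOCK)
        WHERE event_time <= @dtBackThen

        IF @@ROWCOUNT = 0 BREAK           -- nothing left to delete

        WAITFOR DELAY '00:00:01'          -- let other work squeeze in between batches
    END

    SET ROWCOUNT 0                        -- always reset ROWCOUNT afterwards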

    A lack of planning on your part does not constitute an emergency on mine.

  • How many rows should be left after all?

    _____________
    Code for TallyGenerator

  • I knew I would forget something pertinent to this.... 🙂

    Table is going from roughly 11,000,000 rows down to about 2,500,000.

    I'm now comparing char-with-char in the WHERE clause (which I picked up while writing the post) and that has yielded an approximate 6x increase in performance. Deleted 280,000 rows in 9.5 mins.

    A lack of planning on your part does not constitute an emergency on mine.

  • Then it would be faster to create a new table, populate it with the rows you want to keep, drop the FK's, drop the old table, rename the new table to the old table's name and recreate the FK's.

    You can have EM generate all the scripts for you by changing the datatype of any column in the table.

    Copy the script from "Save change script" into QA, remove the datatype change and add a WHERE clause to the copying statement.

    Just in case, I would not drop the old table but rename it to "MyTable_2drop".

    I would drop it a little bit later. Just in case... 😉
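
    A rough sketch of the idea (table and column names are placeholders; the real column list, indexes and FK names have to come from your own schema / the generated EM script):

    -- copy only the rows to keep into a new table
    DECLARE @dtBackThen BIGINT
    SET @dtBackThen = 1190000000          -- placeholder cut-off, in epoch seconds

    SELECT *
    INTO dbo.MyTable_new
    FROM dbo.MyTable
    WHERE event_time > @dtBackThen

    -- then: build indexes on dbo.MyTable_new, drop the FK's that reference dbo.MyTable,
    -- rename dbo.MyTable to MyTable_2drop, rename dbo.MyTable_new to MyTable,
    -- recreate the FK's, and drop MyTable_2drop a little bit later.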

    _____________
    Code for TallyGenerator

  • Do you really need the ROWLOCK? Try removing it. Deleting 8.5M out of 11M rows means a table lock would be appropriate for the operation, and that would be a lot faster.
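
    Something along these lines (just a sketch, keeping the placeholders from your post):

    -- TABLOCKX takes one exclusive table lock instead of thousands of row locks
    DELETE FROM {tablename} WITH (TABLOCKX)
    WHERE {char field} <= @dtBackThen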


    Ronald San Juan | SQL DBA
    ID 710124 ~ Code "Northwind"

  • Yes, it would be pertinent to unload everything and reload it, but the application itself has been doing a yo-yo during the last week and the bosses won't allow any more downtime.

    The app needs to stay up and thus the solution needs to be enacted while online.

    I've pretty-much sorted it out now. Thanks to all who responded.

    A lack of planning on your part does not constitute an emergency on mine.

  • The good thing about the "copy-drop-rename" approach is that it does not require any stoppage during the copy process.

    After it's finished you can revisit the data and find what has changed since you started the copy.

    Update the new table with the fresh data and run the check again.

    When you find that no rows have changed since the last refresh, you just start a transaction to lock the table against updates, do the "drop-rename" - and there you are.

    "Drop-rename-recreate FK" will not take more than a second; I don't think anyone will have any concern about such a stoppage.

    _____________
    Code for TallyGenerator

  • I like Sergiy's copy/drop/rename solution if it works. Sometimes it's much quicker to work with inserts than deletes.

  • I will second the copy/drop/rename solution.

    Or, in my case, drop/rename/create, as I was doing a SELECT INTO of 750K rows every few minutes from an intermediate table. It was quicker to drop the real table, rename the intermediate, and then recreate the intermediate.
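
    Roughly what that cycle looked like (table names here are illustrative, not the real ones):

    DROP TABLE dbo.RealTable

    EXEC sp_rename 'dbo.IntermediateTable', 'RealTable'

    -- recreate the empty intermediate for the next load cycle (structure only, no rows)
    SELECT *
    INTO dbo.IntermediateTable
    FROM dbo.RealTable
    WHERE 1 = 0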

  • Steve Jones - Editor (11/4/2007)


    I like Sergiy's copy/drop/rename solution if it works.

    I did not invent it.

    😉

    It's been proven to work by years of experience of thousands of developers using EM: this approach is used every time you change a table schema in EM (except when adding columns at the end, when it just uses ALTER TABLE).

    _____________
    Code for TallyGenerator

  • I quite like the idea of the create/rename solution as it makes sense to do it that way and the outage time is much less. Apart from the issue of the bosses refusing to allow any more downtime on the application, the application is from a third-party vendor that I can sincerely trust to have written the application in the worst possible way.

    The app is one of those that was written by someone creating screens, with a database bolted on behind the scenes and tables and columns added as they went along. This is made worse by the fact that the company has no data dictionary for the app and there is virtually no consistency throughout the collection of data. A timestamp in one table may be a DATETIME field; in another it may be stored as a char, and in yet another as seconds since 1920 (or so) in a char field. There is NO consistency in data names between tables, and you don't want to even think about some of the queries that exist in the code.

    We get about 3-5 deadlock notifications on the database every hour during the business day, and managers have been told by the vendor that the issue is caused by SQL2K and would be fixed by going to 2K5. I've already countered their claim with fact, and I'm glad that I've heard no more about it, but I still bring up the case of "rubbish code" whenever those managers are around.

    With an app that is as dodgy as this one is, I really don't want to risk dropping that table out even for a second in case it causes the whole bloody lot to go pear shaped.

    I have been playing with a heap of ideas and I think the best way may be to do a complete unload of the table into a copy of the same; truncate the data; and reload the data I want to keep back into it.

    All I need to do now is work out how to restrict access to this one table between the unloading of the data and the truncation so that records aren't lost.

    Brain is not working well at the moment... :/
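
    The best I've come up with so far is to hold an exclusive table lock across the whole unload/truncate/reload inside a single transaction. Very much a sketch (names are placeholders, and it assumes no foreign keys reference the table - otherwise TRUNCATE is not allowed - and no identity column to worry about on the reload):

    DECLARE @dtBackThen BIGINT
    SET @dtBackThen = 1190000000          -- placeholder cut-off, in epoch seconds

    BEGIN TRANSACTION

        -- grab and hold an exclusive lock so no one else can touch the table
        SELECT TOP 1 1 FROM dbo.MyTable WITH (TABLOCKX, HOLDLOCK)

        -- unload the rows to keep
        SELECT *
        INTO dbo.MyTable_keep
        FROM dbo.MyTable
        WHERE event_time > @dtBackThen

        TRUNCATE TABLE dbo.MyTable

        -- reload the rows to keep
        INSERT INTO dbo.MyTable
        SELECT * FROM dbo.MyTable_keep

    COMMIT TRANSACTION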

    A lack of planning on your part does not constitute an emergency on mine.

  • Matt, there is no risk.

    Just do it the proper way.

    To make sure the way is really proper, copy a proven solution.

    Open this table in "design table" window in EM.

    Change datatype for some column in the middle.

    DON'T SAVE the table.

    Click on "Save change script".

    Open saved script in QA.

    Find the new datatype you set up for the column and change it back.

    Replace "GO" with error control.

    I would suggest posting the resulting script here for verification.

    After everything is clear press F5 and pray. 😀
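
    For the "error control" step, the usual pattern is something like this (a sketch of the idea only, not the literal EM output): wrap the generated script in one transaction and, at every point where it had a GO, check @@ERROR instead.

    BEGIN TRANSACTION

    -- ...first statement from the generated script...
    IF @@ERROR <> 0 GOTO OnError

    -- ...next statement from the generated script...
    IF @@ERROR <> 0 GOTO OnError

    COMMIT TRANSACTION
    RETURN

    OnError:
        ROLLBACK TRANSACTION
        RAISERROR ('Table rebuild failed - rolled back.', 16, 1)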

    _____________
    Code for TallyGenerator
