Batch delete tuning

  • I recently took a position as a DBA and inherited some SQL Servers. One of the jobs that runs every night is a batch delete (see code below) that deletes about 120,000 records total from a table. The reason we are doing batch deletes is that this table is constantly written to by a critical piece of software that times out if we delete everything at once. I can make no changes to that software. The table we are deleting from has no primary key and an index on the date field. Possibly unrelated, but we had a power outage 2 weeks ago and a reorganize index task did not get run. A few days later the run time went from 5 to 10 hours. Regardless, even 5 hours seems long for so few deletes. I was wondering if adding an auto-increment primary key and selecting and deleting off of that would increase performance, or if I should look at something else. Thanks, and sorry for the long post, but I wanted to make sure all the information is out there.

    Nathan

    DECLARE @p1 datetime;
    -- @p1 = midnight at the start of yesterday.
    SELECT @p1 = dateadd([day], datediff([day], '19000101', getdate()) - 1, '19000101');

    DECLARE @batchsize int;
    SELECT @batchsize = 10000;

    WHILE @batchsize <> 0
    BEGIN
        -- Table is not on the SQL Server running the job.
        DELETE TOP (@batchsize) FROM [SomeOtherSqlServer].mytable
        WHERE dateandtime < @p1;

        -- Loop ends once a batch deletes zero rows.
        SET @batchsize = @@rowcount;
    END
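
    For reference, the auto-increment idea I'm asking about would look roughly like this. It's purely hypothetical - the [id] identity column doesn't exist today, and it assumes ids arrive in the same order as dateandtime:

    -- Hypothetical: delete off an identity key instead of the date.
    -- @p1 as declared in the job above.
    declare @maxid bigint;
    select @maxid = max(id)
    from [SomeOtherSqlServer].mytable
    where dateandtime < @p1;

    declare @rows int;
    set @rows = 1;
    while @rows <> 0
    begin
        delete top (10000) from [SomeOtherSqlServer].mytable
        where id <= @maxid;
        set @rows = @@rowcount;
    end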

  • If I were you, I'd consider putting an index onto that field. If there's no clustered index at all, you can basically pick almost any column that would work for you and make it the clustered index - e.g. the dateandtime field. If there is a clustered index and there's a correlation between it and the dateandtime column, you can use it to determine the minimum and maximum clustered index key values to delete. The problem with deleting based on a nonclustered index is that it will produce too many reads, and it's very likely that SQL Server will use a full table scan instead (you can check whether this is the case by running the script manually and asking for the actual execution plan, or just asking for the estimated plan). A sketch follows below.

    Question back: why do you run it from a different server?
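
    Something like this is what I mean (I'm assuming the table is dbo.mytable on the remote server; the index name is made up):

    -- Run on the server that holds the table. A clustered index on the
    -- date column lets the range delete seek instead of scanning.
    create clustered index CIX_mytable_dateandtime
        on dbo.mytable (dateandtime);

    -- To check for a table scan, run one batch manually with the actual
    -- plan captured:
    declare @p1 datetime;
    select @p1 = dateadd([day], datediff([day], '19000101', getdate()) - 1, '19000101');
    set statistics profile on;
    delete top (10000) from dbo.mytable where dateandtime < @p1;
    set statistics profile off;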

  • The reason it is run from a different server is that before we delete all this data, we pull it over to an archive database. Once the archive is complete, it kicks off the bulk delete job to delete everything we just pulled over. I'll give your idea a try and post back how it did tomorrow.

  • So I added a clustered index to the datetime field and that brought the runtime down from 10 hours to 5 hours. This is acceptable, since it no longer runs into business hours. As a learning exercise I am more than willing to try some other suggestions to bring it down further. 5 hours still seems far too long to delete only 122,000 records. Thanks for the help.

  • If you're on a SAN, I would check to see whether you are sharing disk activity with some other function.

  • Maybe you could add a procedure on that remote server that would execute the loop locally. Also, how many rows are there in the table? Linchi Shea wrote about a security issue that requires the login used by the linked server to have a certain level of privileges to use a table's statistics. It is possible that you are doing table scans - you may want to check this.

    Regards

    Piotr

    ...and your only reply is slàinte mhath

  • Thanks for the tip. It turns out the account running the delete doesn't have the appropriate permissions to access statistics. I've made that change and I will report back on how that goes tomorrow. As for the number of rows, like I said above, we are deleting around 122,000 rows in batches of 10,000.
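
    For anyone who finds this later, the fix was along these lines. Per Linchi Shea's post, the linked server login needs ddl_admin or higher on the remote database to read statistics; database and login names here are made up:

    -- Run on the remote server, in the database that holds the table.
    use mydb;
    exec sp_addrolemember 'db_ddladmin', 'linked_server_login';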

  • You're welcome 🙂

    But how many rows exist in the table? Looking at the example you posted, I would think you delete half of the table each day.

    I ran a few tests and I think that the loop could be implemented in a different way. Try this:

    -- Caps the number of rows affected by each statement that follows.
    set rowcount @batchsize;

    while 1 = 1
    begin
        delete [SomeOtherSqlServer].mytable where dateandtime < @p1;
        if @@rowcount = 0
            break;
    end

    -- Always reset, or every later statement in the session stays capped.
    set rowcount 0;

    Regards

    Piotr

    ...and your only reply is slàinte mhath

  • The setting of the SET ROWCOUNT option is ignored for INSERT, UPDATE, and DELETE statements against remote tables and local and remote partitioned views. If I moved the stored procedure over to the server that contains the table and then called the procedure remotely, would this work?
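
    Something like this is what I have in mind (database and procedure names are placeholders):

    -- Created on SomeOtherSqlServer, in the database that holds mytable.
    create procedure dbo.usp_purge_mytable
        @cutoff datetime,
        @batchsize int = 10000
    as
    begin
        set nocount on;
        set rowcount @batchsize;  -- honored here because the table is local
        while 1 = 1
        begin
            delete dbo.mytable where dateandtime < @cutoff;
            if @@rowcount = 0 break;
        end
        set rowcount 0;
    end

    -- Called from the job server (needs 'rpc out' enabled on the linked server):
    -- exec [SomeOtherSqlServer].mydb.dbo.usp_purge_mytable @cutoff = @p1;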

  • I think it is definitely worth a try.

    Regards

    Piotr

    ...and your only reply is slàinte mhath

  • No go last night - I forgot to enable RPC on the remote server. I'm going to enable it now and I will post the results tomorrow. BTW, thanks for showing me the neat trick with setting the rowcount. I had never seen it before, but I can definitely see the use for it.
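
    In case anyone else hits this, enabling it is a one-liner on the server that makes the call:

    -- 'rpc out' allows stored procedures to be executed through the linked server.
    exec sp_serveroption 'SomeOtherSqlServer', 'rpc out', 'true';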

  • Depending on what kind of options and setup you have for this table, it really sounds to me like this is a good opportunity to put into play the sliding window partition method that Kimberly Tripp talks about here:

    http://msdn.microsoft.com/en-us/library/ms345146%28SQL.90%29.aspx#sql2k5parti_topic24

    where your key is somehow related to the date that you are deleting based on. If you set it up correctly, when the job runs it would just be getting rid of a partition, and creating a new one (which should run in seconds).

    Obviously if you do not have the ability to set up partitioning, this is a moot point, but it seems like it might be worth looking into for your scenario, if time is becoming a problem.
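
    A very rough sketch of the moving parts, assuming SQL Server 2005 Enterprise and daily partitions on dateandtime (names, boundary dates, and filegroups are all made up; Kimberly's article covers the real details):

    -- Partition function and scheme; the table would be created on ps_daily.
    create partition function pf_daily (datetime)
        as range right for values ('20090101', '20090102', '20090103');
    create partition scheme ps_daily
        as partition pf_daily all to ([PRIMARY]);

    -- Nightly purge: switch the oldest partition into an empty staging table
    -- with an identical structure (a metadata-only operation), then archive
    -- or drop the staging table.
    alter table dbo.mytable switch partition 1 to dbo.mytable_staging;

    -- Retire the now-empty boundary and add tomorrow's.
    alter partition function pf_daily() merge range ('20090101');
    alter partition scheme ps_daily next used [PRIMARY];
    alter partition function pf_daily() split range ('20090104');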

  • I agree, sliding window might be a good solution, unless there are foreign key constraints on other tables referencing this one. Nathan, do you have any triggers on this table? Also - how many indexes do you have on this table? Are there any foreign keys with ON DELETE CASCADE defined on the table?

    Regards

    Piotr

    ...and your only reply is slàinte mhath

  • There are no foreign key constraints or triggers. The table has one clustered index on the date field.

  • Piotr, your script worked beautifully and dropped the runtime down from 4.5 hours to 4.5 minutes. Thank you very much. I took a look at how a sliding window with partitions works and I agree that it would probably have the quickest runtime of all. The downside is that it is quite complicated to set up, so I won't be giving it a try when my current runtime is now better than I had ever hoped for. Thank you everyone for your help, I really appreciate it.
