Deleting Large Number of Records

  • This works if you are the ONLY one using the database.

    Running in a production system you can't just go about changing the recovery model.

    as for a 'batch' delete

    select 1

    while @@rowcount > 0

    begin

    delete top () ....

    end

    if you are concerned about log growth have ANOTHER process manage it.
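
    A minimal runnable version of that loop might look like the following sketch; the table name dbo.BigTable, the PurgeDate filter, and the 10,000-row batch size are placeholders for illustration only, not anything from the article:

    DECLARE @BatchSize int;
    SET @BatchSize = 10000;

    SELECT 1;  -- seed @@ROWCOUNT so the loop body runs at least once

    WHILE @@ROWCOUNT > 0
    BEGIN
        -- Each DELETE runs as its own auto-committed transaction, so the log
        -- can be backed up (or truncated under SIMPLE) between batches.
        DELETE TOP (@BatchSize)
        FROM dbo.BigTable
        WHERE PurgeDate < '20090101';  -- hypothetical purge criteria
    END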

  • In the sample, you are using a batchsize of 10,000. This will end up causing a table lock unless the escalation gets blocked by other processes. If table locks must be prevented, batch sizes under 5000 need to be used. If a large batch size is required for decent performance, but the table must not be locked, lock escalation for the table can be disabled (2008), or another spid can be used to hold a special escalation blocking lock (Table IX). For example:

    BEGIN TRANSACTION;

    SELECT 1 FROM dbo.MyTable WITH (UPDLOCK HOLDLOCK) WHERE 1 = 0;

    Once the delete processing is done, the above transaction can be committed. I have researched the above and it does not prevent checkpointing when using the SIMPLE recovery model.

    One other thing to take into consideration regarding batch sizes is the type of delete plan generated. Depending on the number of indexes and their statistics, a row-by-row type plan or a set-based plan may be generated. They should both be evaluated for page counts and durations.
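
    For reference, the two lock-avoidance options described above could be sketched as follows; dbo.MyTable is the same placeholder name used in the example, and the rest is illustrative rather than taken from the article:

    -- Option 1 (SQL Server 2008 and later): stop escalation attempts on this table.
    ALTER TABLE dbo.MyTable SET (LOCK_ESCALATION = DISABLE);

    -- Option 2 (any version): from a second spid, hold a table-level IX lock for the
    -- duration of the delete so any escalation attempt finds the table lock unavailable.
    BEGIN TRANSACTION;
    SELECT 1 FROM dbo.MyTable WITH (UPDLOCK, HOLDLOCK) WHERE 1 = 0;
    -- ... run the batched deletes from the other session, then ...
    COMMIT TRANSACTION;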

  • kjs714 (9/15/2009)


    In the sample, you are using a batchsize of 10,000. This will end up causing a table lock unless the escalation gets blocked by other processes. If table locks must be prevented, batch sizes under 5000 need to be used. If a large batch size is required for decent performance, but the table must not be locked, lock escalation for the table can be disabled (2008), or another spid can be used to hold a special escalation blocking lock (Table IX). For example:

    BEGIN TRANSACTION;

    SELECT 1 FROM dbo.MyTable WITH (UPDLOCK HOLDLOCK) WHERE 1 = 0;

    Once the delete processing is done, the above transaction can be committed.

    Or you could just use WITH (ROWLOCK)...

    Atlantis Interactive - SQL Server Tools
    My blog
    Why I wrote a sql query analyzer clone

  • Matt Whitfield (9/15/2009)


    kjs714 (9/15/2009)


    In the sample, you are using a batchsize of 10,000. This will end up causing a table lock unless the escalation gets blocked by other processes. If table locks must be prevented, batch sizes under 5000 need to be used. If a large batch size is required for decent performance, but the table must not be locked, lock escalation for the table can be disabled (2008), or another spid can be used to hold a special escalation blocking lock (Table IX). For example:

    BEGIN TRANSACTION;

    SELECT 1 FROM dbo.MyTable WITH (UPDLOCK HOLDLOCK) WHERE 1 = 0;

    Once the delete processing is done, the above transaction can be committed.

    Or you could just use WITH (ROWLOCK)...

    WITH (ROWLOCK) is only nominally respected. After a certain number of locks, escalation will be attempted regardless of that hint. The lock escalation article in BOL gives some rough guidelines, but I have seen escalation happen well before the stated lock counts are reached.

  • rob.lobbe (9/15/2009)


    This works if you are the ONLY one using the database.

    Running in a production system you can't just go about changing the recovery model.

    Not so sure about it.

    MSDN (http://msdn.microsoft.com/en-us/library/ms190203(SQL.90).aspx) says nothing to imply one cannot change recovery models in full flight. I have changed production DBs' recovery models with no adverse effects... OK, I had to, as someone created them in FULL mode but did not set up transaction log backups...

    Does anyone have reliable info to the contrary (except for general principles etc)?
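
    For what it's worth, the switch itself is a one-line ALTER DATABASE and is an online operation; the usual caveat is a general principle rather than a hard block: dropping from FULL to SIMPLE breaks the log backup chain, so a full (or differential) backup is needed after switching back before log backups are useful again. MyDatabase and the paths below are placeholders:

    ALTER DATABASE MyDatabase SET RECOVERY SIMPLE;
    -- ... do the bulk delete work ...
    ALTER DATABASE MyDatabase SET RECOVERY FULL;

    -- Re-establish the log backup chain after returning to FULL:
    BACKUP DATABASE MyDatabase TO DISK = N'X:\Backups\MyDatabase.bak';
    BACKUP LOG MyDatabase TO DISK = N'X:\Backups\MyDatabase_log.trn';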

  • kjs714 (9/15/2009)


    WITH (ROWLOCK) is only nominally respected. After a certain number of locks, escalation will be attempted regardless of that hint. The lock escalation article in BOL gives some rough guidelines, but I have seen escalation happen well before the stated lock counts are reached.

    I just deleted 5 million rows from a table using rowlock, and it, indeed, took the mickey - you learn something new every day! 1 exclusive table lock, 12794 exclusive page locks and 1600 exclusive extent locks...

    Do you know anywhere that has more detail (i.e. that isn't lying) about lock escalation and how lock hints are *actually* applied?

    Thanks for putting me right on that one 😀

    Atlantis Interactive - SQL Server Tools
    My blog
    Why I wrote a sql query analyzer clone

  • Matt Whitfield (9/15/2009)


    kjs714 (9/15/2009)


    WITH (ROWLOCK) is only nominally respected. After a certain number of locks, escalation will be attempted regardless of that hint. The lock escalation article in BOL gives some rough guidelines, but I have seen escalation happen well before the stated lock counts are reached.

    I just deleted 5 million rows from a table using rowlock, and it, indeed, took the mickey - you learn something new every day! 1 exclusive table lock, 12794 exclusive page locks and 1600 exclusive extent locks...

    Do you know anywhere that has more detail (i.e. that isn't lying) about lock escalation and how lock hints are *actually* applied?

    Thanks for putting me right on that one 😀

    I have not run across anything more definitive than what is in BOL. I recommend that you set up a test database and run Profiler, looking for the Lock Escalation event to find out when it occurs. I've seen it at strange numbers like 1889, even though the number of row locks taken was well below what BOL states. In the context of the article, 10,000 will escalate to a table lock if no other processes have a blocking lock.
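
    In addition to the Profiler Lock Escalation event, one way to watch this happen (assuming SQL Server 2005 or later) is to count the locks the deleting session holds from a second connection; 53 below is a placeholder for the deleting spid:

    SELECT resource_type, request_mode, COUNT(*) AS lock_count
    FROM sys.dm_tran_locks
    WHERE request_session_id = 53   -- spid running the batched delete
    GROUP BY resource_type, request_mode
    ORDER BY lock_count DESC;

    While the delete is still running with row locks you see large KEY and PAGE counts; once escalation succeeds they collapse into a single OBJECT X lock.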

  • John Campbell (9/15/2009)


    Lynn ... great article!

    Any thoughts on how (or even if) this approach should be modified for bulk deletions on a SQL 2005 mirror configuration with log-shipping in place?

    JC

    Sorry for not responding to this one right away; I really had to think about it. I currently don't work in an environment that uses log shipping, so this is a bit more difficult conceptually. I would think that you'd need to modify this code so that it didn't do the backups, and at the same time modify your log shipping process to run more frequently during the delete, perhaps as often as every 1 to 2 minutes.

    Again, I'm sorry I can't give you a much better answer. Perhaps someone with more experience with log shipping could help out with this particular situation.

  • rob.lobbe (9/15/2009)


    This works if you are the ONLY one using the database.

    Running in a production system you can't just go about changing the recovery model.

    as for a 'batch' delete

    select 1

    while @@rowcount > 0

    begin

    delete top () ....

    end

    if you are concerned about log growth have ANOTHER process manage it.

    I'm not sure if you are talking about my code or not, but if you are, it will work in production. Also, the code does not change the recovery model of the database. It checks the recovery model since BACKUP LOG cannot be run on a database using the SIMPLE recovery model.
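
    A sketch of the kind of check described (not necessarily the article's exact code, and with placeholder names): only issue BACKUP LOG when the database is not using the SIMPLE recovery model.

    IF (SELECT recovery_model_desc FROM sys.databases WHERE name = N'MyDatabase') <> N'SIMPLE'
    BEGIN
        BACKUP LOG MyDatabase TO DISK = N'X:\Backups\MyDatabase_log.trn';
    END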

  • kjs714 (9/15/2009)


    In the sample, you are using a batchsize of 10,000. This will end up causing a table lock unless the escalation gets blocked by other processes. If table locks must be prevented, batch sizes under 5000 need to be used. If a large batch size is required for decent performance, but the table must not be locked, lock escalation for the table can be disabled (2008), or another spid can be used to hold a special escalation blocking lock (Table IX). For example:

    BEGIN TRANSACTION;

    SELECT 1 FROM dbo.MyTable WITH (UPDLOCK HOLDLOCK) WHERE 1 = 0;

    Once the delete processing is done, the above transaction can be committed. I have researched the above and it does not prevent checkpointing when using the SIMPLE recovery model.

    One other thing to take into consideration regarding batch sizes is the type of delete plan generated. Depending on the number of indexes and their statistics, a row-by-row type plan or a set-based plan may be generated. They should both be evaluated for page counts and durations.

    This is where testing really needs to come into play in each environment. Depending on the tables, the size of the rows, etc., it is possible that the locks could escalate and cause blocking of other processes. Depending on how quickly each batch can actually be deleted and the locks released while the T-LOG backup is run, it may not cause as much of an issue as trying to delete several million rows in one batch. If a smaller batch size is needed for performance reasons on the table, then go with a smaller batch size.

  • I've been using pretty much the same concept for data deletion for quite some time now. It does the same job as Lynn's code, just done a bit differently. I'll post the code anyway:

    while 1 = 1

    begin

    delete top (???) from tablexxx

    IF @@Rowcount = 0 break -- (exit from loop or do something)

    end

    The code above doesn't require the row count to be stored in a variable and read back.

    Simon

    Simon Liew
    Microsoft Certified Master: SQL Server 2008

  • Thanks Lynn

    Very helpful topic.

  • Lynn Pettis (9/15/2009)


    It is possible that the t-log backup files could overrun the available disk space as well. That is another pain point in this process, but it too can be dealt with in a variety of ways. One would be to use a third-party backup solution to zip the t-log backups on the fly so that they don't take as much space. In that regard, we use HyperBac on our SQL Server systems.

    Excellent point. We use Red Gate SQL Backup 6 and it works beautifully for that.

    ---
    Timothy A Wiseman
    SQL Blog: http://timothyawiseman.wordpress.com/
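
    If a third-party tool is not an option, native backup compression can do something similar (assuming SQL Server 2008 Enterprise, or Standard from 2008 R2 onward); the names and paths below are placeholders:

    BACKUP LOG MyDatabase
    TO DISK = N'X:\Backups\MyDatabase_log.trn'
    WITH COMPRESSION;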

  • Lynn Pettis (9/15/2009)


    I have to make two observations about this article. As of just checking the main page, there have been 7,992 views recorded for this article. Compare this to where it was originally published, 963 views in 90 days. Plus, there has been a nice discussion along with it, where there was none on the other site.

    I am quite happy! :w00t:

    Out of curiosity, where was it originally published?

    ---
    Timothy A Wiseman
    SQL Blog: http://timothyawiseman.wordpress.com/

  • Just a quick comment:

    I have to deal with large deletes as well, and even with setting up ROWLOCK and limiting the size of my delete batches, I end up with micro-locks sometimes. It made sense because they are expensive delete operations, notably with blob data.

    Having said that, I have performed a lot of switch-out partition operations, and for very large delete operations this is the best solution for performance. It is not always that easy, but when it is, partitioning is really the best solution (if you have the right version to do it).

    Clement
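
    For anyone who has not used it, the switch-out pattern described above boils down to moving a whole partition out as a metadata operation and then truncating it. The object names below are hypothetical, and the sketch assumes an Enterprise Edition partitioned table with an empty, identically structured staging table on the same filegroup as the partition being removed:

    -- Move partition 3 of the partitioned table into the staging table (metadata-only, near-instant).
    ALTER TABLE dbo.BigTable
        SWITCH PARTITION 3 TO dbo.BigTable_Staging;

    -- The rows are now gone from dbo.BigTable; clean up the staging copy cheaply.
    TRUNCATE TABLE dbo.BigTable_Staging;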
