
Deleting Large Number of Records
Posted Monday, September 14, 2009 11:16 PM


Comments posted to this topic are about the item Deleting Large Number of Records


Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
Post #787895
Posted Tuesday, September 15, 2009 1:54 AM


Nice one Lynn :)

I've found that for deleting a subset of data from a much larger table (for example, deleting history from a transactions table that is more than x years old), it can help to create a filtered index under SS2008 for that... You can create the index online, and it significantly reduces the seek time if you're deleting a subset.

Of course, if the column is already indexed - no probs!
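
A minimal sketch of that idea, with hypothetical table, column, index, and cutoff names (the filter predicate is an assumption, and ONLINE index builds require Enterprise Edition):

-- Build a filtered index online that covers only the rows to be deleted (SQL Server 2008+)
CREATE NONCLUSTERED INDEX IX_Transactions_History
ON dbo.Transactions (TransactionDate)
WHERE TransactionDate < '20040101'   -- only the old "history" rows
WITH (ONLINE = ON);

-- The batched delete can then seek into the filtered range (confirm in the execution plan)
DELETE TOP (10000)
FROM dbo.Transactions
WHERE TransactionDate < '20040101';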


Atlantis Interactive - SQL Server Tools
My blog
Why I wrote a sql query analyzer clone
Post #787963
Posted Tuesday, September 15, 2009 5:28 AM
Hi Lynn,

Good article concept, this is a problem that I've had to deal with many times, and I suspect most DBAs do at one point or another.

A couple concerns though:

1) Would those transaction log backup files really be manageable? If you choose to make the batch size smaller (say 1,000 instead of 10,000) to minimize the duration of locks (especially if the deletion operations are expensive, for example containing substantial BLOB columns), then you rapidly end up with 2,000 transaction log files for your 2,000,000 records... Imagine having to recover those! I guess it's a minor point, but I suspect in many (most?) cases the transaction log backup logic should be disconnected from the deletion loop itself.

2) Another point is the deletion mechanism - in SQL Server 2000 I have found the SET ROWCOUNT method to be quite painful on large tables. I've not looked deeply into it, but in the environments I've encountered it's MUCH more efficient to identify an indexed numeric key (not necessarily unique, as long as it's smoothly distributed), ideally with a clustered index, and loop through that instead. I imagine that there's a constant cost to evaluating the WHERE clause, even when the ROWCOUNT is only going to act on a subset of the matching records.

Am I alone in this? Has anyone else had to come up with different ways due to performance issues with the SET ROWCOUNT method?
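
For illustration, here's a minimal sketch of the keyed-batch approach described in point 2, using hypothetical table, column, and cutoff names and a fixed key-range width (written so it is also valid on SQL Server 2000):

-- Walk an indexed numeric key in fixed-size ranges instead of relying on SET ROWCOUNT
DECLARE @minId bigint, @maxId bigint, @step bigint
SET @step = 100000   -- width of each key range; tune to the data distribution

SELECT @minId = MIN(TransactionId), @maxId = MAX(TransactionId)
FROM dbo.TransactionHistory
WHERE TransactionDate < DATEADD(yy, -5, GETDATE())

WHILE @minId <= @maxId
BEGIN
    DELETE FROM dbo.TransactionHistory
    WHERE TransactionId >= @minId
      AND TransactionId < @minId + @step
      AND TransactionDate < DATEADD(yy, -5, GETDATE())

    SET @minId = @minId + @step
END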

Tao

---

Edited: fixed mistake: PK -> Clustered index


http://poorsql.com for T-SQL formatting: free as in speech, free as in beer, free to run in SSMS or on your version control server - free however you want it.
Post #788069
Posted Tuesday, September 15, 2009 5:43 AM


Matt Whitfield (9/15/2009)
Nice one Lynn :)

I've found that for deleting a subset of data from a much larger table (for example, deleting history from a transactions table that is more than x years old), it can help to create a filtered index under SS2008 for that... You can create the index online, and it significantly reduces the seek time if you're deleting a subset.

Of course, if the column is already indexed - no probs!


Thank you for the info on SQL Server 2008. At this time I unfortunately haven't had the opportunity to work with 2008, so I'll have to look into that when I do and see what changes it may suggest for this method of deleting records.



Lynn Pettis

Post #788072
Posted Tuesday, September 15, 2009 6:02 AM


Tao Klerks (9/15/2009)
Hi Lynn,

Good article concept, this is a problem that I've had to deal with many times, and I suspect most DBAs do at one point or another.

A couple concerns though:


Okay, let's take a look.

1) Would those transaction log backup files really be manageable? If you choose to make the batch size smaller (say 1,000 instead of 10,000) to minimize the duration of locks (especially if the deletion operations are expensive, for example containing substantial BLOB columns), then you rapidly end up with 2,000 transaction log files for your 2,000,000 records... Imagine having to recover those! I guess it's a minor point, but I suspect in many (most?) cases the transaction log backup logic should be disconnected from the deletion loop itself.


There will always be a pain point somewhere. Part of the issue I was trying to deal with here is controlling the size of the transaction log. This is one of the areas we see frequently on the forums, where the OP is concerned with the transaction log growing excessively during the deletion process and taking much of the available disk space, regardless of the recovery model in use.

When the database is using the SIMPLE recovery model, all that is needed is to batch the deletes. By doing this, as the transaction log is checkpointed the space will be reused, keeping the file size under control.

Unfortunately, if the database is using either the BULK_LOGGED or FULL recovery model, you actually need to run periodic transaction log backups during the delete process. This is what my code allows one to do. You have to decide how big the batch should be for each delete, how large the transaction log should be allowed to grow (if necessary), and how many transaction log files you are going to create. It is possible to further modify this code fairly easily to incorporate periodic differential backups as well. If you want a differential backup after every 100 transaction log backups, it wouldn't be hard to do that with this code.
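
As a rough illustration of that pattern (not the article's exact code), here's a minimal sketch using hypothetical database, table, column, and backup path names; the batch size, the backup interval, and the 2005-style DELETE TOP syntax are assumptions to adjust for your own environment:

DECLARE @rows int, @batchcount int, @logbackups int, @file nvarchar(260)

SET @rows = 1
SET @batchcount = 0
SET @logbackups = 0

WHILE @rows > 0
BEGIN
    DELETE TOP (10000)                  -- batch size: 10,000 rows per pass
    FROM dbo.MyLargeTable
    WHERE CreatedDate < '20050101'

    SET @rows = @@ROWCOUNT              -- 0 when nothing is left to delete
    SET @batchcount = @batchcount + 1

    -- Under FULL or BULK_LOGGED recovery, back up the log every 25 batches so the
    -- log file stays small; under SIMPLE, checkpoints reuse the space and this
    -- backup step can simply be removed.
    IF @batchcount % 25 = 0
    BEGIN
        SET @logbackups = @logbackups + 1
        SET @file = N'X:\Backups\MyDatabase_log_' + CONVERT(nvarchar(20), @logbackups) + N'.trn'
        BACKUP LOG MyDatabase TO DISK = @file
    END
END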


2) Another point is the deletion mechanism - in SQL Server 2000 I have found the SET ROWCOUNT method to be quite painful on large tables. I've not looked deeply into it, but in the environments I've encountered it's MUCH more efficient to identify an indexed numeric key (not necessarily unique, as long as it's smoothly distributed), ideally with a clustered index, and loop through that instead. I imagine that there's a constant cost to evaluating the WHERE clause, even when the ROWCOUNT is only going to act on a subset of the matching records.


I can't really address this particular issue regarding SET ROWCOUNT in SQL Server 2000, as I haven't really had to do this myself. I'm just trying to provide people with a method of deleting a large number of records while trying to keep the size of the transaction log under control and keep the table relatively open for normal processing, by hopefully keeping SQL Server 2000 from taking and holding a table lock. As I have always said on these forums, when you are using anything suggested on any forum, be sure to test, test, and test some more.

This is just one option I am offering; it may not always be the best. It may require some tweaks and modifications depending on the environment it is used in. If there is a suitable index, as you suggest, it would be easy to modify this code to force it to use that index.
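
For example, a minimal sketch of forcing a particular index on the batched delete (the table and index names are hypothetical, and whether the hint actually helps should be confirmed against the execution plan):

-- Force the delete to drive off a specific index via a table hint in the FROM clause
DELETE TOP (10000) t
FROM dbo.MyLargeTable AS t WITH (INDEX (IX_MyLargeTable_CreatedDate))
WHERE t.CreatedDate < '20050101'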




Lynn Pettis

Post #788076
Posted Tuesday, September 15, 2009 7:47 AM


I'm not sure I agree that SET ROWCOUNT is the issue in SQL 2000. I used a similar method years ago in 2000 to delete large numbers of records and it seemed to work fine for me. Granted, large is relative, and what is large today might be substantially larger than five years ago, but I'd be interested to know specifically whether it changed the execution plan or where the issue was.






Follow me on Twitter: @way0utwest

Forum Etiquette: How to post data/code on a forum to get the best help
Post #788147
Posted Tuesday, September 15, 2009 7:51 AM
Thanks, Lynn! Here's a possible minor tweak that I think makes one less trip through the loop (unless the total number of rows we're deleting is an exact multiple of the batchsize):

Instead of:
while @batchsize <> 0

If we say:
declare @orgBatchSize bigint
set @orgBatchSize = @batchsize
while @batchsize = @orgBatchSize

This exits the loop as soon as the number of deleted rows is less than the batchsize (which will only happen when there's nothing left to delete). This could be a time-saver if the Where clause on the Delete is slow.
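
Put together, a minimal sketch of that modified loop (the table, column, and batch size are hypothetical placeholders):

DECLARE @batchsize bigint, @orgBatchSize bigint

SET @batchsize = 10000
SET @orgBatchSize = @batchsize

-- Loop until a pass deletes fewer rows than the original batch size,
-- which avoids one final pass that deletes nothing.
WHILE @batchsize = @orgBatchSize
BEGIN
    DELETE TOP (@orgBatchSize)
    FROM dbo.MyLargeTable
    WHERE CreatedDate < '20050101'

    SET @batchsize = @@ROWCOUNT
END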
Post #788152
Posted Tuesday, September 15, 2009 7:57 AM
Lynn ... great article!

Any thoughts on how (or even if) this approach should be modified for bulk deletions on a SQL 2005 mirror configuration with log-shipping in place?

JC
Post #788161
Posted Tuesday, September 15, 2009 7:59 AM


Steve

Sorry - I would have quoted, but the forum was insistent on quoting another post when I clicked quote.

I would put money on the fact that when you used SET ROWCOUNT one of the two following possibilities was true:

1) There was no where clause on the statement
2) The where clause was easily satisfied by an index

...
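
For reference, a minimal sketch of the SQL Server 2000 SET ROWCOUNT pattern under discussion, with hypothetical table and column names; the point above is that the WHERE clause needs index support for each pass to stay cheap:

SET ROWCOUNT 10000                  -- cap each DELETE at 10,000 rows (SQL Server 2000 style)

DECLARE @rows int
SET @rows = 1

WHILE @rows > 0
BEGIN
    DELETE FROM dbo.MyLargeTable
    WHERE CreatedDate < '20050101'  -- ideally resolved by an index seek, not a scan

    SET @rows = @@ROWCOUNT
END

SET ROWCOUNT 0                      -- always reset, or later statements stay capped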


Atlantis Interactive - SQL Server Tools
Post #788163
Posted Tuesday, September 15, 2009 8:15 AM
Just... keep it simple!

This works if you need to delete a large number of rows and the scenario doesn't require keeping the transaction log.

In this case I want to delete all the rows between 2009/01/01 and 2009/10/23, about 100,000,000 rows.

delete top (10000) from myTable where colDate between '20090101' and '20091023'
backup log myDatabase with no_log
go 10000


What do you think?


P.S. The recovery model is SIMPLE.
Post #788182