Log in
::
Register
::
Not logged in
Home
Tags
Articles
Editorials
Stairways
Forums
Scripts
Videos
Blogs
QotD
Books
Ask SSC
SQL Jobs
Training
Authors
About us
Contact us
Newsletters
Write for us
Recent Posts
Recent Posts
Popular Topics
Popular Topics
Home
Search
Members
Calendar
Who's On
Home
»
Article Discussions
»
Article Discussions by Author
»
Discuss content posted by Lynn Pettis
»
Deleting Large Number of Records
72 posts, Page 1 of 8
1
2
3
4
5
»
»»
Deleting Large Number of Records
Rate Topic
Display Mode
Topic Options
Author
Message
Lynn Pettis
Lynn Pettis
Posted Monday, September 14, 2009 11:16 PM
SSC-Insane
Group: General Forum Members
Last Login: Today @ 3:10 PM
Points: 21,614,
Visits: 27,444
Comments posted to this topic are about the item
Deleting Large Number of Records
Lynn Pettis
For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here
or
when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here
and
here
Managing Transaction Logs
SQL Musings from the Desert
Fountain Valley SQL
(My Mirror Blog)
Post #787895
Matt Whitfield
Matt Whitfield
Posted Tuesday, September 15, 2009 1:54 AM
Mr or Mrs. 500
Group: General Forum Members
Last Login: Friday, February 15, 2013 7:29 AM
Points: 509,
Visits: 718
Nice one Lynn :)
I've found that for deleting a sub-section of data from a much larger table (for example - can we delete history from transactions table that is over x years old) it can help to create a filtered index under SS2008 for that... You can create the index on-line, and it significantly reduces the seek time if you're deleting a sub-set.
Of course, if the column is already indexed - no probs!
Atlantis Interactive -
SQL Server Tools
My blog
Why I wrote a
sql query analyzer
clone
Post #787963
Tao Klerks
Tao Klerks
Posted Tuesday, September 15, 2009 5:28 AM
Ten Centuries
Group: General Forum Members
Last Login: Sunday, April 14, 2013 8:55 AM
Points: 1,383,
Visits: 1,212
Hi Lynn,
Good article concept, this is a problem that I've had to deal with many times, and I suspect most DBAs do at one point or another.
A couple concerns though:
1) Would those transaction log backup files really be manageable? If you chose to make the batch size smaller (say 1,000 instead of 10,000) to minimize duration of locks (esp. if the deletion operations are expensive, for example containing substantial BLOB columns), then you rapidly end up with 2,000 transaction log files for your 2,000,000 records... Imagine having to recover those!
I guess it's a minor point, but I suspect in many (most?) cases the transaction log backup logic should be disconnected from the deletion loop itself.
2) Another point is the deletion mechanism - in SQL Server 2000 I have found the SET ROWCOUNT method to be quite painful on large tables. I've not looked deeply into it, but in the environments I've encountered it's MUCH more efficient to identify an indexed numeric key (not necessarily unique, as long as it's smoothly distributed), ideally with a clustered index, and loop through that instead. I imagine that there's a constant cost to evaluating the WHERE clause, even when the ROWCOUNT is only going to act on a subset of the matching records.
Am I alone in this? Has anyone else had to come up with different ways due to performance issues with the SET ROWCOUNT method?
Tao
---
Edited: fixed mistake: PK -> Clustered index
http://poorsql.com
for T-SQL formatting: free as in speech, free as in beer, free to run in SSMS or on your version control server - free however you want it.
Post #788069
Lynn Pettis
Lynn Pettis
Posted Tuesday, September 15, 2009 5:43 AM
SSC-Insane
Group: General Forum Members
Last Login: Today @ 3:10 PM
Points: 21,614,
Visits: 27,444
Matt Whitfield (9/15/2009)
Nice one Lynn :)
I've found that for deleting a sub-section of data from a much larger table (for example - can we delete history from transactions table that is over x years old) it can help to create a filtered index under SS2008 for that... You can create the index on-line, and it significantly reduces the seek time if you're deleting a sub-set.
Of course, if the column is already indexed - no probs!
Thank you for the info on SQL Server 2008. At this time I unfortunately haven't had the oportunity to work with 2008 so I'll have to look into that when I do and see what changes that may result in with this method of deleting records.
Lynn Pettis
For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here
or
when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here
and
here
Managing Transaction Logs
SQL Musings from the Desert
Fountain Valley SQL
(My Mirror Blog)
Post #788072
Lynn Pettis
Lynn Pettis
Posted Tuesday, September 15, 2009 6:02 AM
SSC-Insane
Group: General Forum Members
Last Login: Today @ 3:10 PM
Points: 21,614,
Visits: 27,444
Tao Klerks (9/15/2009)
Hi Lynn,
Good article concept, this is a problem that I've had to deal with many times, and I suspect most DBAs do at one point or another.
A couple concerns though:
Okay, let's take a look.
1) Would those transaction log backup files really be manageable? If you chose to make the batch size smaller (say 1,000 instead of 10,000) to minimize duration of locks (esp. if the deletion operations are expensive, for example containing substantial BLOB columns), then you rapidly end up with 2,000 transaction log files for your 2,000,000 records... Imagine having to recover those!
I guess it's a minor point, but I suspect in many (most?) cases the transaction log backup logic should be disconnected from the deletion loop itself.
There will always be a pain point some where. Part of the issue I was trying to deal with here is controlling the size of the transaction log. This is one of the areas we see frequently on the forums where the OP is concerned with the transaction log growing excessively during the deletion process, taking much of the available disk space regardless of recovery model in use.
When the database is using the SIMPLE recovery model, then all that is needed is to batch the deletes. By doing this, as the transaction log is checkpointed the space will be reused in the transaction log keeping the file under control.
Unfortunately, if the database is using either BULK-LOGGED or FULL recovery model, you actually need to run periodic transaction log backups during the delete process. This is what my code allows one to do. You have to decide how big the batch should be for each delete, how large the transaction log should be allowed to grow (if necessary), how many transaction log files are you going to create. It is possible to further modify this code fairly easily to also incorprate periodic differential backups as well. If you want a differential backup after every 100 transaction log backups, it wouldn't be hard to do that with this code.
2) Another point is the deletion mechanism - in SQL Server 2000 I have found the SET ROWCOUNT method to be quite painful on large tables. I've not looked deeply into it, but in the environments I've encountered it's MUCH more efficient to identify an indexed numeric key (not necessarily unique, as long as it's smoothly distributed), ideally with a clustered index, and loop through that instead. I imagine that there's a constant cost to evaluating the WHERE clause, even when the ROWCOUNT is only going to act on a subset of the matching records.
I can't really address this particular issue regarding SET ROWCOUNT in SQL Server 2000 has I haven't really had to do this myself. I'm just trying to provide people with a method of accomplishing deleting a large number of records while trying to maintain the size the transaction log and keep the table relatively open for normal processing by hopefully keeping SQL Server 2000 from putting and holding a table lock. As I have always said on these forums, when you are using anything suggested on any forum be sure to test, test, and test some more.
This is just one option I am offering, it may not always be the best. It may require some tweaks and modifications depending on the environment it is used in. If there is a suitable index as you suggest, it would be easy to modify this code to force it to use that index.
Lynn Pettis
For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here
or
when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here
and
here
Managing Transaction Logs
SQL Musings from the Desert
Fountain Valley SQL
(My Mirror Blog)
Post #788076
Steve Jones - SSC Editor
Steve Jones - SSC Editor
Posted Tuesday, September 15, 2009 7:47 AM
SSC-Dedicated
Group: Administrators
Last Login: Today @ 1:48 PM
Points: 31,423,
Visits: 13,737
I'm not sure I agree that SET ROWCOUNT is the issue in SQL 2000. I've used a similar method years ago in 2000 to delete large numbers of records and it seemed to work fine for me. Granted, large is relative and what is large today might be substantially larger than five years ago, but I'd be interested to know specifically if it's changed the execution plan or where the issue was.
Follow me on Twitter:
@way0utwest
Forum Etiquette: How to post data/code on a forum to get the best help
Post #788147
Steve McRoberts-357330
Steve McRoberts-357330
Posted Tuesday, September 15, 2009 7:51 AM
Forum Newbie
Group: General Forum Members
Last Login: Thursday, July 29, 2010 10:43 AM
Points: 9,
Visits: 82
Thanks, Lynn! Here's a possible minor tweak that I think makes one less trip through the loop (unless the total number of rows we're deleting is an exact multiple of the batchsize):
Instead of:
while @batchsize <> 0
If we say:
declare @orgBatchSize bigint
set @orgBatchSize = @batchsize
while @batchsize = @orgBatchSize
This exits the loop as soon as the number of deleted rows is less than the batchsize (which will only happen when there's nothing left to delete). This could be a time-saver if the Where clause on the Delete is slow.
Post #788152
John Campbell-235017
John Campbell-235017
Posted Tuesday, September 15, 2009 7:57 AM
Forum Newbie
Group: General Forum Members
Last Login: Friday, May 21, 2010 6:22 AM
Points: 7,
Visits: 38
Lynn ... great article!
Any thoughts on how (or even if) this approach should be modified for bulk deletions on a SQL 2005 mirror configuration with log-shipping in place?
JC
Post #788161
Matt Whitfield
Matt Whitfield
Posted Tuesday, September 15, 2009 7:59 AM
Mr or Mrs. 500
Group: General Forum Members
Last Login: Friday, February 15, 2013 7:29 AM
Points: 509,
Visits: 718
Steve
Sorry - would have quoted but the forum was insistent on quoting another post when I clicked quote
I would put money on the fact that when you used SET ROWCOUNT one of the two following possibilities was true:
1) There was no where clause on the statement
2) The where clause was easily satisfied by an index
...
Atlantis Interactive -
SQL Server Tools
My blog
Why I wrote a
sql query analyzer
clone
Post #788163
Ric Sierra
Ric Sierra
Posted Tuesday, September 15, 2009 8:15 AM
Ten Centuries
Group: General Forum Members
Last Login: 2 days ago @ 9:11 AM
Points: 1,422,
Visits: 248
Just... keep simple!
If you need to delete a large number of rows and the scenario don't need to keep the database log.
In this case I want to delete all the rows between 2009/01/01 and 2009/10/23, about 100000000 rows.
delete top (10000) from myTable where colDate between '20090101' and '20091023'
backup log myDatabase with no_log
go 10000
What you think?
P.S. The recovery model is SIMPLE.
Post #788182
« Prev Topic
|
Next Topic »
72 posts, Page 1 of 8
1
2
3
4
5
»
»»
Permissions
You
cannot
post new topics.
You
cannot
post topic replies.
You
cannot
post new polls.
You
cannot
post replies to polls.
You
cannot
edit your own topics.
You
cannot
delete your own topics.
You
cannot
edit other topics.
You
cannot
delete other topics.
You
cannot
edit your own posts.
You
cannot
edit other posts.
You
cannot
delete your own posts.
You
cannot
delete other posts.
You
cannot
post events.
You
cannot
edit your own events.
You
cannot
edit other events.
You
cannot
delete your own events.
You
cannot
delete other events.
You
cannot
send private messages.
You
cannot
send emails.
You
may
read topics.
You
cannot
rate topics.
You
cannot
vote within polls.
You
cannot
upload attachments.
You
may
download attachments.
You
cannot
post HTML code.
You
cannot
edit HTML code.
You
cannot
post IFCode.
You
cannot
post JavaScript.
You
cannot
post EmotIcons.
You
cannot
post or upload images.
Copyright © 2002-2013 Simple Talk Publishing. All Rights Reserved.
Privacy Policy.
Terms of Use.
Report Abuse.