Deleting a large number of rows from a table & a heap in a VLDB

TheSQLGuru
SSCertifiable

Group: General Forum Members
Points: 5941 Visits: 8299
alen teplitsky (3/24/2011)
Is it better to use a non-clustered index in a large delete? I thought it was the opposite, since you would be deleting data that is logically together in the table.

Years ago I saw an issue where large amounts of data were inserted, deleted, and modified through a non-clustered index, and it caused a lot of blocking when the app changed to use multiple threads. We changed the clustered-index column and the problems went away, since the data being changed was on the same pages and extents.


Changing the clustered index on a huge table is often simply not possible due to tlog constraints and/or time. If your existing CI gets you a seek to the deleted records, that's great for you, though!

Also, my comments are directed at the (much more common, IMHO) scenario where you have a maintenance window to do the massive delete without concurrent-access issues. Doing it without such a window brings a whole 'nother level of headache, since you need to leave all structures in place, pay a huge overhead to log all the existing index deletes, and use much smaller batch sizes to avoid long-duration locks causing unwanted blocking.
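Conversely, when you do have an exclusive maintenance window, one common way to cut that index-logging overhead (a hedged sketch, not from the post; the table and index names here are hypothetical) is to disable the non-clustered indexes before the mass delete and rebuild them afterwards:

-- Hypothetical names. Disabling a non-clustered index means the delete no
-- longer has to log a removal per index row; the trade-off is a full rebuild after.
ALTER INDEX IX_BigTable_ArchiveDate ON dbo.BigTable DISABLE;

DELETE FROM dbo.BigTable
WHERE ArchiveDate < '20100101';   -- batching may still be needed if tlog space is tight

ALTER INDEX IX_BigTable_ArchiveDate ON dbo.BigTable REBUILD;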

Best,

Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru at GMail
brad.corbin
Grasshopper

Group: General Forum Members
Points: 15 Visits: 216
The explicit BEGIN TRAN / COMMIT TRAN in the batch delete is unnecessary; SQL Server will, by default, commit each individual statement unless you put a transaction around the entire process. Totally agree about using a clustered key for batching (or a non-clustered index on a heap).

Here's my simpler version of the batch delete (for both SQL 2005 and 2000):


--SQL 2005+: loop on DELETE TOP (n) until no rows remain
DECLARE @i INT
SET @i = 1

WHILE @i > 0
BEGIN
   DELETE TOP (10000) FROM dbo.SuperBigTable
      WHERE RecordID < 12345
   SELECT @i = @@ROWCOUNT
END

--SQL 2000: no DELETE TOP, so cap the batch size with SET ROWCOUNT
DECLARE @i INT
SET @i = 1
SET ROWCOUNT 10000

WHILE @i > 0
BEGIN
   DELETE FROM dbo.SuperBigTable
      WHERE RecordID < 12345
   SELECT @i = @@ROWCOUNT
END
SET ROWCOUNT 0   -- always reset, or later statements in the session stay capped


Had to use this again this morning: a user tried to delete 88M rows out of a 90M-row table, all in one statement. I told him not to do that. :-)
Mike Byrd
SSC Journeyman

Group: General Forum Members
Points: 89 Visits: 388
Well written for what it covered, but my initial expectation before reading the article was that you would touch more on techniques for handling transaction-log growth and for handling child records.

I am currently working on a stored procedure that does many, many deletes, and I had to set up logic to delete in chunks of 250,000 rows.

Otherwise, way to go!
Mike Byrd

Mike Byrd
J Thaddeus Klopcic
SSC Rookie

Group: General Forum Members
Points: 46 Visits: 192
We once had a similar problem, where we had to delete primary and child data for certain records in a batch process. The solution worked rather well: mark the primary table with a ToBeDeleted flag, then set up a batch job. The batch job ran once per minute and was designed to delete and commit the TOP 10000 rows from the child tables, joined to the primary-table rows where the ToBeDeleted flag was set. After the job had run for a few iterations, the row count would reach zero, which was the condition for deleting the primary rows.

Purging thus became a background process, with the load spread across many CPU cycles, not unlike garbage collection in a managed programming environment.
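A minimal sketch of that pattern, assuming hypothetical Parent/Child tables with a ToBeDeleted flag on the parent (none of these names are from the post):

-- Hypothetical schema: dbo.Parent(ParentID, ToBeDeleted), dbo.Child(ChildID, ParentID).
-- Each scheduled run deletes one batch of child rows for flagged parents;
-- once no child rows remain, the flagged parent rows themselves are purged.
DELETE TOP (10000) c
FROM dbo.Child AS c
INNER JOIN dbo.Parent AS p
    ON p.ParentID = c.ParentID
WHERE p.ToBeDeleted = 1;

IF @@ROWCOUNT = 0
BEGIN
    -- No children left for any flagged parent: safe to remove the parents.
    DELETE FROM dbo.Parent
    WHERE ToBeDeleted = 1;
END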
SQLRNNR
SSC-Insane

Group: General Forum Members
Points: 21073 Visits: 18259
Thanks for the article.



Jason AKA CirqueDeSQLeil
I have given a name to my pain...
MCM SQL Server, MVP


SQL RNNR

Posting Performance Based Questions - Gail Shaw

belgarion
Valued Member

Group: General Forum Members
Points: 65 Visits: 249
Great article Nakul.

Agree with all the comments about using smallish batch sizes to keep row-level locks from escalating! Also, if the delete(s) involve hash matching, keeping to small batches minimises the chance of a collision.

When multiple tables must be deleted from to preserve transactional integrity, I've had some success with cascading deletes, but I have also had some horrors when the database design isn't sufficiently normalised to make it work well.

One point worth adding to your article is how the purging gets implemented. While it can be run in dead time (typically overnight), not every database has this option nowadays. And if purging hasn't been done for ages, then disk resources may just not be sufficient!

What we've done with two of our large DBs is to use the Resource Governor to limit the impact of the purge job, and we just have it running all the time. It chugs away in the background when system resources are available and grinds to a halt when they aren't. If you don't have SQL 2008 and the Resource Governor, then about the only option(?) you have is to keep the batch size small, keep the queries well optimised, and insert waits of 10 seconds or more to minimise the impact.

Running purge jobs 24/7 has another benefit: log files tend to stay roughly the same size, whereas big overnight/weekly/monthly purges often push the log way beyond its normal growth size.
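A hedged sketch of that Resource Governor setup, assuming a dedicated login for the purge job; all names here are hypothetical, and the classifier function must live in master:

-- Hypothetical names; requires SQL Server 2008+ and must run in master.
USE master;
GO
CREATE RESOURCE POOL PurgePool WITH (MAX_CPU_PERCENT = 20);
CREATE WORKLOAD GROUP PurgeGroup USING PurgePool;
GO
-- Route sessions from the purge login into the throttled group.
CREATE FUNCTION dbo.fnPurgeClassifier() RETURNS SYSNAME
WITH SCHEMABINDING
AS
BEGIN
    IF SUSER_SNAME() = N'purge_login'
        RETURN N'PurgeGroup';
    RETURN N'default';
END;
GO
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.fnPurgeClassifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;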
rob.lobbe-964963
SSC Rookie

Group: General Forum Members
Points: 33 Visits: 216
Table locks are the killer here...
Lock escalation happens at ~5000 rows, so keep batch sizes below this:

SELECT 1   -- prime @@ROWCOUNT so the loop runs at least once
WHILE @@ROWCOUNT > 0
   DELETE TOP (4999) FROM ...
(or something similar)

Also, if you're using SQL 2008+, read up on "Filtered Indexes".

Create a filtered index on your main table that contains your PK and filters on your delete criteria. This index will be used to find your deletion candidates, and the overhead should be minimal, since the candidates should be only a small subset of the bulk of your data.

You can even monitor the size/row count of the index to trigger when to start the purge, keeping the performance hit to a minimum (or delaying the start to an off-peak time) rather than using a scheduled date/time.
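A minimal sketch of that filtered-index approach; the table, column, and index names are hypothetical:

-- Hypothetical names; requires SQL Server 2008+.
-- The filtered index covers only rows already eligible for deletion,
-- so it stays small and cheap to maintain.
CREATE NONCLUSTERED INDEX IX_BigTable_Purge
    ON dbo.BigTable (RecordID)
    WHERE IsExpired = 1;

-- Batched deletes can then seek this index for candidates:
DELETE TOP (4999) FROM dbo.BigTable
WHERE IsExpired = 1;

-- The index row count tells you when a purge is worth kicking off:
SELECT SUM(ps.row_count)
FROM sys.dm_db_partition_stats AS ps
INNER JOIN sys.indexes AS i
    ON i.object_id = ps.object_id AND i.index_id = ps.index_id
WHERE i.name = N'IX_BigTable_Purge';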
belgarion
Valued Member

Group: General Forum Members
Points: 65 Visits: 249
rob.lobbe-964963 (3/24/2011)
Also, if you're using SQL 2008+, read up on "Filtered Indexes".
Create a filtered index on your main table that contains your PK and filters on your delete criteria. This index will be used to find your deletion candidates, and the overhead should be minimal, since the candidates should be only a small subset of the bulk of your data.


Good point! Sometimes a filtered index is a better approach than using a table, temporary or otherwise, to store the PK values of the rows that are to be deleted.

BTW ... how many DBAs get "completed" applications from development teams that have absolutely no database purging logic at all!?! I have no idea how they get away with this.
amenjonathan
SSC-Enthusiastic

Group: General Forum Members
Points: 136 Visits: 434
belgarion (3/24/2011)
What we've done with two of our large DBs is to use the Resource Governor to limit the impact of the purge job, and we just have it running all the time. It chugs away in the background when system resources are available and grinds to a halt when they aren't.


Wow, now this is a good idea. Thanks for sharing!

-------------------------------------------------------------------------------------------------
My SQL Server Blog
jswong05
Valued Member

Group: General Forum Members
Points: 59 Visits: 476
Instead of chasing the purge with frequent large deletes, partitioning the table at design time is a good option. The criteria you use in the DELETE (the WHERE clause) can become the partitioning key columns.
See my SQLSaturday presentation.

Jason
http://dbace.us
:-P
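A minimal sketch of the design-time partitioning idea the poster is describing, with hypothetical names; switching an old partition out to a staging table and truncating it is a metadata-only operation, so the purge avoids row-by-row delete logging entirely:

-- Hypothetical monthly partitioning on an ArchiveDate column
-- (table partitioning requires Enterprise edition).
CREATE PARTITION FUNCTION pfMonthly (DATETIME)
    AS RANGE RIGHT FOR VALUES ('2011-01-01', '2011-02-01', '2011-03-01');
CREATE PARTITION SCHEME psMonthly
    AS PARTITION pfMonthly ALL TO ([PRIMARY]);

-- dbo.BigTable is created ON psMonthly(ArchiveDate); dbo.BigTable_Stage is an
-- empty table with identical structure on the same filegroup.
-- Purging the oldest month is then just:
ALTER TABLE dbo.BigTable SWITCH PARTITION 1 TO dbo.BigTable_Stage;
TRUNCATE TABLE dbo.BigTable_Stage;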
