
Deleting a large number of rows from a table & a heap in a VLDB
Posted Thursday, March 24, 2011 8:51 AM


SSCarpal Tunnel


Group: General Forum Members
Last Login: Yesterday @ 1:11 PM
Points: 4,368, Visits: 6,208
alen teplitsky (3/24/2011)
Is it better to use a non-clustered index in a large delete? I thought it was the opposite, since you would be deleting data that is logically together in the table.

Years ago I saw an issue where large amounts of data were inserted, deleted and modified through a non-clustered index, and it caused a lot of blocking when the app changed to use multiple threads. We changed the clustered index column and the problems went away, since the data that was changed was on the same pages and extents.


Changing the clustered index on a huge table is often simply not possible due to tlog constraints and/or time. If your existing CI gets you a seek to the deleted records, that is great for you, though!

Also, my comments are directed at the (much more common, IMHO) scenario where you have a maintenance window to do the massive delete without concurrent-access issues. Doing it without such a window brings a whole 'nother level of headache, since you need to leave all structures in place, pay a huge overhead to log all the existing index deletes, and use much smaller batch sizes to avoid long-duration locks causing unwanted blocking.


Best,

Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru at GMail
Post #1083368
Posted Thursday, March 24, 2011 8:56 AM
Grasshopper


Group: General Forum Members
Last Login: Monday, March 3, 2014 10:16 AM
Points: 13, Visits: 216
The explicit BEGIN TRAN, COMMIT TRAN in the batch delete is unnecessary; SQL will by default commit each individual statement unless you put a transaction around the entire process. Totally agree about using a clustered key for batching (or a non-clustered index on a heap).

Here's my simpler version of the batch delete (for both SQL 2005 and 2000):

--SQL 2005+
DECLARE @i INT
SET @i = 1

WHILE @i > 0
BEGIN
    DELETE TOP (10000) FROM dbo.SuperBigTable
    WHERE RecordID < 12345
    SELECT @i = @@ROWCOUNT   -- loop ends once a pass deletes no rows
END

--SQL 2000
DECLARE @i INT
SET @i = 1
SET ROWCOUNT 10000   -- caps every following statement at 10,000 rows

WHILE @i > 0
BEGIN
    DELETE FROM dbo.SuperBigTable
    WHERE RecordID < 12345
    SELECT @i = @@ROWCOUNT
END
SET ROWCOUNT 0       -- always reset, or later statements stay capped


Had to use this again this morning: a user tried to delete 88m rows out of a 90m row table, all in one statement. I told him not to do that :)
Post #1083374
Posted Thursday, March 24, 2011 9:58 AM


SSC Journeyman


Group: General Forum Members
Last Login: Tuesday, September 2, 2014 10:53 AM
Points: 81, Visits: 370
Well written for what it covered, but my expectation before reading the article was that it would touch more on techniques for handling transaction log growth and also on handling child records.

I am currently working on a stored procedure that does many, many deletes and I had to set up logic to delete in chunks of 250,000 rows.

Otherwise, way to go!


Mike Byrd
Post #1083432
Posted Thursday, March 24, 2011 10:18 AM
SSC Rookie


Group: General Forum Members
Last Login: Tuesday, August 19, 2014 8:18 AM
Points: 39, Visits: 184
We once had a similar problem, where we had to delete primary and child data for certain records in a batch process. The solution worked rather well: mark the primary table with a ToBeDeleted flag, then set up a batch job. The batch job would run once per minute, and it was designed to delete and commit the TOP 10000 rows from the child tables joined to the primary-table rows where the ToBeDeleted flag was set. After the batch job had run for a few iterations, @@ROWCOUNT would equal zero, which was the condition for deleting the primary rows.

Purging thus became a background process, with the load spread across many CPU cycles, not unlike garbage collection in a managed programming environment.
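
Something along these lines, as a minimal sketch of that per-minute job; the table, column and batch-size names here are hypothetical, not the actual schema:

-- Child rows are deleted first, in batches; once a run removes no child rows,
-- the flagged primary rows are deleted. Hypothetical names throughout.
DELETE TOP (10000) c
FROM dbo.ChildTable AS c
INNER JOIN dbo.PrimaryTable AS p ON p.PrimaryID = c.PrimaryID
WHERE p.ToBeDeleted = 1

IF @@ROWCOUNT = 0
    DELETE FROM dbo.PrimaryTable WHERE ToBeDeleted = 1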
Post #1083445
Posted Thursday, March 24, 2011 12:34 PM


SSCoach


Group: General Forum Members
Last Login: Yesterday @ 9:28 PM
Points: 17,729, Visits: 15,597
Thanks for the article.



Jason AKA CirqueDeSQLeil
I have given a name to my pain...
MCM SQL Server


SQL RNNR

Posting Performance Based Questions - Gail Shaw
Posting Data Etiquette - Jeff Moden
Hidden RBAR - Jeff Moden
VLFs and the Tran Log - Kimberly Tripp
Post #1083528
Posted Thursday, March 24, 2011 3:51 PM
Valued Member


Group: General Forum Members
Last Login: Thursday, May 8, 2014 4:43 PM
Points: 57, Visits: 249
Great article Nakul.

Agree with all the comments about using smallish batch sizes to keep row-level locks from escalating! Also, if the delete(s) involve hash matching, keeping to small batches minimises the chances of a collision.

When multiple tables must be deleted from to preserve transactional integrity, I've had some success with cascading deletes, but have also had some horrors when the database design isn't sufficiently normalised to make it work well.

One point it would be worth adding to your article is how the purging gets implemented. While it can be run in dead time (typically overnight), not every database has this option nowadays. And if purging hasn't been done for ages, then disk resources may just not be sufficient!

What we've done with two of our large DBs is to use the Resource Governor to limit the impact of the purge job, and we just have it running all the time. It chugs away in the background when system resources are available and grinds to a halt when they aren't. If you don't have SQL 2008 and don't have the Resource Governor, then about the only option(?) you have is to keep the batch size small, the queries well optimised, and insert waits of 10 seconds or more to minimise the impact.

Running purging jobs 24/7 has another benefit in that log files tend to stay roughly the same size, whereas big overnight/weekly/monthly purging often pushes the log sizes way beyond normal growth sizes.
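
In case anyone wants to try the same approach, here's a minimal sketch of the Resource Governor setup; the pool, group and login names and the limits are examples only, not our actual configuration:

-- Run in master; names and limits below are illustrative.
CREATE RESOURCE POOL PurgePool WITH (MAX_CPU_PERCENT = 20)
CREATE WORKLOAD GROUP PurgeGroup USING PurgePool
GO
CREATE FUNCTION dbo.fn_PurgeClassifier() RETURNS SYSNAME WITH SCHEMABINDING
AS
BEGIN
    -- Route only the purge job's login into the throttled group
    IF SUSER_SNAME() = N'PurgeJobLogin'
        RETURN N'PurgeGroup'
    RETURN N'default'
END
GO
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.fn_PurgeClassifier)
ALTER RESOURCE GOVERNOR RECONFIGURE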
Post #1083646
Posted Thursday, March 24, 2011 3:53 PM
SSC Rookie


Group: General Forum Members
Last Login: Yesterday @ 7:11 PM
Points: 27, Visits: 206
Table locks are the killer here...
Lock escalation happens at around 5,000 rows, so keep batch sizes below this:

SELECT 1
WHILE @@ROWCOUNT > 0
    DELETE TOP (4999) FROM ...
(or something similar)

Also, if you're using SQL 2008+, read up on "Filtered Indexes".

Create a filtered index on your main table that contains your PK, and filters on your delete criteria. This index will be used to determine your deletion candidates, and the overhead should be minimal as the delete candidates should only be a small subset of the bulk of your data.

You can even monitor the size/row count of the index to trigger when to start the purge, to keep the performance hit to a minimum (or delay the start to an off-peak time), rather than using a scheduled date/time.
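
A minimal sketch of what that can look like; the table, column and index names are hypothetical, and the filter should match whatever your delete criterion actually is:

-- Filtered index covering only the deletion candidates (hypothetical names)
CREATE NONCLUSTERED INDEX IX_SuperBigTable_Purge
    ON dbo.SuperBigTable (RecordID)
    WHERE IsExpired = 1

-- Batched delete on the same predicate, staying under the ~5,000-row escalation threshold
DELETE TOP (4999) FROM dbo.SuperBigTable WHERE IsExpired = 1
WHILE @@ROWCOUNT > 0
    DELETE TOP (4999) FROM dbo.SuperBigTable WHERE IsExpired = 1

-- Check the filtered index's row count to decide when to kick off the purge
SELECT SUM(p.rows) AS purge_candidates
FROM sys.partitions AS p
INNER JOIN sys.indexes AS i ON i.object_id = p.object_id AND i.index_id = p.index_id
WHERE i.name = 'IX_SuperBigTable_Purge'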
Post #1083647
Posted Thursday, March 24, 2011 4:01 PM
Valued Member


Group: General Forum Members
Last Login: Thursday, May 8, 2014 4:43 PM
Points: 57, Visits: 249
rob.lobbe-964963 (3/24/2011)
Also, if you're using SQL 2008+, read up on "Filtered Indexes".
Create a filtered index on your main table that contains your PK, and filters on your delete criteria. This index will be used to determine your deletion candidates, and the overhead should be minimal as the delete candidates should only be a small subset of the bulk of your data.


Good point! Sometimes the filtered index is a better approach than using a table, temporary or otherwise, to store PK values for transactions that are to be deleted.

BTW ... how many DBAs get "completed" applications from development teams that have absolutely no database purging logic at all!?! I have no idea how they get away with this.
Post #1083654
Posted Thursday, March 24, 2011 4:06 PM
SSC Journeyman


Group: General Forum Members
Last Login: Monday, December 16, 2013 10:42 AM
Points: 96, Visits: 434
belgarion (3/24/2011)
Great article Nakul.

Agree with all the comments about using smallish batch sizes to keep row-level locks from escalating! Also, if the delete(s) involve hash matching, keeping to small batches minimises the chances of a collision.

When multiple tables must be deleted from to preserve transactional integrity, I've had some success with cascading deletes, but have also had some horrors when the database design isn't sufficiently normalised to make it work well.

One point it would be worth adding to your article is how the purging gets implemented. While it can be run in dead time (typically overnight), not every database has this option nowadays. And if purging hasn't been done for ages, then disk resources may just not be sufficient!

What we've done with two of our large DBs is to use the Resource Governor to limit the impact of the purge job, and we just have it running all the time. It chugs away in the background when system resources are available and grinds to a halt when they aren't. If you don't have SQL 2008 and don't have the Resource Governor, then about the only option(?) you have is to keep the batch size small, the queries well optimised, and insert waits of 10 seconds or more to minimise the impact.

Running purging jobs 24/7 has another benefit in that log files tend to stay roughly the same size, whereas big overnight/weekly/monthly purging often pushes the log sizes way beyond normal growth sizes.


Wow now this is a good idea. Thanks for sharing!


-------------------------------------------------------------------------------------------------
My SQL Server Blog
Post #1083659
Posted Friday, March 25, 2011 8:04 AM


SSC Rookie


Group: General Forum Members
Last Login: Tuesday, September 23, 2014 8:38 PM
Points: 27, Visits: 473
Instead of chasing the purge with frequent large deletes, partitioning the table at design time is a good option. The criteria you use in the DELETE (the WHERE clause) can be used as the partition key columns.
See my SQLSaturday presentation.
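
For illustration, a minimal sketch of a switch-based purge; the table, staging-table and partition-function names are made up, and the staging table must match the main table's structure and sit on the same filegroup as the partition being switched out:

-- Metadata-only removal of the oldest partition's rows (hypothetical names)
ALTER TABLE dbo.SuperBigTable
    SWITCH PARTITION 1 TO dbo.SuperBigTable_Staging
TRUNCATE TABLE dbo.SuperBigTable_Staging

-- Then merge away the now-empty boundary, e.g. for a function partitioned by month
ALTER PARTITION FUNCTION pfPurgeByMonth() MERGE RANGE ('2011-01-01')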


Jason
http://dbace.us
Post #1084052