|
|
|
Hall of Fame
       
Group: General Forum Members
Last Login: Yesterday @ 2:26 PM
Points: 3,572,
Visits: 5,105
|
|
alen teplitsky (3/24/2011) is it better to use a non-clustered index in a large delete? i thought it was the opposite since you would be deleting data that is logically together in the table
years ago i saw an issue where large amounts of data was inserted, deleted and modified on a non-clustered index and it cause a lot of blocking when the app changed to use multiple threads. we changed the clustered index column and the problems went away since the data that was changed was on the same pages and extents
changing the clustered index on a huge table is often simply not possible due to tlog constraints and/or time. If your existing CI gets you a seek to deleted records that is great for you though!
also, my comments are directed at the (much more common IMHO) scenario where you have a maintenance window to do the massive delete without concurrent-access issues. That brings a whole 'nother level of headache since you need to leave all strutures in place and pay a huge overhead to log all the existing index deletes and you also have to use much smaller batch sizes to avoid long-duration locks causing unwanted blocking.
Best,
Kevin G. Boles SQL Server Consultant SQL MVP 2007-2012 TheSQLGuru at GMail
|
|
|
|
|
Forum Newbie
      
Group: General Forum Members
Last Login: Tuesday, May 14, 2013 7:26 AM
Points: 9,
Visits: 210
|
|
The explicit BEGIN TRAN, COMMIT TRAN in the batch delete is unnecessary, SQL will by default commit each individual statement, unless you put a transaction around the entire process. Totally agree about using a clustered key for batching (or a non-clustered index on a heap).
Here's my simpler version of the batch delete (for both SQL 2005 and 2000):
--SQL 2005+ DECLARE @i INT SET @i = 1
WHILE @i > 0 BEGIN DELETE TOP (10000) FROM dbo.SuperBigTable WHERE RecordID < 12345 SELECT @i = @@ROWCOUNT END
--SQL 2000 DECLARE @i INT SET @i = 1 SET ROWCOUNT 10000
WHILE @i > 0 BEGIN DELETE * FROM dbo.SuperBigTable WHERE RecordID < 12345 SELECT @i = @@ROWCOUNT END SET ROWCOUNT 0
Had to use this again this morning, a user tried to delete 88m rows out of a 90m row table, all in one statement. I told him not to do that :)
|
|
|
|
|
SSC Journeyman
      
Group: General Forum Members
Last Login: Tuesday, May 14, 2013 8:02 AM
Points: 80,
Visits: 341
|
|
Well written for what it covered, but my initial expectation before reading the article was that you would touch more on techniques for handling transaction log growth and also handling child records.
I am currently working on a stored procedure that does many, many deletes and I had to set up logic to delete in chunks of 250,000 rows.
Otherwise, way to go! Mike Byrd
Mike Byrd
|
|
|
|
|
SSC Rookie
      
Group: General Forum Members
Last Login: Thursday, September 29, 2011 8:53 AM
Points: 39,
Visits: 176
|
|
We once had a similar problem, where we had to delete primary and child data for certain records in a batch process. The solution worked rather well -- mark the primary table with a ToBeDeleted flag, then set up a batch job. The batch job would run once per minute, and it was designed to delete and commit the TOP 10000 rows from the child tables joined to the primary table rows where the ToBeDeleted flag was set. After the batch job had run for a few iterations, the ROWCOUNT would equal zero, which was the condition to delete the primary rows.
Purging thus became a background process, with the load spread across many CPU cycles, not unlike garbage collection in a managed programming environment.
|
|
|
|
|
SSCoach
         
Group: General Forum Members
Last Login: 2 days ago @ 1:46 PM
Points: 18,732,
Visits: 12,329
|
|
|
|
|
|
Valued Member
      
Group: General Forum Members
Last Login: Sunday, April 21, 2013 6:28 PM
Points: 53,
Visits: 198
|
|
Great article Nakul.
Agree with all the comments about using smallish batch sizes to keep row level locks from escalating! Also, if the delete(s) involve hash matching, keeping to a small batches minimising the chances of a collision.
When multiple tables must be deleted from to preserve transaction integrety, I've had some success with cascading deletes but have also had some horrors when the database design isn't sufficently normalised to make it work well.
One point it would be worth adding to your article is how the purging gets implemented. While it can be run in dead time (typically overnight) not every database has this option nowadays. And if purging hasn't been done for ages then disk resources may just not be suffcient!
What we've done with two of our large DB is to use the Resource Governor to limit the impact of the purge job and we just have it running all the time. It chuggs away in the background when system resources are available and grinds to a halt when they aren't. If you don't have SQL2008 and don't have the Resource Governor then about the only option(?) you have is to keep the batch size small, the queries well optimised and insert waits of 10 seconds or more to minimise the impact.
Running purging jobs 24/7 has another benefit in that log files tend to stay roughly the same size whereas big overnight/weekly/monthly purging often pushes the log sizes way beyond normal growth sizes.
|
|
|
|
|
Grasshopper
      
Group: General Forum Members
Last Login: Monday, May 13, 2013 9:38 PM
Points: 23,
Visits: 177
|
|
table locks are the killer here... Lock escallation happens @5000 rows, keep batch sizes below this
select 1 while @@rowcount > 0 delete top 4999 from ... (or something similar)
Also you're using SQL2008+. read up on "Filtered Indexes"
Create a filtered index on your main table that contains your PK, and filters on your delete criteria. This index will be used to determine your deletion candidates, and the overhead should be minimal as the delete candidates should only be a small subset of the bulk of your data.
You can even monitor the size/rowcount of the Index to trigger when to start the purge - to keep the performance hit to a minimum. (or delay the start to an off-peak time) rather than using a scheduled date/time.
|
|
|
|
|
Valued Member
      
Group: General Forum Members
Last Login: Sunday, April 21, 2013 6:28 PM
Points: 53,
Visits: 198
|
|
rob.lobbe-964963 (3/24/2011) Also you're using SQL2008+. read up on "Filtered Indexes" Create a filtered index on your main table that contains your PK, and filters on your delete criteria. This index will be used to determine your deletion candidates, and the overhead should be minimal as the delete candidates should only be a small subset of the bulk of your data.
Good point! Sometimes the filtered index is a better approach than using a table, temporary or otherwise, to store PK values for transaction that are to be deleted.
BTW ... How many DBA's get "completed" applications from development teams that have absolutely no databases purging logic at all!?! I have no idea how they get away with this.
|
|
|
|
|
SSC Journeyman
      
Group: General Forum Members
Last Login: Thursday, September 20, 2012 5:21 PM
Points: 84,
Visits: 391
|
|
belgarion (3/24/2011) Great article Nakul.
Agree with all the comments about using smallish batch sizes to keep row level locks from escalating! Also, if the delete(s) involve hash matching, keeping to a small batches minimising the chances of a collision.
When multiple tables must be deleted from to preserve transaction integrety, I've had some success with cascading deletes but have also had some horrors when the database design isn't sufficently normalised to make it work well.
One point it would be worth adding to your article is how the purging gets implemented. While it can be run in dead time (typically overnight) not every database has this option nowadays. And if purging hasn't been done for ages then disk resources may just not be suffcient!
What we've done with two of our large DB is to use the Resource Governor to limit the impact of the purge job and we just have it running all the time. It chuggs away in the background when system resources are available and grinds to a halt when they aren't. If you don't have SQL2008 and don't have the Resource Governor then about the only option(?) you have is to keep the batch size small, the queries well optimised and insert waits of 10 seconds or more to minimise the impact.
Running purging jobs 24/7 has another benefit in that log files tend to stay roughly the same size whereas big overnight/weekly/monthly purging often pushes the log sizes way beyond normal growth sizes.
Wow now this is a good idea. Thanks for sharing!
My SQL Server Blog
|
|
|
|
|
SSC Rookie
      
Group: General Forum Members
Last Login: Monday, March 18, 2013 7:54 AM
Points: 27,
Visits: 421
|
|
Instead of chasing up purge with a frequent large delete, partition table at design time is a good option. The criteria you use in delete (whereclause) can be used for partition table key columns. See my SQLSaturday presentation.
Jason http://dbace.us
|
|
|
|