Deleting large number of rows from a table & a heap in a VLDB

Author
Message
Nakul Vachhrajani
SSCommitted

Group: General Forum Members
Points: 1824 Visits: 2127
Comments posted to this topic are about the item Deleting large number of rows from a table & a heap in a VLDB

Thanks & Regards,
Nakul Vachhrajani.
http://nakulvachhrajani.com
Be courteous. Drive responsibly.

Follow me on
Twitter: @sqltwins
Google Plus: +Nakul
stevemash
Forum Newbie

Group: General Forum Members
Points: 4 Visits: 5
I also faced these problems. In our case we had a set of tables (~5) that took 25M to 50M inserts per day, on a Tier 1 SAN and 8-way and 16-way boxes. This was a fairly typical OLTP workload, and we needed the data purged after around 3 days. We also had the additional constraint that we had to purge concurrently, because the system was 24x7. Yes, there were periods of lower activity (in which we ran the purge to minimize the hit to throughput), but we still had concurrent inserts happening.

When I saw your suggestions I noted the following roadblocks to implementing them in my environment:
1) Lock escalation -- this is going to escalate to table locks in SQL 2005 (without undocumented trace flags), which you accept since you set lock escalation to TABLE -- although this would happen with the default setting anyway, so I'm not sure what that setting accomplishes.
2) There are a lot of deadlock possibilities when using a secondary index to join into the clustered index to do a delete. An X lock will be taken on the record in the secondary index, and then an X lock on the clustered index -- the OPPOSITE ordering from a normal insert, which takes an X lock on the clustered primary index first and then an X lock when inserting into the non-clustered, secondary index. In your case this is largely mitigated after the 5000th lock or so, as the lock escalation prevents further deadlock opportunities. For us, however, table lock escalation was unacceptable as is, so we didn't do this.

Here are my thoughts on this (only with regard to clustered index deletes -- I have no experience with heap deletes, and would like to hear more about why you are using heaps in the first place; I know of only a few very narrow cases, with typical hardware configurations and typical workloads, where heaps are the right answer):

1- The _best_ option, if it works with your constraints, is to utilize horizontal partitioning (see the sketch after this list). Data purging should be a constant-time, metadata-only operation if you are in a high-volume, high-churn OLTP environment. It makes little sense to waste so many CPU cycles and log writes on something that is not part of "real time" processing. SQL 2005 supports "sliding window" partitioning fairly well -- there are some problems to overcome with scripting, but others before me did it well and have their scripts out there online. To make this work you need natural groups (dates, days of the week, etc.) that you can use as partition boundaries.

2- If you aren't using SQL 2005+ or Oracle or something that natively supports horizontal partitioning (or you aren't using the Enterprise Editions of these products, which include that support), then you can approximate horizontal partitioning with views, but it is extremely messy. I have seen this done in a number of systems, and it's always a headache for the DBA and the people writing the maintenance scripts, but it doesn't always impact developers: they program against the schema in the view and don't care about what's under it.

3- If you can't use partitioning, then the ideal is to design the primary key to be prefixed with the date, or with something else sargable, so that you can purge on the clustered primary key. This is _close_ to what you have suggested, but it results in a plan that is insert- and deadlock-friendly. You can then mitigate escalation by "batching" the purge -- i.e., TOP n, as you suggested -- so that no escalation occurs to hurt throughput. If you are in an environment that can tolerate the escalation, then by all means let it happen!

4- If you can't do that, then you are stuck with secondary-index lookups and just have to live with the escalation, and hopefully you have a delete window without concurrent throughput requirements.
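
To make option 1 concrete, here is a minimal sketch of a sliding-window purge via partition switching on SQL Server 2005+ Enterprise Edition. The table (dbo.EventLog), partition function (pfDaily), scheme (psDaily), and staging table are all hypothetical names, and the boundary dates are placeholders:

-- Assumes dbo.EventLog is partitioned by day on a date column, and that
-- dbo.EventLog_Staging is an empty table with identical structure on the
-- same filegroup as partition 1.
-- SWITCH is a metadata-only operation: the oldest partition's rows move to
-- the staging table almost instantly, regardless of row count.
ALTER TABLE dbo.EventLog SWITCH PARTITION 1 TO dbo.EventLog_Staging;

-- The staged rows are now out of the hot table; discard them cheaply.
TRUNCATE TABLE dbo.EventLog_Staging;

-- Slide the window: merge away the emptied boundary, then split in a new day.
ALTER PARTITION FUNCTION pfDaily() MERGE RANGE ('20090101');
ALTER PARTITION SCHEME psDaily NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION pfDaily() SPLIT RANGE ('20090108');

The point of the design is that the hot table is never scanned at all: the purge cost is independent of how many rows the oldest partition holds.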

In general this is one of those things that is extremely frustrating to me about traditional RDBMSs. In my opinion it's a side-effect of the "be everything to everyone" philosophy that RDBMSs follow, which makes them less good at any one of these behaviors. Metadata-only purging is a flat requirement for an OLTP data store -- and yet we have to rely on features that cost on the order of $120K (Enterprise Edition SQL Server), or worse ($600K+ for Oracle Enterprise plus the partitioning option), just to get clean, simple, metadata-only purging.

Steve
ziangij
SSCrazy

Group: General Forum Members
Points: 2910 Visits: 374
Thanks for this wonderful article :-)
Josep
SSC-Enthusiastic

Group: General Forum Members
Points: 148 Visits: 4666
Thank you for this post! I like the way you give multiple alternatives, with their pros and cons :-)
Maxim Picard
Old Hand

Group: General Forum Members
Points: 394 Visits: 594
Very well written article!
SQLBOT
SSChasing Mays

Group: General Forum Members
Points: 660 Visits: 836
This is a very well-thought-out and well-written article.
But I'm with stevemash here: you don't address locking at all in your article, which is a big issue for people purging high-transaction tables.

I have a few suggestions:

Taking smaller bites, which you mentioned, can sometimes beat the calculus of lock escalation (this depends on the percentage of rows being updated/deleted and the amount of memory pressure).

Disabling lock escalation is one method, but it can eat up lock memory, so it should be used with caution.
You can disable lock escalation in your session using trace flag 1224, or 1211 (the nuclear option).
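
As a sketch, the trace flag numbers above are the real lock escalation flags; the per-table alternative shown in the comment only exists on SQL Server 2008 and later, and the table name is hypothetical:

-- TF 1224 disables escalation based on lock count, but still allows it
-- under lock-memory pressure; TF 1211 disables escalation entirely.
DBCC TRACEON (1224, -1);    -- -1 = server-wide; omit it for session scope
-- DBCC TRACEON (1211, -1); -- the nuclear option

-- On SQL Server 2008+ there is also a per-table setting:
-- ALTER TABLE dbo.myTable SET (LOCK_ESCALATION = DISABLE);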

You can also jam open a transaction with an update lock like so:

BEGIN TRAN
SELECT * FROM [dbo].[myTable] WITH (UPDLOCK, HOLDLOCK) WHERE 1=0
WAITFOR DELAY '01:00:00'
COMMIT TRAN

The lock hints take an IX lock on the table, which is held here for one hour.
Any other transaction's lock on dbo.myTable won't escalate to a table lock past your jam (unless someone else manually uses the TABLOCK hint).

This is pretty much like using the trace flags... you could probably run out of lock memory and tank your system this way, since SQL wants to escalate, but can't.

~Craig

Craig Outcalt



Tips for new DBAs: http://www.sqlservercentral.com/articles/Career/64632
My other articles: http://www.sqlservercentral.com/Authors/Articles/Craig_Outcalt/560258
Indianrock
SSC Eights!

Group: General Forum Members
Points: 864 Visits: 2258
If your database has been normalized, you'll have to deal with child table records. Ours was created, initially, by a tool (Visio, I believe) and is very highly normalized.

All previous attempts to do significant purging (for example, to create a small version of the production database for development) exceeded SQL Server's capability due to the high degree of normalization and the constraints.

We do have a nightly data purge job, but it's limited to just a few tables.



alen teplitsky
SSCommitted

Group: General Forum Members
Points: 1595 Visits: 4621
I was going to try this, but I don't think it will work very well with a non-clustered PK that is an identity column. And how long does the insert into the lookup table take, compared to just deleting the data without referencing the lookup table?

I was going to try it on a table with 100 million rows where I always delete data that is over 30 days old. Populating the lookup table seems to be a problem; the data is log data from different servers.
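
For comparison, a direct batched delete by date (no lookup table) is only a handful of lines. This sketch assumes a hypothetical dbo.ServerLog table whose clustered index leads on LogDate, so each batch is a range seek:

DECLARE @rows int;
SET @rows = 1;

WHILE @rows > 0
BEGIN
    -- Keep each batch below the ~5000-lock escalation threshold.
    DELETE TOP (4000) FROM dbo.ServerLog
    WHERE LogDate < DATEADD(DAY, -30, GETDATE());

    SET @rows = @@ROWCOUNT;

    -- Optional throttle so concurrent inserts and log backups can breathe.
    WAITFOR DELAY '00:00:01';
END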
TheSQLGuru
SSCertifiable

Group: General Forum Members
Points: 6000 Visits: 8314
Batching with an appropriate non-clustered index seek delete plan (verify the plan based on your batch size -- or force it explicitly) can be a HUGE win in large delete scenarios. Depending on what fraction of the data you delete, it is often a big win to drop the other indexes and rebuild them after the deletes are finished.
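
As a hedged illustration of one batch of that approach (all table and index names here are made up, and in practice the DELETE would run in a loop until done):

-- Drop nonclustered indexes that would otherwise be maintained row by row
-- during the purge (only worth it when deleting a large fraction of rows).
DROP INDEX IX_BigTable_Other ON dbo.BigTable;

-- Force the delete to drive off the index that matches the purge predicate,
-- batched to limit lock counts and log growth per transaction.
DELETE TOP (10000) BT
FROM dbo.BigTable AS BT WITH (INDEX (IX_BigTable_PurgeDate))
WHERE BT.PurgeDate < '20090101';

-- Rebuild the dropped index once the purge is finished.
CREATE NONCLUSTERED INDEX IX_BigTable_Other ON dbo.BigTable (OtherColumn);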

Gotta take exception to one thing in your article:

1. Data File size - around 1GB initial, 1MB growth increment
2. Log File size - around 800MB initial, 2GB growth increment

Please don't EVER use a 1MB growth increment -- it causes devastatingly bad external and internal fragmentation. As for the t-log, sqlskills.com and Kimberly Tripp have very good information about t-log size and growth; a 2GB growth increment is suboptimal.

Best,

Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru at GMail
alen teplitsky
SSCommitted

Group: General Forum Members
Points: 1595 Visits: 4621
Is it better to use a non-clustered index in a large delete? I thought it was the opposite, since you would be deleting data that is logically together in the table.

Years ago I saw an issue where large amounts of data were inserted, deleted, and modified through a non-clustered index, and it caused a lot of blocking when the app changed to use multiple threads. We changed the clustered index column and the problems went away, since the data being changed was on the same pages and extents.