• I also faced these problems. In our case we had a set of ~5 tables taking 25M to 50M inserts per day, on a Tier 1 SAN and 8-way and 16-way boxes. It was a fairly typical OLTP workload, and the data had to be purged after roughly 3 days. We also had the additional constraint that we had to purge concurrently, because the system was 24x7. Yes, there were periods of lower activity (in which we ran the purge to minimize the throughput hit), but we still had concurrent inserts happening.

    When I saw your suggestions I noted the following roadblocks to implementing them in my environment:

    1) Lock escalation -- in SQL 2005 this is going to escalate to table locks (unless you suppress it with trace flags), which you accept by setting lock escalation to TABLE -- although that is what the default setting does anyway, so I'm not sure what that statement adds.

    2) there are a lot of deadlock possibilities when you use a secondary index to join into the clustered index to do a delete. An X lock is taken on the row in the secondary index, and then an X lock on the clustered row -- which is the OPPOSITE ordering from a normal insert, which takes an X lock on the clustered primary index first and then an X lock when inserting into the non-clustered, secondary index (see the sketch after this list). In your case this is largely mitigated after the ~5000th lock, since the escalation to a table lock removes the further deadlock opportunities. For us, though, the table lock escalation was unacceptable as is, so we didn't do this.
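
    To make the lock ordering concrete, here is a minimal sketch (table, column, and index names are all hypothetical) of the two access paths that can deadlock against each other; the trace flags in the closing comment are the usual levers if you do have to curb escalation:

        -- Hypothetical table: clustered PK on OrderID, secondary index on OrderDate.
        CREATE TABLE dbo.Orders (
            OrderID   BIGINT       NOT NULL PRIMARY KEY CLUSTERED,
            OrderDate DATETIME     NOT NULL,
            Payload   VARCHAR(100) NOT NULL
        );
        CREATE NONCLUSTERED INDEX IX_Orders_OrderDate ON dbo.Orders (OrderDate);

        -- Insert path: X lock on the clustered index row first,
        -- then an X lock on the nonclustered index row.
        INSERT dbo.Orders (OrderID, OrderDate, Payload)
        VALUES (1, GETDATE(), 'x');

        -- Purge path (assuming the plan seeks IX_Orders_OrderDate and looks
        -- up into the clustered index): locks the nonclustered row first,
        -- then the clustered row -- the opposite order, which is the
        -- deadlock window described above.
        DELETE dbo.Orders
        WHERE OrderDate < DATEADD(DAY, -3, GETDATE());

        -- To curb escalation: DBCC TRACEON (1224, -1) escalates only under
        -- lock-memory pressure; 1211 disables escalation entirely.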

    Here are my thoughts on this (only with regard to clustered index deletes -- I have no experience with heap deletes, and I'd like to hear more about why you are using heaps in the first place; I know of only a few very narrow cases, with typical hardware configurations and typical workloads, where a heap is the right answer):

    1- The _best_ option, if it works within your constraints, is horizontal partitioning. In a high-volume, high-churn OLTP environment, data purging should be a constant-time, metadata-only operation; it makes little sense to burn that many CPU cycles and log writes on something that isn't part of the "real time" processing. SQL 2005 supports "sliding window" partitioning fairly well -- there are some problems to overcome with scripting, but others before me have done it well and have their scripts online. To make this work you need natural groups (dates, days of the week, etc.) to partition on. A sketch of the mechanics is below this list.

    2- If you aren't on SQL 2005+ or Oracle or something else that natively supports horizontal partitioning (or you aren't on the enterprise editions of those products, which is where that support lives), then you can approximate horizontal partitioning with partitioned views, but it is extremely messy. I have seen this done in a number of systems, and it's always a headache for the DBA and for whoever writes the scripts that maintain it, but it doesn't have to impact developers: they program against the schema the view exposes and don't care what's under it. A sketch is below the list.

    3- If you can't use partitioning, then the next best thing is to design the primary key so it is prefixed with the date (or something else sargable for the purge), so that the purge runs against the clustered primary key. This is _close_ to what you suggested, but it produces a plan that is insert- and deadlock-friendly. You can then mitigate escalation by batching the purge -- i.e. DELETE TOP (n), as you suggested -- so that no escalation occurs to hurt throughput (sketch below the list). If you are in an environment that can tolerate the escalation, then by all means take it!

    4- If you can't do that either, then you are stuck with secondary-index lookups, and you just have to live with the escalation and hope for a window in which to delete that doesn't have concurrent throughput requirements (last sketch below).
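
    For option 1, here is a minimal sketch of the SQL 2005 sliding-window mechanics, assuming daily partitions and hypothetical names; a production script also has to manage filegroups and keep the staging table's structure exactly in sync with the partitioned table:

        -- One partition per day; the purge switches the oldest partition out
        -- and merges its boundary away -- all metadata-only operations.
        CREATE PARTITION FUNCTION pfDaily (DATETIME)
            AS RANGE RIGHT FOR VALUES ('20070101', '20070102', '20070103');

        CREATE PARTITION SCHEME psDaily
            AS PARTITION pfDaily ALL TO ([PRIMARY]);

        CREATE TABLE dbo.Orders (
            OrderDate DATETIME NOT NULL,
            OrderID   BIGINT   NOT NULL,
            CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderDate, OrderID)
        ) ON psDaily (OrderDate);

        -- Staging table: identical structure, same filegroup, must be empty.
        CREATE TABLE dbo.Orders_Purge (
            OrderDate DATETIME NOT NULL,
            OrderID   BIGINT   NOT NULL,
            CONSTRAINT PK_Orders_Purge PRIMARY KEY CLUSTERED (OrderDate, OrderID)
        );

        -- The purge itself: constant time, no row-by-row logging.
        ALTER TABLE dbo.Orders SWITCH PARTITION 1 TO dbo.Orders_Purge;
        TRUNCATE TABLE dbo.Orders_Purge;
        ALTER PARTITION FUNCTION pfDaily() MERGE RANGE ('20070101');

        -- And open a new partition at the leading edge for tomorrow.
        ALTER PARTITION SCHEME psDaily NEXT USED [PRIMARY];
        ALTER PARTITION FUNCTION pfDaily() SPLIT RANGE ('20070104');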
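
    For option 2, the view-based approximation looks roughly like this (again a hedged sketch with hypothetical names); the trusted CHECK constraints are what let the optimizer touch only the relevant base table:

        -- One physical table per day, each with a CHECK constraint on the date.
        CREATE TABLE dbo.Sales_20070101 (
            SaleDate DATETIME NOT NULL
                CHECK (SaleDate >= '20070101' AND SaleDate < '20070102'),
            SaleID   BIGINT   NOT NULL,
            CONSTRAINT PK_Sales_20070101 PRIMARY KEY CLUSTERED (SaleDate, SaleID)
        );
        -- ... dbo.Sales_20070102 and dbo.Sales_20070103 built the same way ...

        -- Developers only ever see the view.
        CREATE VIEW dbo.Sales
        AS
        SELECT SaleDate, SaleID FROM dbo.Sales_20070101
        UNION ALL
        SELECT SaleDate, SaleID FROM dbo.Sales_20070102
        UNION ALL
        SELECT SaleDate, SaleID FROM dbo.Sales_20070103;

        -- Purge = TRUNCATE the oldest table, ALTER the view to drop it, and
        -- CREATE tomorrow's table -- metadata operations, but scripting the
        -- swap without breaking concurrent inserts is the messy part.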
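
    For option 3, this is the shape I have in mind (hypothetical names; 4000 is just a batch size comfortably under the ~5000-lock escalation threshold):

        -- The clustered PK leads with the date, so the purge is a range
        -- delete on the clustered index -- the same lock order the inserts use.
        CREATE TABLE dbo.Events (
            EventDate DATETIME NOT NULL,
            EventID   BIGINT   NOT NULL,
            CONSTRAINT PK_Events PRIMARY KEY CLUSTERED (EventDate, EventID)
        );

        DECLARE @cutoff DATETIME, @rows INT;
        SELECT @cutoff = DATEADD(DAY, -3, GETDATE()), @rows = 1;

        WHILE @rows > 0
        BEGIN
            -- Small batches keep the lock count under the escalation
            -- threshold and keep each transaction (and its log use) short.
            DELETE TOP (4000) FROM dbo.Events
            WHERE EventDate < @cutoff;

            SET @rows = @@ROWCOUNT;
        END;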
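
    And for option 4, the same batching loop driven off a secondary index -- the lookup plan with the deadlock exposure from point 2 above. This assumes the Events table from the previous sketch plus a Status column covered only by a hypothetical nonclustered index:

        -- CREATE NONCLUSTERED INDEX IX_Events_Status ON dbo.Events (Status);

        DECLARE @rows INT;
        SET @rows = 1;

        WHILE @rows > 0
        BEGIN
            -- Each batch seeks IX_Events_Status, then looks up into the
            -- clustered index to delete the row.
            DELETE TOP (4000) FROM dbo.Events
            WHERE Status = 'PROCESSED';

            SET @rows = @@ROWCOUNT;
        END;
        -- Run in the low-activity window if possible; if a table lock is
        -- tolerable there, larger batches are cheaper overall.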

    In general, this is one of those things that is extremely frustrating to me about traditional RDBMSs. In my opinion it's a side effect of the "be everything to everyone" philosophy that RDBMSs follow, which makes them less good at any one of these behaviors. This is a core requirement for an OLTP data store -- and yet we have to rely on features that cost around $120k on average (Enterprise MSSQL) or worse ($600k+ for Oracle Enterprise plus the partitioning option) just to get a simple feature like clean, metadata-only purging.

    Steve