First of all, great article and very insightful so thanks for taking the time to write it. I do have one question about the approach you explained for deleting large quantities of data from tables with clustered index -
Assuming I understood correctly, you are saying that on a daily basis (or some periodicity), move the data you would like deleted into a look-up table. Then, once a week, join the look-up table to the table that data needs to be deleted from and perform delete. And subsequently truncate the lookup table or something. So my question is - in your implementation of this, have you encountered any blocks on that big table that the data needs to be deleted from during the join? What if that big table is frequently being used by other processes? I was planning to try this out and I am somewhat of a novice so I thought I'd check before breaking anything 🙂
Also, out of curiosity, I noticed that the look-up table that you created for deleting data from a clustered-index table itself had no clustered index, simply a non-clustered index on the column "UnixDateToDelete". Was this done for a specific reason?
Thanks again for the article!
Thank-you for your feedback, and I am happy that you liked reading my article.
As far as the concept goes, yes, you have understood correctly. Now, about your questions:
Q1. Whether or not I have encountered any blocks on the tables during the periodic purge cycles?
A1. Our is an on-premise system, and hence, we execute the purge during the weekly IT maintenance window provided to us by the customer (it's a configurable SQL job that does the purge). The window typically varies from 2 to 4 hours, and hence we have to be in and out of the system in about an hour for IT to do the rest of their maintenace. Because they happen during the maintenance window, all interfaces are down and hence, we have not had any blocking issues.
There have been cases where we had to execute the purge online, and even then we did not face any major blocking issues.
As an alternative, you may want to partition your table, and set the lock escalation to AUTO. What this will do is ask SQL Server to escalate locks to the partition, and not to the entire table.
Q2. Why did I use non-clustered index on the lookup table as opposed to a clustered index?
A2. No specific reason as such. Generally speaking, you can use a clustered index on the lookup table as well - no harm at all (in fact, your deletes may be even faster). In our case, we did not want to enforce any constraints or establish any relationships with the lookup table, and hence you will see that no PK-FK has been used. If your design is such that you can allow for a relationship to exist, please go ahead and use the clustered index by creating the "UnixDateToDelete" as Primary Key.
I hope that I was able to answer your questions satisfactorily. If there is anything else I can help you out with, do let me know.