Efficient Data Deletion Method and Transactional Databases

  • What is the efficient method/T-SQL that I can use to delete data from my SQL Server 2008 R2 database.

    Currently I am using

    DELETE TOP (100) PERCENT FROM Table1

    WHERE Id IN

    (

    SELECT Id FROM DeletionIds

    )

    As I have to delete data from 20 tables so first I move the choosen Ids into DeletionIds table and then delete all these ids from referenced key tables.

    If I am doing something wrong; kindly guide as data deletion process is becoming very critical.

    We have disk space limitations and currently cannot increase it.

    The daily data insert operations are from 5-7 millions and currently I am only able to delete 2.5-3 million records from database. It means incoming data is more than outgoing data and incoming data is eating my disk space.

    Please help to resolve this issue; Thank you.

  • As a quick suggestion you may modify your code to match with following...

    Example:

    -- JOIN based deletion

    USE AdventureWorks2008R2;

    GO

    DELETE FROM Sales.SalesPersonQuotaHistory

    FROM Sales.SalesPersonQuotaHistory AS spqh

    INNER JOIN Sales.SalesPerson AS sp

    ON spqh.BusinessEntityID = sp.BusinessEntityID

    WHERE sp.SalesYTD > 2500000.00;

    Now few question:

    •Why TOP (100) PERCENT?

    •What’s the purpose of mass deletion? Please explain it. We might be able to suggest better.

  • The most efficient way to do mass deletions is to introduce a sliding window partitioned table setup.

    This would allow you to switch out whole days of data with out meta data changes and allow you to drop the table you switched the partition into.

  • First of all, I agree with Dev on the syntax. So, to your issue. If you have referential integrity, you have to delete from the referencinig tables first, followed by deleting from the referenced (central) table. How simple this may look, you may run into performance issues, most likely with the last delete. When you delete from the central table, referential integrity will be evaluated against all the referencing tables, possibly creating some 20 table or clustered index scans in your execution plan. If so, you may consider creating nonclustered indexes on the column(s) in the referencing tables which is a part of your foreign keys. I hope this helps.

    As for space, deleting from tables do generate a lot of transaction log, beacuse SQL Server both logs the deletion and the old data so that the change can be rolled back, and it reserves enough space in the transaction log to be able to roll back. It is not uncommon for me to generate 10GB+ of transaction log for a delete transaction (from some ten tables).



    Ole Kristian Velstadbråten Bangås - Virinco - Facebook - Twitter

    Concatenating Row Values in Transact-SQL[/url]

  • I worked with Join previously and found that It takes around 15-20 min to delete 1 million records from 20 tables.

    Then I changed the strategy and first pick and insert the ids (primary key) in DeletionIds table and then delete all foreign keys data (19 tables) first and then remove the data from main table and deletion time reduced to 10-13 minutes.

    Why TOP (100) PERCENT?

    I just do some R & D on web and try to find quick way to delete data from database. With this I found a 20-30% improvement in deletion process as compared to DELETE FROM Table1.

    Mass Deletion?

    1. The main reason is that my client's requirement is that I have to delete the data from database older than specific retention period (say 30 days)

    2. Also we have a limited disk space that we want to use for new/latest data and new client for the application.

    @MysteryJimbo; I will look into "sliding window partitioned table setup". Does this support all SQL 2008 R2 versions? I am using Standard version of SQL Server.

    Thanks

  • Babar Javaid (12/20/2011)


    I worked with Join previously and found that It takes around 15-20 min to delete 1 million records from 20 tables.

    Then I changed the strategy and first pick and insert the ids (primary key) in DeletionIds table and then delete all foreign keys data (19 tables) first and then remove the data from main table and deletion time reduced to 10-13 minutes.

    Why TOP (100) PERCENT?

    I just do some R & D on web and try to find quick way to delete data from database. With this I found a 20-30% improvement in deletion process as compared to DELETE FROM Table1.

    Mass Deletion?

    1. The main reason is that my client's requirement is that I have to delete the data from database older than specific retention period (say 30 days)

    2. Also we have a limited disk space that we want to use for new/latest data and new client for the application.

    @MysteryJimbo; I will look into "sliding window partitioned table setup". Does this support all SQL 2008 R2 versions? I am using Standard version of SQL Server.

    Thanks

    Based on your reply, 'sliding window' is a good idea. Yes, SQL Server 2008 R2 supports it.

  • Babar Javaid (12/20/2011)


    1. The main reason is that my client's requirement is that I have to delete the data from database older than specific retention period (say 30 days)

    IF this would be a continuous periodic job (say every month) i would recommend Batch approach in off peak hours, by this you can also manage you log.

    Babar Javaid (12/20/2011)


    2. Also we have a limited disk space that we want to use for new/latest data and new client for the application.

    rather deleting the data , i would say move the required data (latest) data into permanent tmp table (prefer new disk if you can or if you require) , rename existing table into some history table, and rename tmp into main table.in this way you can also put less overhead (log's space wise) on main table.and less data to operate.

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • Deleting as fast as possible:

    Two suggestions 1) log(ldf) and data(mdf) files on physically separate disks.

    2) Partition the table so that records to be deleted are distributed among several spindles

    if possible, partition the table so that incoming data and to_be_deleted data resides on seperate partition. if you can do that just delete the partition.

  • apatel62 (11/7/2012)


    Deleting as fast as possible:

    Two suggestions 1) log(ldf) and data(mdf) files on physically separate disks.

    2) Partition the table so that records to be deleted are distributed among several spindles

    if possible, partition the table so that incoming data and to_be_deleted data resides on seperate partition. if you can do that just delete the partition.

    +1

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply