Physically Deleting data or Deactivating rows approach

  • Hi everyone .

    We have a debate going on in the company on whether to physically delete data or use status flags to deactivate data . I personally favour physically deleting data and keeping the main transaction tables as small as possible. We have triggers that moves the deleted records to the history tables , so the data never gets lost.

    Some screens have radio buttons where you can select certain records and delete them . If we delete Table A , then we have to do cascading deletes on the child tables . In all we might have to delete from 10-14 tables down the road . The indexes are all in place to enable fast deletes .

    Even if we use the flag/status approach , we could have to fire cascading updates where we update the status to delete for the underlying tables.

    Here is my arguments for physically deleting the data

    Lets keep deleted or less frequently accessed data in history tables and keep the real or live data in the main transaction tables. This would also help “Read” performance . Lets say we have a table of 1 million records . If 10-20% of the rows are deleted (using the flag approach) , then sql server would have to do more work or read more pages to retrieve the data . Another drawback is that we need to add filter conditions in the related select and filter procs to avoid the deleted rows.

    So what would you do or normally prefer . I would greatly appreciate feedback on this .

    Thanks

    Bilal.

  • I prefer physical deletion and logging.

    Flagging data (virtual deletion) means that every single piece of code, every query, every update, etc., all has to take the flag into account. Even if a piece of code can safely ignore the flag, you have to test it to make sure of that before the code goes live. It becomes very easy to miss that flag somewhere.

    On the other hand, a query can't return a row that's no longer there. Simple.

    Flagged rows can also mess up table statistics and index selectivity. They make table scans (if such are necessary) take longer. They can mess with join-where performance pretty easily.

    A "Deleted" column, whether it's a bit, or char(1), usually only has 1 or 2 values in it, which means it's either unindexable, or unselective if indexed, which means it makes every query in the database slower.

    Keeping "deleted" data means backups are larger and take longer. Which means they also make restores take longer. Which means more downtime if you have to restore.

    The amount of disk space and the number of rows in tables are certainly things to consider, but with disk space as cheap as it is these days, that argument wouldn't hold much weight for me. But the statistics and indexing issues, the backup/recovery time issue, and the error-prone issue matter a lot, and can't be fixed simply by throwing more hardware budget at them.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Agreed on all counts. I even go one step further - I don't keep huge history files in the same database. So - I use a "local" history file for "warm storage" of the old stuff, which then gets flushed (say - once a month) to a separate database just for that. We then back that archive DB up once or twice each month after it's been populated.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • I don't keep any history tables in the same database. They bulk up backups for no good reason. When I log updates/deletes/inserts, I do so in a separate database.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I'm wary of deleting data since it's often needed the day after it's deleted, but since you are moving to a history table, I'm be inclined to agree with everyone above and delete it.

  • I use the flag approach when I have to undelete through the application.

    Otherwise, delete with or without archiving or change history is fine.

    It depends on the system requirements.

  • bell_vbdev (3/26/2008)


    I use the flag approach when I have to undelete through the application.

    Otherwise, delete with or without archiving or change history is fine.

    It depends on the system requirements.

    An "Undelete" proc can easily recover data from a log table and restore it to the main table. A log table also allows the "undo" of other errors, even after they've been committed.

    For example, in a prior workplace, we outsourced part of our web development. One day after the owner of the company we outsourced to told me how poor my SQL skills were (he wasn't far wrong at that time), one of his developers did an update statement in the production database, on the Customers table, without a Where clause on the update. After a few customers and employees noticed that their last names had all been changed to something like "Umpilmuhestimanahimala" (I may have misspelled that), and one of them called me, I was able to revert the change in well under a minute. Didn't have to restore from a backup, didn't have to open a log parser (not sure I would have even known how to do that back then), just had to fire up an "Undo" proc I already had in place.

    (After they repeated similar errors, including truncating the Products table and changing every customer's e-mail addresses, all in the production database, several times in two days, all while using the "sa" login, I managed to convince the company's owners to cancel the contract with them.)

    So, besides the advantages of real deletes over fake deletes (table statistics and indexes, backup size, etc.), there are also significant other advantages to the "delete and log" method, since it also can include the "update and log" method.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Thankyou guys for the feedback. The only drawback i see is the cascading deletes that my delete proc will have to fire for multiple tables in the chain . fortunately for us , deleting will not be a very popular activity from the F/E from the users pt of view.

  • I use a delete flag field in a table when users want to control undeleting within their application.

    Then I'll delete the records for real when they exit the application.

    Otherwise, if they need something restored outside the scope of the application, they'll need a DBA.

    Who can then restore from a backup, log file, or whereever.

Viewing 9 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply