I need to delete 20 million rows from a table...

  • If you restore a backup a week later, and the time of the backup was before the truncate committed, the table is there. It doesn't matter whether or not the truncate had been started when the backup was run.

    If the backup is post-commit, the data is gone.

    It's logged. It's logged in a different way, but that doesn't compromise the ACID principles or impact recovery if the backup is from the correct time.

    A basic demo: http://shaunjstuart.com/archive/2014/03/truncate-table-does-not/
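    To see for yourself that TRUNCATE is fully logged and transactional, here's a quick sketch along the same lines as that demo (table name is made up; run it only in a scratch database):

    ```sql
    -- illustrative only: a throwaway table in a scratch database
    create table dbo.TruncDemo (id int);
    insert into dbo.TruncDemo values (1), (2), (3);

    begin tran;
        truncate table dbo.TruncDemo;
        select count(*) from dbo.TruncDemo;  -- 0 inside the transaction
    rollback;

    -- the truncate was logged, so the rollback restores the rows:
    select count(*) from dbo.TruncDemo;      -- back to 3
    ```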

  • mike.gallamore (3/18/2014)


    That doesn't really clear it up for me. She talks about rolling back the same transaction that the truncate was done in. There are exclusive locks: "and just like all X locks, they are held until the end of the transaction". So if you use a fire-and-forget query in SSMS or whatever to truncate your "not needed" table, or simply think you're happy and commit the transaction, then a week later want to restore the db (or a subset of the data that was in the table), what do you do? It isn't clear to me that you aren't then pooched.

    As per Kalen's answer to Raj (in the comments section)...

    '...

    Raj said:

    Kalen,

    suppose we have done commit truncate table ..but can we recover that truncate table through transactional logged backup ? as we can do that for deleted rows ...

    September 22, 2012 4:08 AM

    Kalen Delaney said:

    Hi Raj

    We can use a log backup to recover any changes by just restoring to a time BEFORE the change took place, whether it's a delete, truncate or drop table, it doesn't matter. The restore does not actually undo anything.. it just brings the database back to a point before you removed the data.

    ~Kalen

    ...'

    So, if you're using FULL/BULK LOGGED recovery and you can replay the backups to the point just before the TRUNCATE took place, you can recover the data. If I were about to carry out such work on important data (and through change control if Prod), I would take a backup of the TLog just before carrying out the TRUNCATE operation, making sure I took a note of the TLog Backup name, so that if we ever need to recover the data we have a record of the point we would need to recover to.
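    A sketch of that "log backup just before, restore with STOPAT just after" approach (database name, file paths, and the timestamp are all placeholders for illustration):

    ```sql
    -- 1) just before the change: take and label a log backup
    backup log MyDB to disk = N'D:\Backups\MyDB_pre_truncate.trn';

    -- 2) ...the TRUNCATE happens, and later we need the data back...

    -- 3) replay the backups, stopping just before the truncate committed
    restore database MyDB from disk = N'D:\Backups\MyDB_full.bak'
        with norecovery, replace;
    restore log MyDB from disk = N'D:\Backups\MyDB_pre_truncate.trn'
        with stopat = '2014-03-18 10:00:00', recovery;
    ```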

    If you were only using SIMPLE recovery then, again depending on the importance of the data (and through change control if Prod), take either a FULL or DIFF backup just before carrying out the TRUNCATE operation. Same as above: keep a record of the backup name should we need to recover the data.

    And, of course, we can recover over the original DB, or if that is not possible, restore to a suitable secondary SQL instance, recover the table data there, and use whichever of the methods open to us to transfer the data back to the original (truncated) table.
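    The secondary-instance route might look something like this (all database names, logical file names, paths, and the linked-server name are hypothetical; restoring under a new name needs WITH MOVE for the data files):

    ```sql
    -- on the secondary instance: restore the copy under a new name
    restore database MyDB_Recover
        from disk = N'D:\Backups\MyDB_full.bak'
        with move 'MyDB'     to N'D:\Data\MyDB_Recover.mdf',
             move 'MyDB_log' to N'D:\Data\MyDB_Recover_log.ldf',
             norecovery;
    restore log MyDB_Recover
        from disk = N'D:\Backups\MyDB_pre_truncate.trn'
        with recovery;

    -- then transfer the rows back, e.g. via a linked server:
    insert into [OrigServer].MyDB.dbo.SomeBigTable
    select * from MyDB_Recover.dbo.SomeBigTable;
    ```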

    Simples... 🙂

    Edit: for clarity, changed '...making sure I took a note of the TLog name...' to '...making sure I took a note of the TLog Backup name...'

  • Yes, attempting to approach this the wrong way will potentially hose your server for days, and rolling back a giant transaction half-way through will only make things worse. I am confronted with the exact same scenario on occasion, backing out tens or hundreds of millions of rows from large tables in a data warehouse. You said that you're disk-space challenged, so I'll tell you how to delete this 20 million row table while minimizing transaction log growth. It works by deleting the rows in batches, and batching updates and deletes also completes a hell of a lot faster; in my case it's like 2 hours to complete compared to 2 days. It uses RAISERROR to print status as informational messages between each batch. This technique will yield better performance regardless of recovery model (SIMPLE, FULL, or BULK LOGGED), but SIMPLE recovery will minimize log file growth. You can cancel and re-start at any time, and any rows deleted up to that point are committed, so long as you don't wrap this block in a transaction. Also, if possible, I'd suggest placing the database in SINGLE_USER or RESTRICTED_USER mode while this is running to prevent anyone else from taking out locks on the table or opening their own transactions.

        set nocount on;

        declare @batch_rows int = 0, @total_rows int = 0;

        while 1 = 1
        begin
            -- to minimize transaction log growth, pausing and
            -- checkpointing after each batch will help re-use space:
            waitfor delay '00:00:05';
            checkpoint;

            -- delete a batch of rows:
            delete top (1000000) from SomeBigTable;
            select @batch_rows = @@rowcount;
            select @total_rows = @total_rows + @batch_rows;

            -- print status message:
            raiserror('Rows affected: %d', 0, 1, @total_rows) with nowait;

            -- if no rows were deleted, break from the loop:
            if @batch_rows = 0 break;
        end;

    Not that I mean to wake up an old sleeping thread, but I understand lock escalation is eligible to happen once roughly 5,000+ locks are taken by an operation. If we want to keep the table online and with minimal interruption (say we are only archiving), then I would probably batch-delete fewer than 5,000 rows per iteration. More time consumed, but it keeps other connections happy, relatively speaking.
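    Keeping each batch under the escalation threshold is a small variant of the batching loop above (4,500 is an arbitrary figure safely under the ~5,000-lock trigger; the table name and archiving filter are hypothetical):

    ```sql
    set nocount on;
    declare @batch_rows int = 1;

    while @batch_rows > 0
    begin
        -- stay below the ~5,000-lock escalation threshold so
        -- other sessions keep row/page-level access to the table:
        delete top (4500) from SomeBigTable
        where ArchivedDate < '2014-01-01';   -- hypothetical archiving filter
        select @batch_rows = @@rowcount;

        waitfor delay '00:00:01';            -- give other work a chance
    end;
    ```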

