Delete data and Transaction Logs

  • I have already bulk inserted 9 million records from one table into another using SSIS; my problem now is how to delete those records from the original table without blowing up my transaction logs. This table is heavily used 24/7, and honestly, I don't know what pitfalls might occur if I change the recovery model from Full to Simple and back again after the delete. Do the transaction logs pick up where they ended? Don't they realize something's missing :w00t:?

    I found this query, where the writer said it was more efficient, but I'm just not experienced enough to know. From everything I've read, I'm assuming there is no way to bypass the transaction log altogether?

    DELETE FROM myProductionTable
    WHERE EXISTS (SELECT * FROM myNewArchiveTable as arc
                  WHERE arc.id = myProductionTable.id)

    Any advice would be great!! Thanks!!

  • There is no way to bypass the transaction log. You can automate the delete to run in chunks, daily or weekly, whichever suits your environment.

  • I suggest that you build a temp table of the IDs you want to delete and then have a cursor delete one ID at a time from your production table. That will keep your transaction log from growing huge and also prevent table locks from blocking your other users. Something like this:

    -- collect the IDs to delete (only rows that already exist in the archive)
    declare @IDTable as table (ID int)

    insert into @IDTable (ID)
    select p.ID
    from ProductionTable as p
    where exists (select 1 from ArchiveLocation as arc where arc.ID = p.ID)

    declare @ID int

    declare myG_cursor cursor for
        select ID from @IDTable

    open myG_cursor

    fetch next from myG_cursor into @ID

    while @@fetch_status = 0
    begin
        -- one row per delete keeps each transaction small
        delete from ProductionTable where ID = @ID

        fetch next from myG_cursor into @ID
    end

    close myG_cursor
    deallocate myG_cursor

  • Shaira (6/20/2013)


    Do the transaction logs pick up where they ended? Don't they realize something's missing :w00t:?

    You mean your log backups? No, they won't just pick up again; yes, they'll know the log chain is broken. Log backups will fail until another full or differential backup is taken.
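
    If you do break the chain (for example, by flipping to Simple for the delete and back), restarting it is straightforward. A minimal sketch, using a hypothetical database name and backup path:

    -- after switching back to FULL, log backups fail until the chain is restarted
    ALTER DATABASE MyDatabase SET RECOVERY FULL;

    -- a differential (or full) backup restarts the log backup chain
    BACKUP DATABASE MyDatabase
        TO DISK = N'X:\Backups\MyDatabase_diff.bak'   -- hypothetical path
        WITH DIFFERENTIAL;

    -- subsequent log backups succeed again from this point
    BACKUP LOG MyDatabase
        TO DISK = N'X:\Backups\MyDatabase_log.trn';   -- hypothetical path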

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Let me see if I understand the problem correctly. You have an overnight ETL process that takes 9 million+ records from Table A and bulk inserts them into Table B. Now you want to delete those records from Table A. You are concerned about how long the delete might take, and you also want to avoid excessive growth in your transaction log.

    The other posters are correct: deletes are always fully logged. Even if your database recovery model is set to Simple (meaning the log space becomes available for reuse once the transactions have been committed and checkpointed), your transaction log can still grow quite large from deleting several million records in a single transaction.
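
    If you want to see how much log a test batch actually consumes before settling on a batch size, a quick check (standard commands, nothing specific to this setup):

    -- log size and percent used for every database on the instance
    DBCC SQLPERF(LOGSPACE);

    -- why the log can't currently be reused (e.g. LOG_BACKUP, ACTIVE_TRANSACTION)
    SELECT name, recovery_model_desc, log_reuse_wait_desc
    FROM sys.databases;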

    The way to avoid excessive growth of the transaction log is to break the deletes into batches so that they aren't all done within a single transaction. I disagree with using a cursor to delete one record at a time, though. I haven't tested that approach, but I would expect it to take forever to delete 9,000,000 records. (If someone has tested that approach with several million records, I'd really like to hear how well it worked. :-))

    Another approach is to use a WHILE loop, like so:

    DELETE TOP (50000)
    FROM TableA
    WHERE DeleteCondition = 1;   -- your delete condition here

    WHILE @@ROWCOUNT = 50000
        DELETE TOP (50000)
        FROM TableA
        WHERE DeleteCondition = 1;

    Using the TOP clause without an ORDER BY clause causes SQL Server to delete the first 50,000 rows it finds that meet the delete condition. If exactly 50,000 rows were deleted, then more rows probably remain, so you do it again, until a batch deletes fewer than 50,000 rows. Then you stop. You will need to experiment with the batch size to get the best performance in your environment. In my experience, something between 10,000 and 100,000 rows per batch usually works best.

    If you are using the Simple recovery model, this is all you need to do, because the log space used by each batch can be reused after the next checkpoint. It sounds as though you are using the Full recovery model, though. In that case, you will need to take a transaction log backup between batches to clear the committed work out of the log. With a batch size of 100,000 rows (as an example) and 9,000,000 rows to delete, that's 90 log backups in total. You can write a script to do that, but it will take a little work on your part. 🙂
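
    A rough sketch of that script, assuming the delete condition from the original post, the Full recovery model, and a hypothetical database name and backup path:

    DECLARE @BatchSize int = 100000;

    WHILE 1 = 1
    BEGIN
        DELETE TOP (@BatchSize)
        FROM myProductionTable
        WHERE EXISTS (SELECT 1
                      FROM myNewArchiveTable AS arc
                      WHERE arc.id = myProductionTable.id);

        IF @@ROWCOUNT < @BatchSize BREAK;   -- last batch finished

        -- clear the committed work out of the log between batches (Full recovery)
        BACKUP LOG MyDatabase
            TO DISK = N'X:\Backups\MyDatabase_log.trn';   -- hypothetical path
    END;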

    If possible, you might want to look for an alternative approach. If you are deleting ALL the records from Table A after you bulk insert them into Table B, then you can use the TRUNCATE TABLE command on Table A. That drops ALL the data from Table A; it is only minimally logged, so it is very fast, as long as you can get an exclusive lock on the table for the length of time it takes (a second or two).

    Another possible alternative, depending on the edition of SQL Server you have, might be to set up partitioning on Table A. Then you could use a "sliding window" technique to switch out and drop a partition after you have copied that data to Table B. That will also take some setup on your part, and it requires the Enterprise edition of SQL Server.
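
    For reference, the core of the sliding-window purge is a metadata-only partition switch. A sketch with hypothetical object names, assuming Table A is already partitioned on a date column and an empty staging table with the same structure exists on the same filegroup:

    -- move the oldest partition out of the partitioned table; metadata-only, nearly instant
    ALTER TABLE dbo.TableA
        SWITCH PARTITION 1 TO dbo.TableA_Staging;

    -- the rows now live in the staging table and can be dropped cheaply
    TRUNCATE TABLE dbo.TableA_Staging;

    -- remove the now-empty boundary from the partition function
    ALTER PARTITION FUNCTION pf_TableA_Date()
        MERGE RANGE ('2011-01-01');   -- hypothetical boundary value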

    If you don't have access to the Enterprise edition, then perhaps you could do something a little creative like using a "poor man's partition." 🙂 Divide Table A into separate tables based on how that data moves from Table A into Table B. Create a view that combines the data from all of these tables into a single object that users can query. Then, when you have transferred one table's worth of data to Table B (as above) you can just truncate that table. This also will require a fair bit of code to manage. There are also some restrictions on views, so - depending on how Table A is used - this solution might not be practical, either.
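
    A minimal sketch of the idea, with hypothetical monthly tables (the CHECK constraints are what let the optimizer skip tables it doesn't need, which is the difference between this and a plain UNION ALL view):

    -- one table per month, each constrained to its own date range
    CREATE TABLE dbo.Orders_2013_05
    (
        OrderID   int      NOT NULL PRIMARY KEY,
        OrderDate datetime NOT NULL
            CONSTRAINT CK_Orders_2013_05
            CHECK (OrderDate >= '20130501' AND OrderDate < '20130601')
        -- ...remaining columns...
    );

    CREATE TABLE dbo.Orders_2013_06
    (
        OrderID   int      NOT NULL PRIMARY KEY,
        OrderDate datetime NOT NULL
            CONSTRAINT CK_Orders_2013_06
            CHECK (OrderDate >= '20130601' AND OrderDate < '20130701')
        -- ...remaining columns...
    );
    GO

    -- the single object users query instead of one big table
    CREATE VIEW dbo.Orders
    AS
    SELECT OrderID, OrderDate FROM dbo.Orders_2013_05
    UNION ALL
    SELECT OrderID, OrderDate FROM dbo.Orders_2013_06;
    GO

    -- once a month's data has been copied to the archive, dropping it is just a truncate
    TRUNCATE TABLE dbo.Orders_2013_05;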

    That pretty much exhausts my bag of tricks. Perhaps someone else has another suggestion for you. I hope this is helpful. 🙂

  • If you're regularly moving millions of rows from one table to another, have you considered partitioning? One of its main features is exactly that.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Wow – what great info! Thanks, Gail, for the info on the log backups. I had read several articles about changing the recovery model to Simple when doing the deletes, but I could not understand what would happen to the transaction logs when I put it back to Full. I do have concerns about my transaction log backups failing until the next full backup is taken.

    I have been handed a database in which someone built hundreds of tables and has been dumping massive amounts of data for the last 5+ years – and unfortunately, that data is critical to our business. I need to get some of these tables under control and make them more efficient, and my first thought was to archive some of this data … hence, what is probably just the start of my problems ahead.

    I am on SQL 2008 Standard edition, so no partitioning available.

  • Ouch -- partitioning would be ideal for that situation.

    Depending on your exact circumstances, you might be able to change the current table into a view that's defined as a concatenation of the original table and the added data.

    When you want to remove the data, you simply change the view definition. The actual physical truncation of the added-data table can be done separately from the view change, and so with no delay or other effect on the original table/view.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • For one-time large deletes, I also do batches, as suggested above. It can take a long time, but it is not disruptive to other users and the t-log does not get out of control. Another benefit is that you can cancel if you need to, restart later, and pick up where you left off.

    Something like:

    DeleteMore:

    WAITFOR DELAY '00:00:05'   -- give other users some cpu

    delete top(10000)
    from MyTable
    where OrderDate < '2011-01-01'

    if @@rowcount > 0 goto DeleteMore

  • Shaira (6/21/2013)


    I have been handed a database in which someone built hundreds of tables and has been dumping massive amounts of data for the last 5+ years – and unfortunately, that data is critical to our business. I need to get some of these tables under control and make them more efficient, and my first thought was to archive some of this data … hence, what is probably just the start of my problems ahead.

    I am on SQL 2008 Standard edition, so no partitioning available.

    Isn't that always the way it happens? Someone implements an application put together with bubble gum and toothpicks, and then no one pays any attention to it until the thing is half-dead and wheezing on life-support.

    Since you have the Std. Edition, you can try storing the data in separate tables and using a view for data retrieval. I've seen that done, but it is awkward, takes a lot of effort to maintain, and it won't perform as well as true partitioning.

    If you are routinely shuffling millions of records, then I recommend that you push back to management and insist on upgrading to the Enterprise edition. You are doing Enterprise-level work, and the Std. Edition is not intended for that. In addition to partitioning, you also get data compression, which could be a big help to you as well.
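
    For what it's worth, enabling compression on an existing index is a one-liner once you're on Enterprise, and you can estimate the savings before committing. A sketch with hypothetical table and index names:

    -- rebuild an existing index with page compression (Enterprise Edition feature)
    ALTER INDEX IX_ProductionTable_OrderDate
        ON dbo.ProductionTable
        REBUILD WITH (DATA_COMPRESSION = PAGE);

    -- estimate the space savings first, without changing anything
    EXEC sp_estimate_data_compression_savings
        @schema_name      = 'dbo',
        @object_name      = 'ProductionTable',
        @index_id         = NULL,
        @partition_number = NULL,
        @data_compression = 'PAGE';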

    Good luck! 🙂

  • ScottPletcher (6/21/2013)


    Ouch -- partitioning would be ideal for that situation.

    Depending on your exact circumstances, you might be able to change the current table into a view that's defined as a concatenation of the original table and the added data.

    When you want to remove the data, you simply change the view definition. The actual physical truncation of the added-data table can be done separately from the view change, and so with no delay or other effect on the original table/view.

    Sorry Scott, I'm really going to show my "newbie-ness" here ... are you talking about partitioned views (something I spent the weekend reading about), or are you referring to something else? In my limited experience, I guess I always figured there was a single table behind a view. Would you please point me in the right direction so I can see whether this is a viable option for me?

    Once things are under control, I was actually kind of encouraged by what I read about partitioned views. Unfortunately, there's "no money" for an Enterprise edition.

  • Indeed, he's referring to partitioned views, views that have multiple tables behind them, with UNION ALL.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • IIRC, partitioned views are allowed in Standard Edition (at least as long as you don't try to update using a distributed partitioned view?).

    As I noted first, a truly partitioned table would be the ideal choice, but that's only possible with Ent. Ed. -- which, "ouch", you don't have and so can't do.

    Worst case, you might have to use just a "dumb" view with a UNION ALL, but that would likely hurt performance much more than a true partitioned view would.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.
