Deleting Large Number of Records

  • I guess you could look at setting trace flags (1211 springs to mind) to prevent lock escalation from happening at all?

    But that could create another problem with insufficient lock memory, since every individual lock then has to be held...
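
    For anyone curious, a minimal sketch of what that would look like, purely for illustration; trace flag 1211 disables lock escalation entirely, which is exactly why the memory caveat above is real:

    -- Enable globally (-1), run the delete, then turn it back off.
    DBCC TRACEON (1211, -1);
    -- ... large batched delete here ...
    DBCC TRACEOFF (1211, -1);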

  • timothyawiseman (9/15/2009)


    Lynn Pettis (9/15/2009)


    I have to make two observations about this article. As of my latest check of the main page, there have been 7,992 views recorded for this article here, compared to 963 views in 90 days where it was originally published. Plus, there has been a nice discussion along with it, where there was none on the other site.

    I am quite happy! :w00t:

    Out of curiosity, where was it originally published?

    I had originally published this and one other article on sswug.org.

  • Interesting article. I took a slightly different approach when needing to delete almost 99.5% of rows from two tables with a total of 200 million rows between them. The tables are log tables used for history and troubleshooting.

    Because the rows to be kept were much fewer than the ones to be deleted, and because there is no need to log the operation (will take place at night after a full backup and with very low activity), and the criticality of individual rows is minimal, I chose to truncate the table and reinsert the rows to be kept. In my case, I get the "keep" rows from a query inserting into a temp table, but I could get them from a backup, which would also allow minimizing the use of tempdb. After gathering/obtaining these "keep" rows (takes a couple of hours in test), I add any rows that are added since the start of the gathering query (takes about 3 minutes in test), and then I truncate the table, insert the keepers, and reset the Identity value.

    Originally, I had considered a similar approach using BCP/SSIS, but wound up simply using temp tables, as the amount of data to be kept was manageable in our environment.


    Regards,

    Tore Bostrup
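
    A minimal sketch of the pattern Tore describes, using hypothetical names (dbo.EventLog with an identity column LogId); it assumes no foreign keys reference the table, since TRUNCATE would otherwise fail:

    -- 1. Capture the small set of rows to keep. The CAST strips the identity
    --    property so the temp table accepts plain inserts in step 2.
    SELECT  LogId = CAST(LogId AS bigint),
            EventDate,
            Message
    INTO    #Keep
    FROM    dbo.EventLog
    WHERE   EventDate >= DATEADD(DAY, -30, GETDATE());

    -- 2. Pick up any rows added while step 1 was running.
    INSERT INTO #Keep (LogId, EventDate, Message)
    SELECT  LogId, EventDate, Message
    FROM    dbo.EventLog
    WHERE   LogId > (SELECT MAX(LogId) FROM #Keep);

    -- 3. Truncate, reload the keepers with their original ids, and reseed.
    TRUNCATE TABLE dbo.EventLog;

    SET IDENTITY_INSERT dbo.EventLog ON;
    INSERT INTO dbo.EventLog (LogId, EventDate, Message)
    SELECT LogId, EventDate, Message FROM #Keep;
    SET IDENTITY_INSERT dbo.EventLog OFF;

    DBCC CHECKIDENT ('dbo.EventLog', RESEED);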

  • I agree; how you delete a large number of records also depends on how many records are being deleted. In the article I was deleting approximately 10% of the total number of records. I would definitely use a different process if the goal was to KEEP only 10% (or in your case much less) of the records.

  • Tore Bostrup-382308 (9/16/2009)


    Interesting article. I took a slightly different approach when needing to delete almost 99.5% of rows from two tables with a total of 200 million rows between them. The tables are log tables used for history and troubleshooting.

    Because the rows to be kept were much fewer than the ones to be deleted, and because there is no need to log the operation (will take place at night after a full backup and with very low activity), and the criticality of individual rows is minimal, I chose to truncate the table and reinsert the rows to be kept. In my case, I get the "keep" rows from a query inserting into a temp table, but I could get them from a backup, which would also allow minimizing the use of tempdb. After gathering/obtaining these "keep" rows (takes a couple of hours in test), I add any rows that are added since the start of the gathering query (takes about 3 minutes in test), and then I truncate the table, insert the keepers, and reset the Identity value.

    Originally, I had considered a similar approach using BCP/SSIS, but wound up simply using temp tables, as the amount of data to be kept was manageable in our environment.

    Nice article, Lynn, it's in my briefcase now!:-)

    Tore Bostrup,

    I completely agree with this approach. I find myself in this situation very often, and it basically applies to log tables. (I have wondered many times in the past whether I'm logging too much information...)

    I think that write-intensive, non-business-critical data (i.e. logs) fits better in a separate database with simple recovery: it makes everything easier to maintain.

    -- Gianluca Sartori
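
    A quick sketch of the separate-database idea, with a hypothetical database name and everything else left to defaults:

    CREATE DATABASE AppLogs;
    GO
    -- Simple recovery: the log is truncated at checkpoints, so purging the
    -- logging tables never has to be squeezed through a FULL-recovery log chain.
    ALTER DATABASE AppLogs SET RECOVERY SIMPLE;
    GO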

  • Sorry for double posting, looks like the site has some issues.

    It says an error has occurred, but the post is added.

    -- Gianluca Sartori

  • kjs714 (9/15/2009)


    I have not run across anything more definitive than what is in BOL. I recommend that you set up a test database and run Profiler, looking for the Lock Escalation event to find out when it occurs. I've seen it at strange numbers like 1889, even though the number of row locks taken was well below what BOL states. In the context of the article, 10,000 will escalate to a table lock if no other processes have a blocking lock.

    SQL Server Storage Engine Team: Lock Escalation

    That should clear things up for you.

    edit: regarding trace flags - the same article also compares TF 1211 with 1224. The latter is often preferred in the infrequent cases where it is required. Taking and holding a conflicting lock in another session can be problematic, especially in the circumstances Lynn describes in his article: the open transaction will prevent the log backup from achieving its goals...
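
    For reference, a hedged sketch of the two alternatives mentioned there; the table name is a placeholder:

    -- Trace flag 1224 (global): escalation by lock count is disabled, but the
    -- engine can still escalate under lock-memory pressure, which is why it is
    -- usually preferred over 1211.
    DBCC TRACEON (1224, -1);

    -- SQL Server 2008 and later: control escalation per table instead.
    ALTER TABLE dbo.EventLog SET (LOCK_ESCALATION = DISABLE);  -- hypothetical table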

  • So, overall, a good article and another interesting contribution to the SSC archives from Lynn 🙂

    Paul

  • Ol'SureHand (9/15/2009)


    rob.lobbe (9/15/2009)


    This works if you are the ONLY one using the database.

    Running in a production system you can't just go about changing the recovery model.

    Not so sure about it.

    MSDN (http://msdn.microsoft.com/en-us/library/ms190203(SQL.90).aspx) says nothing to imply one cannot change recovery models in full flight. I have changed production DBs' recovery models with no adverse effects... OK, I had to, as someone created them in FULL mode but did not set up transaction log backups...

    Does anyone have reliable info to the contrary (except for general principles etc)?

    Of course it's possible to change the recovery model in mid-flight. It's also possible to nail your foot to the floor. Doesn't mean either one is a good idea.

    If a database needs to be in full recovery so that point-in-time restores can be done, changing it to simple recovery in order to expedite a large delete process may not be a wise thing to do. In that case, "can't" doesn't refer to physical possibility, it refers to business needs.
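
    To make that concrete, here is a hedged sketch of what the switch involves (hypothetical database name and backup path); the point is that the log chain is broken the moment you go to SIMPLE:

    -- No point-in-time restore is possible for the period spent in SIMPLE recovery.
    ALTER DATABASE Sales SET RECOVERY SIMPLE;

    -- ... run the large, batched delete here ...

    -- Switch back, then take a full (or differential) backup immediately so that
    -- subsequent log backups have a base to restore from.
    ALTER DATABASE Sales SET RECOVERY FULL;
    BACKUP DATABASE Sales TO DISK = N'X:\Backups\Sales_after_purge.bak';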

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Simon-413722 (9/15/2009)


    I've been using pretty much the same concept for data deletion for quite some time now. It does the same job as Lynn's code, just done a bit differently. I'll post the code anyway:

    while 1 = 1
    begin
        delete top (???) from tablexxx   -- (???) = batch size of your choice
        if @@ROWCOUNT = 0 break          -- exit from loop (or do something)
    end

    The code above doesn't require the row count to be stored in a variable and read back.

    Simon

    You can also use:

    select 1;
    while @@rowcount > 0
        delete top (1000)
        from table
        where x = y;

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Hi Lynn,

    Nice article and an interesting discussion.

    I am wondering if any folks who have been following this discussion can explain or corroborate this phenomenon:

    Last year I was working on a large(ish) ETL project. I was experimenting with large transactions and using batching techniques to control the batch sizes. We had SIMPLE recovery mode and single-user access. I found that there seems to be an optimal batch size beyond which performance degrades. The mystery to me is that performance degrades as batch size increases even when there is plenty of free disk space for the transaction log. The degradation seemed to begin at transaction volumes between about 500 MB and 4 GB, and it was beyond what would be expected from disk file allocation for the transaction log.
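
    A rough way to compare batch sizes (hypothetical staging table and filter): time a full purge at several batch sizes and compare the elapsed times.

    DECLARE @batch int, @start datetime;
    SET @batch = 50000;          -- vary this between runs
    SET @start = GETDATE();

    WHILE 1 = 1
    BEGIN
        DELETE TOP (@batch) FROM dbo.Staging WHERE Processed = 1;
        IF @@ROWCOUNT = 0 BREAK;
    END

    SELECT DATEDIFF(SECOND, @start, GETDATE()) AS ElapsedSeconds;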

  • Hi, why not create an empty table and insert into it only the data that must be kept, together with all supporting objects like triggers and so on? Then do a name swap between the two tables and that's it. Of course, there will be a short "blank" moment for users, but if it is done together with other maintenance jobs... Best regards. Michel
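
    A hedged sketch of the swap Michel describes, with hypothetical names; in practice you would also recreate the indexes, triggers, and constraints on the new table before swapping:

    -- Build the replacement table containing only the rows to keep.
    SELECT *
    INTO   dbo.EventLog_new
    FROM   dbo.EventLog
    WHERE  EventDate >= DATEADD(DAY, -30, GETDATE());

    -- (Recreate indexes, triggers, and constraints on dbo.EventLog_new here.)

    -- Swap the names inside a short transaction; users see only a brief gap.
    BEGIN TRANSACTION;
        EXEC sp_rename 'dbo.EventLog', 'EventLog_old';
        EXEC sp_rename 'dbo.EventLog_new', 'EventLog';
    COMMIT TRANSACTION;

    -- Drop dbo.EventLog_old once you are satisfied nothing was missed.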

  • Hi guys and girls

    I'd like to know why no-one's considered the delete cursor...

    DELETE
        [ FROM ]
        { table_name WITH ( < table_hint_limited > [ ...n ] )
          | view_name
          | rowset_function_limited
        }
        [ FROM { < table_source > } [ ,...n ] ]
        [ WHERE
            { < search_condition >
              | { [ CURRENT OF
                    { { [ GLOBAL ] cursor_name }
                      | cursor_variable_name
                    }
                  ] }
            }
        ]
        [ OPTION ( < query_hint > [ ,...n ] ) ]

    Is there no batchability?

    So long, and thanks for all the fishpaste 😉
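
    For completeness, a sketch of the WHERE CURRENT OF form that syntax allows, against a hypothetical dbo.EventLog table. It works, but it removes one row per fetch, so there is no real batchability and it is usually far slower than the set-based TOP (n) loops discussed earlier:

    DECLARE @EventDate datetime;

    DECLARE del_cur CURSOR LOCAL FORWARD_ONLY
    FOR SELECT EventDate
        FROM   dbo.EventLog
        WHERE  EventDate < DATEADD(YEAR, -1, GETDATE())
    FOR UPDATE;

    OPEN del_cur;
    FETCH NEXT FROM del_cur INTO @EventDate;

    WHILE @@FETCH_STATUS = 0
    BEGIN
        DELETE FROM dbo.EventLog WHERE CURRENT OF del_cur;  -- one row at a time
        FETCH NEXT FROM del_cur INTO @EventDate;
    END

    CLOSE del_cur;
    DEALLOCATE del_cur;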

  • Hi Lynn, just now reading this article due to the re-publish. If you have a table with a high rate of transactions, what do you think of using the READPAST query hint to reduce locking even further?
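
    A hedged sketch of what that might look like (hypothetical table and retention filter). READPAST skips rows that other sessions currently have locked instead of waiting on them, so each batch deletes whatever it can reach and the skipped rows are picked up on a later pass; it requires READ COMMITTED or REPEATABLE READ isolation:

    WHILE 1 = 1
    BEGIN
        DELETE TOP (1000)
        FROM   dbo.EventLog WITH (ROWLOCK, READPAST)
        WHERE  EventDate < DATEADD(YEAR, -1, GETDATE());

        IF @@ROWCOUNT = 0 BREAK;  -- nothing left that this pass could lock
    END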
