Hidden Tricks To SQL Server Table Cleanup

Question

Post reply

Hidden Tricks To SQL Server Table Cleanup

Edward.Polley 76944

SSC Eights!

Points: 834
More actions
March 5, 2013 at 9:26 pm

#274946

Comments posted to this topic are about the item Hidden Tricks To SQL Server Table Cleanup

Viewing 15 posts - 1 through 15 (of 96 total)

You must be logged in to reply to this topic. Login to reply

Jonathan AC Roberts SSCoach Points: 18408 More actions · Answer 1

I think the creation of a separate view is unnecessary when you can have the view inline:

DECLARE @Rowcount int

SET @Rowcount = -1 -- Initialise

WHILE @Rowcount <> 0

BEGIN

;WITH CTE AS

(

SELECT TOP(2000) *

FROM EventTracking.dbo.Event

WHERE EventCloseDate < GETDATE() - 6 -- remove rows older than 6 days

ORDER BY Eventid

)

DELETE CTE

SET @RowCount = @@ROWCOUNT

END

Alternatively:

DECLARE @Rowcount int

SET @Rowcount = -1 -- Initialise

WHILE @Rowcount <> 0

BEGIN

DELETE EventTracking.dbo.Event

WHERE EventId IN (SELECT TOP(2000) EventId

FROM EventTracking.dbo.Event

WHERE EventCloseDate < GETDATE() - 6 -- remove rows older than 6 days

ORDER BY Eventid)

SET @RowCount = @@ROWCOUNT

END

David.Poole SSC Guru Points: 76351 More actions · Answer 2

Most people would never go beyond the bounds of a BIGINT with an IDENTITY(1,1) but for those of us who do, remember that reseeding identity values is incompatible with this method.

Mind you at those sorts of scales you probably have enterprise edition and can do partition switching to do even faster deletes with minimal IO impact!

I take it that the EventCloseDate as a NULL field means that you don't know in advance what that close date will be?

If you did know the EventCloseDate upfront then I'd look at putting the clustered key on it. Either that or put persist the EventStartDate and put the clustered key over the EventStartDate and Duration. Obviously this would need evaluating and testing as events going in out of sync will cause some degree of fragmentation. Whether such fragmentation would be significant or an issue for you is something only you can determine. All standard "It Depends" stuff.

LinkedIn Profile

karthik babu Hall of Fame Points: 3337 More actions · Answer 3

Alternatively, we can use this too.. it takes about an hour to delete a million records

set rowcount 50000

DELETE FROM EventTracking.dbo.Event

while @@rowcount>0

begin

set rowcount 50000

DELETE FROM EventTracking.dbo.Event

END

set rowcount 0

______________________________________________________________Every Problem has a Solution; Every Solution has a Problem: 🙂

SQLBoar SSC Veteran Points: 278 More actions · Answer 4

Agree with last poster. Basically, DELETE TOP 1000000 FROM EVENT doesn't work, but DELETE TOP (1000000) FROM EVENT does. Similarly ROWCOUNT works, but the former (TOP), has the benefit that you can use a variable. So one of the statements is very misleading, except if you're using SQL Server 2000 - when DELETE TOP (1000000) won't work.

Although this tip is useful for me, as I have groups of tables I have to delete in chunks and order of last ID in the same sequeunce across the sets of tables.

This DBA says - "It depends".

Jonathan AC Roberts SSCoach Points: 18408 More actions · Answer 5

The code in the article doesn't check that the EventCloseDate older than 6 days when it does the delete. Is it that the EventCloseDate is always present and monotonically increasing with the EventId?

It also won't always delete all the rows, for example, if there are 5000 rows that need deleting it will only delete the first 4000.

P Jones SSChampion Points: 12374 More actions · Answer 6

Itzik Ben-Gan teaches

WHILE 1=1

BEGIN

DELETE TOP (5000) FROM table WHERE .....

IF @@ROWCOUNT < 5000 BREAK

END

and you can add a WAITFOR to delay between iterations

Adjust the 5000 to suit your table.

jims-723592 SSC Veteran Points: 202 More actions · Answer 7

Why keep counts at all?

Let all logic live in the view, then while any entry in the view, remove

ALTER PROCEDURE [dbo].[usp_VTrickleDeleteEvent]

AS

SET NOCOUNT ON

SET QUOTED_IDENTIFIER OFF

BEGIN

WHILE exists (Select 1 from VEventCleanup)

BEGIN

BEGIN TRAN RemoveEvent

DELETE FROM VEventCleanup –- delete from view with order by

COMMIT TRAN -- COMMIT to release locks

END

jcunningham SSC-Addicted Points: 407 More actions · Answer 8

What about the transaction log? I have worked in an environment where disk space was hard to come by. I had a similar need, but I also had to add CHECKPOINTs in the code, because the tran log was filling up the disk space.

You might be asking, "Why not just add more disk?" You would have to understand the company I worked for. BTW, I don't work there anymore :).

Brian Smithson Grasshopper Points: 14 More actions · Answer 9

This is excellent. We also use the DELETE TOP (###) syntax. Works great.

BTW, it is 'stored procedure' not 'store procedure.' Sorry for picking the nit.

jmccaskie Valued Member Points: 68 More actions · Answer 10

Adding a view would have helped my little process deleting polution from an audit trigger. A view would have simplied my efforts!

My twist for others to consider is adding a pause to let other possibly locked out processes to jump in:

WHILE Exists(SELECT TOP 1 RECORDID

FROM [Audit].[dbo].SCHOOL

where ChangeBy IN ('<snip>'

,'NT AUTHORITY\SYSTEM')

) BEGIN

DELETE [Audit].[dbo].SCHOOL

FROM [Audit].[dbo].SCHOOL x

INNER JOIN (

Select TOP 10000 RECORDID

FROM [Audit].[dbo].SCHOOL

where ChangeBy IN ('<snip>'

,'NT AUTHORITY\SYSTEM')

) y

ON x.RECORDID = y.RECORDID

waitfor delay '00:00:01'

END

Eric M Russell SSC Guru Points: 125623 More actions · Answer 11

Here is one technique I use when there is a need to perform bulk deletes on a large table. It is not an online operation, but it should be able to complete the job on a multi-GB table within 10 minutes. Not only is this one of the fastest methods, but it minimizes tempdb usage, minimizes transaction logging, and the end result leaves the table with organized indexes.

#1 BCP a select of the records you to keep into native format file ( -n option).

#2 Truncate the table.

#3 Drop indexes.

#3 BCP from the file into the table. Specify -E option if you want to insert original identity values.

#4 Re-create indexes.

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

Antares686 SSC Guru Points: 125444 More actions · Answer 12

OK, I am not sure I am on board as the example is written. If you know you are going to be cleaning up the database regularly I would personally cluster the EventClosedDate field to ensure I can use that for fast removal and non-cluster the primary key (identity column). This is for multiple reasons.

1) I don't see anything at all saying EventID 2 couldn't end months after EventIDs 3-2000. And since your top doesn't actually take that into account you could in theory delete something still valid in your rules.

2) If your purpose is to remove values from a table where a particular is your target goal then that column as a rule must be referenced in your query otherwise you may miss values or get more values than expected.

3) You are using ceiling in your query, thus if the calculation is 880.000000001 you will make 881 loops. In the 881st loop you will wipe out data which does have a date value >= your 6 day old value becuase there is no sanity check. I suggest use floor, you will have some values hanging but better to trim to little than to trim to much in the business world. Refer back to 1.

4) I would use your method after altering indexes to ensure I can use EntryDate as my qualifier to trim exactly what I need.

BTW, title is a bit misleading, breaking processes into smaller transactions has always been known to have the best performance method in reducing contention.

jjtetzlo SSCrazy Points: 2561 More actions · Answer 13

karthik babu (3/6/2013)
Alternatively, we can use this too.. it takes about an hour to delete a million records
set rowcount 50000
DELETE FROM EventTracking.dbo.Event
while @@rowcount>0
begin
set rowcount 50000
DELETE FROM EventTracking.dbo.Event
END
set rowcount 0

Although I'm fond of that technique as well, it's deprecated syntax: http://msdn.microsoft.com/en-us/library/ms143729.aspx.

- Jeff

taylor_benjamin SSC Veteran Points: 286 More actions · Answer 14

My concern about the application of this technique is a reliance on the primary key being date specific.

In short, it selects the top N rows where the date < 6 days ago ordered by primary key.

As soon as a row is added with a date >= 6 days ago, this query will ignore any records following that date which may have been previously deleted.

You can still use the batch method, which is really the key to the whole performance issue, and very timely. The difference is that you need to step through the whole table.