SQLServerCentral Article

Deleting Large Number of Records


Introduction

One of the frequent questions asked on forums is how to delete a large number of records from a table in SQL Server. The question comes from several directions: sometimes it is the result of excess growth of the database's transaction log, sometimes the table must remain available to users during the delete process, and sometimes the user is simply looking for an efficient way to delete records on a regular basis.

With that in mind, I'd like to present one method for deleting records that allows the table to remain available to users and helps keep the transaction log from growing excessively. I will show how this can be done in both SQL Server 2000 and SQL Server 2005/2008.

The Setup

The first thing we need to do, however, is set up the table from which we will be deleting a large number of records. Most requests for help in this area come from individuals asking how to delete several million rows from a multi-million row table. For this code we will create a table with 20 million rows of data, and we will delete approximately 10%, or around 2 million, of those rows. The test data generated below will have transaction dates (TDate) from January 1, 2000 through December 31, 2009. We will purge all data from the year 2000.

 USE [SandBox] -- This is my test bed database
 GO
 
 -- Create the test table
 create table dbo.LAPTest (
      AccountID int,
      Amount money,
      TDate datetime
 )
 
 create clustered index IX_LAPTest_TDate on dbo.LAPTest (
      TDate asc
 )
 GO
 
 --===== Build the table 100,000 rows at a time to "mix things up"
 declare @Counter int
 set @Counter = 0
 
 while @Counter < 20000000
 begin
      --===== Add 100,000 rows to the test table
      insert into dbo.LAPTest (AccountID, Amount, TDate)
      select top 100000
          AccountID = ABS(CHECKSUM(NEWID())) % 50000 + 1,
          Amount    = CAST(CHECKSUM(NEWID()) % 10000 / 100.0 AS MONEY),
          TDate     = CAST(RAND(CHECKSUM(NEWID())) * 3653.0 + 36524.0 AS DATETIME)
      from master.dbo.syscolumns t1
      cross join master.dbo.syscolumns t2
 
      --===== Increment the counter
      set @Counter = @Counter + 100000
 end
 GO
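The magic numbers in the TDate expression deserve a brief note: a datetime value of 0 is January 1, 1900, so the offset 36524.0 corresponds to January 1, 2000, and the multiplier 3653.0 spans the 3,653 days (ten years, including leap days) through the end of 2009. As a sketch, the offsets can be checked in isolation:

```sql
-- Verify the offsets used to generate TDate values:
-- day 36524 after 1900-01-01 is 2000-01-01, and day 36524 + 3653 is 2010-01-01,
-- so the generated dates fall in the half-open range [2000-01-01, 2010-01-01).
select
    cast(36524.0 as datetime)          as RangeStart,
    cast(36524.0 + 3653.0 as datetime) as RangeEnd
```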

Now that we have our test table and data, we are ready to start work on the code that will delete all records for the Year 2000.

SQL Server 2000

The first code snippet I will write will be for deleting the records from the table in SQL Server 2000. Although this code will also work in SQL Server 2005, it will not work in future versions of SQL Server. For that reason, I will then rewrite the code developed for SQL Server 2000 to use the new capabilities available in SQL Server 2005.

I have made every effort to make this code generic, and it should work with minimal changes. Because it may be used in databases set to the SIMPLE, BULK_LOGGED, or FULL recovery model, I have included conditional logic to perform a transaction log backup if the database is not using the SIMPLE recovery model.

 declare @recoverymodel varchar(12), 
         @dbname sysname, 
         @backupdevice sysname, 
         @batchsize int, 
         @loopcnt int, 
         @year smallint, 
         @backupfile nvarchar(512) 
 select 
     @recoverymodel = cast(DATABASEPROPERTYEX ( db_name() , 'Recovery') as varchar(12)), 
     @dbname = db_name(), 
     @backupdevice = 'D:\Backups\', 
     @batchsize = 10000, 
     @loopcnt = 0, 
     @year = 2000 
   
 set rowcount @batchsize -- used for SQL Server 2000 to limit rows affected 
   
 while @batchsize <> 0 
 begin 
     delete from 
         dbo.LAPTest 
     where 
         TDate >= dateadd(yyyy, (@year - 1900), 0) and 
         TDate <  dateadd(yyyy, (@year - 1900) + 1, 0) 
     set @batchsize = @@rowcount 
     if @batchsize > 0 
        and @recoverymodel <> 'SIMPLE' 
         begin 
             set @loopcnt = @loopcnt + 1 
             set @backupfile = @backupdevice + @dbname + N'_' + cast(@loopcnt as nvarchar(8)) + N'.trn' 
             backup log @dbname 
             to disk = @backupfile 
             with description = N'T-Log Backup during purge' 
         end 
 end 
   
 set rowcount 0 -- reset rows affected after processing, SQL Server 2000 
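The WHERE clause computes the year boundaries with date arithmetic rather than string conversions: since datetime day 0 is January 1, 1900, dateadd(yyyy, (@year - 1900), 0) yields January 1 of @year, and the second expression yields January 1 of the following year as an exclusive upper bound. This keeps the predicate efficient against the clustered index on TDate. The boundaries can be checked on their own:

```sql
-- Compute the purge boundaries exactly as the delete loop does
declare @year smallint
set @year = 2000
select
    dateadd(yyyy, (@year - 1900), 0)     as PurgeStart, -- 2000-01-01
    dateadd(yyyy, (@year - 1900) + 1, 0) as PurgeEnd    -- 2001-01-01 (exclusive)
```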

SQL Server 2005

Now we will take the code above and modify it to use the new features introduced in SQL Server 2005. This also allows the code to be used in SQL Server 2008 and future versions. As you look at the code you will see very little difference between the two versions: the SET ROWCOUNT statements are gone, replaced by a TOP clause in the DELETE statement.

 declare @recoverymodel varchar(12), 
         @dbname sysname, 
         @backupdevice sysname, 
         @batchsize int, 
         @loopcnt int, 
         @year smallint, 
         @backupfile nvarchar(512) 
 select 
     @recoverymodel = cast(DATABASEPROPERTYEX ( db_name() , 'Recovery') as varchar(12)), 
     @dbname = db_name(), 
     @backupdevice = 'D:\Backups\', 
     @batchsize = 10000, 
     @loopcnt = 0, 
     @year = 2000 
   
 while @batchsize <> 0 
 begin 
     delete top (@batchsize) from 
         dbo.LAPTest 
     where 
         TDate >= dateadd(yyyy, (@year - 1900), 0) and 
         TDate <  dateadd(yyyy, (@year - 1900) + 1, 0) 
     set @batchsize = @@rowcount 
     if @batchsize > 0 
        and @recoverymodel <> 'SIMPLE' 
         begin 
             set @loopcnt = @loopcnt + 1 
             set @backupfile = @backupdevice + @dbname + N'_' + cast(@loopcnt as nvarchar(8)) + N'.trn' 
             backup log @dbname 
             to disk = @backupfile 
             with description = N'T-Log Backup during purge' 
         end 
 end 

Variations

Now that you have the basic generic code to delete large volumes of data from a table, you may decide that you don't want 200+ transaction log backup files to restore through should you have to do a point-in-time restore. If your database is not using the SIMPLE recovery model, you could modify the condition in the IF statement that decides whether the transaction log backup should be run. One change would be to run the transaction log backup only when (@loopcnt modulo some value = 0 and @batchsize > 0) or (@batchsize = 0). In addition to this logic, you could also add a differential backup after the purge is run.
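As a sketch of that variation (the interval of 20 batches and the file names here are arbitrary choices, not part of the original code), the backup section of the loop could become:

```sql
-- Count every batch, but only back up the log every 20th batch
-- (20 is an arbitrary interval) or once after the final, empty batch.
set @loopcnt = @loopcnt + 1
if @recoverymodel <> 'SIMPLE'
   and ((@batchsize > 0 and @loopcnt % 20 = 0) or @batchsize = 0)
    begin
        set @backupfile = @backupdevice + @dbname + N'_' + cast(@loopcnt as nvarchar(8)) + N'.trn'
        backup log @dbname
        to disk = @backupfile
        with description = N'T-Log Backup during purge'
    end
```

And after the WHILE loop completes, a differential backup could shorten any later restore sequence:

```sql
-- Optionally follow the purge with a differential backup
set @backupfile = @backupdevice + @dbname + N'_post_purge.bak'
backup database @dbname
to disk = @backupfile
with differential, description = N'Differential backup after purge'
```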

Conclusion

With these sample code snippets, I hope you can see how easily you can batch your deletes, allowing the table to remain available for continued use while keeping your transaction log manageable when your database uses the BULK_LOGGED or FULL recovery model.
