Delete process

  • Hello, I am in a situation where, on a daily basis, I need to reload a table in which some of the data will remain. Between 1/3 and 1/2 of the data is purged daily, which usually amounts to several million rows. We are deleting where division_id equals a specified value, and it only takes a couple of minutes to accomplish this. The issue is that the transaction log grows to over 1.6GB regardless of whether I am in the SIMPLE or FULL recovery model during this process. I thought SIMPLE would reduce this a lot.

    Is it safe to truncate the log before and after the load process (which includes the delete) without losing the ability to do a point-in-time recovery?

    So the steps would be (roughly sketched in T-SQL below the list):

    1. truncate the log???

    2. delete from outlet_test where division_id =

    3. Bulk Insert flat file into the table

    4. truncate the log
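
    For reference, a rough T-SQL sketch of those four steps, assuming SQL Server 2005/2008, a database named SalesDB, a log file whose logical name is SalesDB_log, and made-up file paths (none of these names come from the original post). Note that truncating the log this way breaks the log backup chain, so point-in-time recovery is lost until you switch back to FULL and take a fresh full (or differential) backup:

    -- 1. "Truncate" the log: BACKUP LOG ... WITH TRUNCATE_ONLY was deprecated in SQL Server 2005
    --    and removed in 2008, so the usual approach is to switch to SIMPLE, which lets the log
    --    truncate at checkpoint. This breaks the log backup chain.
    ALTER DATABASE SalesDB SET RECOVERY SIMPLE;
    DBCC SHRINKFILE (SalesDB_log, 512);          -- optional: shrink the file to ~512 MB

    -- 2. Delete the outgoing division ('001' is only an example value)
    DELETE FROM dbo.outlet_test WHERE division_id = '001';

    -- 3. Bulk insert the flat file (path and terminators are examples)
    BULK INSERT dbo.outlet_test
    FROM 'D:\loads\outlet_test.txt'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK);

    -- 4. Switch back to FULL and re-establish the backup chain
    ALTER DATABASE SalesDB SET RECOVERY FULL;
    BACKUP DATABASE SalesDB TO DISK = 'D:\backups\SalesDB.bak';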

    here is the table definition:

    CREATE TABLE [outlet_test](
        [retailer_id] [int] NOT NULL,
        [sku_id] [varchar](24) NOT NULL,
        [outlet_id] [varchar](20) NOT NULL,
        [division_id] AS (left([outlet_id],(3))) PERSISTED NOT NULL,
        [threshold] [int] NOT NULL DEFAULT ((0)),
        [qty] [int] NOT NULL DEFAULT ((0)),
        [sku_price] [int] NULL,
        [tax_rate] [numeric](5, 3) NULL,
        [sell_thru] [numeric](8, 3) NULL,
        CONSTRAINT [pk_outlet_test] PRIMARY KEY CLUSTERED
        (
            [sku_id] ASC,
            [outlet_id] ASC,
            [retailer_id] ASC,
            [division_id] ASC
        )
    )

    Best Regards,

    ~David

  • You probably want to use SELECT INTO, then drop the old table, and finally recreate the index/PK.
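
    A minimal sketch of that approach, assuming '001' is the division being purged and using a made-up intermediate table name (outlet_test_keep); SELECT INTO is minimally logged in the SIMPLE or BULK_LOGGED recovery model. As the next reply points out, SELECT INTO copies division_id as an ordinary column rather than a persisted computed column, so the original definition is not preserved exactly:

    -- Keep only the rows that should survive
    SELECT *
    INTO   dbo.outlet_test_keep
    FROM   dbo.outlet_test
    WHERE  division_id <> '001';       -- example value

    -- Swap the tables and put the primary key back
    DROP TABLE dbo.outlet_test;
    EXEC sp_rename 'dbo.outlet_test_keep', 'outlet_test';

    ALTER TABLE dbo.outlet_test
        ADD CONSTRAINT pk_outlet_test PRIMARY KEY CLUSTERED
            (sku_id, outlet_id, retailer_id, division_id);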


    * Noel

  • This will not work because SELECT INTO will not give me the true definition of the table, specifically division_id, which is a computed column. I suppose I could create the table, insert into it, and then create the primary key index.

    Are there any other ways you can think of?

    Thank you for your input.

    Best Regards,

    ~David

  • We got around this issue by deleting the records in slices:

    DECLARE @DeleteIncrement INT
    SET @DeleteIncrement = 10000

    WHILE 1 = 1
    BEGIN
        -- Delete the next slice; the loop exits once nothing is left to delete
        DELETE TOP (@DeleteIncrement) FROM Table WHERE where_clause

        IF @@ROWCOUNT = 0 BREAK
    END

  • Yep, always delete the rows in chunks of, say, 1,000 or 10,000 rows, and then after the deletion process is done you can set up a job to shrink the log file using DBCC SHRINKFILE.

    You can also set your recovery model to SIMPLE, as you stated before, if you don't need point-in-time recovery.
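
    A sketch of that shrink step, assuming the log file's logical name is outlet_db_log and a 512 MB target, both of which are made up; check sys.database_files for the real name:

    -- Find the logical name and current size of the log file
    SELECT name, type_desc, size FROM sys.database_files;

    -- Shrink the log file back down to roughly 512 MB (example target)
    DBCC SHRINKFILE (outlet_db_log, 512);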

  • David Kranes (2/19/2009)


    This will not work because SELECT INTO will not give me the true definition of the table, specifically division_id, which is a computed column. I suppose I could create the table, insert into it, and then create the primary key index.

    Are there any other ways you can think of?

    Thank you for your input.

    The tip about SELECT INTO is that it is a minimally logged operation, but if you try to add the computed column as an after-the-fact step you will end up scanning the table twice and generating just as much log activity.

    The idea of batching the deletes works IF you interleave it with transaction log backups or your DB is in the SIMPLE recovery model.
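
    A sketch of that interleaving, assuming FULL recovery, a database named outlet_db, an example batch size, a made-up backup path, and a log backup every 10 batches:

    DECLARE @BatchSize INT, @Batches INT
    SET @BatchSize = 100000
    SET @Batches = 0

    WHILE 1 = 1
    BEGIN
        DELETE TOP (@BatchSize) FROM dbo.outlet_test WHERE division_id = '001'   -- example value
        IF @@ROWCOUNT = 0 BREAK

        -- Back up (and thereby clear) the log every 10 batches so it cannot grow unchecked
        SET @Batches = @Batches + 1
        IF @Batches % 10 = 0
            BACKUP LOG outlet_db TO DISK = 'D:\backups\outlet_db_log.trn'        -- made-up path
    END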

    There is also the idea of using partition switching to move part of the table out of the main table and then drop the switched-out data. I have not played with it long enough, but if it delivers on the MS promise that could be your solution.
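
    For reference, a rough sketch of the switch-out side of that idea, assuming the table were partitioned on division_id and that partition 3 (an arbitrary example) held the outgoing division; the staging table name is made up, and it must match the main table's structure and sit on the same filegroup as that partition:

    -- Switching a partition out is a metadata-only operation, so it is nearly instant
    ALTER TABLE dbo.outlet_test
        SWITCH PARTITION 3 TO dbo.outlet_test_staging;

    -- Then discard the switched-out rows without heavy logging
    TRUNCATE TABLE dbo.outlet_test_staging;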


    * Noel

  • Just remember that table partitions are only available in the Enterprise Edition.

  • I am using the batch method with 100,000-row chunks and it is working very well. There is very little to no logging occurring. My question is, with this process occurring every day, could the table become fragmented? Is there any maintenance I should add to the routine to prevent this or any other potential issues? I already update statistics at the end of the data load. This is a customer database and we do not manage their backups, so I can't play with the recovery model.
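
    For what it's worth, that statistics step might look something like this (FULLSCAN is shown only as an example option):

    UPDATE STATISTICS dbo.outlet_test WITH FULLSCAN;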

    Best Regards,

    ~David

  • It can fragment, and you'd need to rebuild the clustered index to fix this. In EE you can do this online.
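
    A sketch of the online rebuild, using the clustered primary key from the table definition above (the ONLINE option requires Enterprise Edition):

    ALTER INDEX pk_outlet_test ON dbo.outlet_test
        REBUILD WITH (ONLINE = ON);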

  • Steve, what is the best way to schedule this, say weekly? Is it best to use ALTER INDEX, DBCC DBREINDEX, or some other method? Thank you.

    Best Regards,

    ~David

  • How big is the db?

    A blanket weekly schedule can work, but it can use more resources than needed. There are lots of scripts on this site that calculate fragmentation and then reindex only those items. I'd also schedule this for a low-activity time.

    DBCC is a good way to do it, but be smart about how you call DBCC and for which tables. Have an idea of how long it will take and the impact on your system, which means testing.
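
    A sketch of that targeted approach, checking sys.dm_db_index_physical_stats and then reorganizing or rebuilding only what needs it; the 5%/30% thresholds below are the usual guideline rather than a hard rule, and ALTER INDEX is the modern replacement for the deprecated DBCC DBREINDEX:

    -- Check fragmentation for the indexes on a heavily churned table
    SELECT  OBJECT_NAME(ps.object_id) AS table_name,
            i.name                    AS index_name,
            ps.avg_fragmentation_in_percent
    FROM    sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.outlet_test'), NULL, NULL, 'LIMITED') AS ps
    JOIN    sys.indexes AS i
            ON i.object_id = ps.object_id AND i.index_id = ps.index_id;

    -- Then, per index: roughly 5-30% fragmentation -> REORGANIZE, above 30% -> REBUILD
    ALTER INDEX pk_outlet_test ON dbo.outlet_test REORGANIZE;
    -- ALTER INDEX pk_outlet_test ON dbo.outlet_test REBUILD WITH (ONLINE = ON);   -- EE only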

  • Steve, there are only 2 tables that are heavily deleted from and inserted into. Those are the ones that would need any type of defragmenting on a somewhat regular basis. Thank you kindly for all your help.

    Best Regards,

    ~David

  • David Kranes (2/20/2009)


    Steve, there are only 2 tables that are heavily deleted from and inserted into. Those are the ones that would need any type of defragmenting on a somewhat regular basis. Thank you kindly for all your help.

    What is the size of each table? Fragmentation is quite normal for smaller tables, even when it is well over 65% or 70%.

    Check the fragmentation percentage after inserting. Ideally, rebuild when it is over 30%, or otherwise defrag (reorganize) them.
