Batch move/deletion of records

  • I'm trying to write a stored procedure that selects 100,000 records at a time, deletes them from one table, and adds them to another. I need a limit on the number of records processed each time because I keep blowing the transaction logs; there's a huge number of records to process. What I've come up with is below. Will it work? It would also be nice to have a running total, something that says '100000 records processed', '200000 records processed', and so on.

    DECLARE @BatchSize int = 100000

    WHILE (@@ROWCOUNT > 0)
    BEGIN
        BEGIN TRANSACTION

        -- Archive processed rows that are not already in the archive
        INSERT TOP (@BatchSize) INTO [DataStaging_Archive]
        SELECT ds.*
        FROM DataStaging ds
        LEFT OUTER JOIN DataStaging_Archive dsa
            ON ds.URN = dsa.URN
        WHERE dsa.URN IS NULL
          AND ds.DateProcessed IS NOT NULL

        -- Remove the archived rows from the staging table
        DELETE ds
        FROM DataStaging ds
        INNER JOIN DataStaging_Archive dsa
            ON ds.URN = dsa.URN

        EXEC TruncateLog

        IF @@ERROR <> 0
            ROLLBACK TRANSACTION
        ELSE
            COMMIT TRANSACTION
    END
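
    For the running total, one option (a sketch only, not a tested procedure; the @Rows, @Total and @Msg variables and the message format are illustrative) is to capture @@ROWCOUNT into a counter after each INSERT and report it with RAISERROR ... WITH NOWAIT, which flushes messages immediately, where PRINT buffers them until the batch ends:

    DECLARE @BatchSize int = 100000
    DECLARE @Rows int = 1            -- seed > 0 so the loop body runs at least once
    DECLARE @Total int = 0
    DECLARE @Msg varchar(60)

    WHILE (@Rows > 0)
    BEGIN
        BEGIN TRANSACTION            -- error handling omitted for brevity

        INSERT TOP (@BatchSize) INTO DataStaging_Archive
        SELECT ds.*
        FROM DataStaging ds
        LEFT OUTER JOIN DataStaging_Archive dsa
            ON ds.URN = dsa.URN
        WHERE dsa.URN IS NULL
          AND ds.DateProcessed IS NOT NULL

        SET @Rows = @@ROWCOUNT       -- capture immediately, before any other statement
        SET @Total = @Total + @Rows  -- running total across all batches

        DELETE ds
        FROM DataStaging ds
        INNER JOIN DataStaging_Archive dsa
            ON ds.URN = dsa.URN

        COMMIT TRANSACTION

        SET @Msg = CAST(@Total AS varchar(20)) + ' records processed'
        RAISERROR (@Msg, 0, 1) WITH NOWAIT
    END

    Seeding @Rows with 1 also guarantees the loop starts, which a bare WHILE (@@ROWCOUNT > 0) at the top of a batch does not.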

  • Composable DML:

    INSERT INTO DataStaging_Archive (URN, DateProcessed, AnotherColumn, LastRemainingColumn)
    SELECT URN, DateProcessed, AnotherColumn, LastRemainingColumn
    FROM (
        DELETE TOP (@BatchSize) -- note: no ORDER BY, rows picked at random, see BOL
        FROM DataStaging
        OUTPUT
            deleted.URN,
            deleted.DateProcessed,
            deleted.AnotherColumn,
            deleted.LastRemainingColumn
    ) AS D

    The whole statement is atomic: the INSERT and DELETE both succeed or both fail.

    See http://www.sqlmag.com/article/sql-server/composable-dml
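
    To work through the whole table in batches, the composable statement can sit in a @@ROWCOUNT-driven loop. A sketch (SQL Server 2008 or later), assuming the column names above, adding the DateProcessed filter from the original post, and noting that composable DML requires the INSERT target to have no enabled triggers and no foreign-key relationships:

    DECLARE @BatchSize int = 100000
    DECLARE @Rows int = 1

    WHILE (@Rows > 0)
    BEGIN
        INSERT INTO DataStaging_Archive (URN, DateProcessed, AnotherColumn, LastRemainingColumn)
        SELECT URN, DateProcessed, AnotherColumn, LastRemainingColumn
        FROM (
            DELETE TOP (@BatchSize)
            FROM DataStaging
            OUTPUT
                deleted.URN,
                deleted.DateProcessed,
                deleted.AnotherColumn,
                deleted.LastRemainingColumn
            WHERE DateProcessed IS NOT NULL   -- only archive processed rows
        ) AS D

        SET @Rows = @@ROWCOUNT   -- reaches 0 once no processed rows remain
    END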

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw


  • I'm not particularly interested in the transaction logs anyway, as I will always have the source data files to reload from if anything goes wrong. So is there any way of switching off logging completely?

  • Maybe something like this:

    SET ROWCOUNT 20000

    DELETE from ....
    OUTPUT DELETED.* into ........

    SET ROWCOUNT 0

    Not sure about the running total.
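
    Filled in with the table names from this thread (purely illustrative), that might look like the block below. One caveat: SET ROWCOUNT is deprecated for INSERT, UPDATE and DELETE statements, so on SQL Server 2005 and later DELETE TOP (n) is the safer long-term form:

    SET ROWCOUNT 20000   -- limit the DELETE below to 20000 rows

    DELETE FROM DataStaging
    OUTPUT DELETED.* INTO DataStaging_Archive   -- assumes the archive has an identical column layout
    WHERE DateProcessed IS NOT NULL

    SET ROWCOUNT 0       -- reset so later statements are unaffected

    For the running total, @@ROWCOUNT captured after the DELETE could feed the same counter shown in the first post.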

  • ron.grace (1/10/2013)

    I'm not particularly interested in the transaction logs anyway, as I will always have the source data files to reload from if anything goes wrong. So is there any way of switching off logging completely?

    The only way, AFAIK, to "switch off" logging would be to change the recovery model of the database to bulk-logged. Even that only enables minimal logging for certain bulk operations (BULK INSERT, SELECT INTO, and some INSERT ... SELECT forms) rather than switching logging off, and it significantly restricts the backup and restore options available to you for transactions that occur while the recovery model is set to bulk-logged.
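
    For completeness, changing the recovery model is a single ALTER DATABASE statement (the database name below is a placeholder), and switching back to full should be followed by a log or full backup to restart the point-in-time recovery chain:

    -- 'StagingDb' is a placeholder database name
    ALTER DATABASE StagingDb SET RECOVERY BULK_LOGGED

    -- ... run the bulk archive work here ...

    ALTER DATABASE StagingDb SET RECOVERY FULL
    -- take a log (or full) backup afterwards to re-establish
    -- point-in-time restore capability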

    I'm curious about why you need a "staging archive" table, though. Since you'll always have the source files to reload if necessary, why move the staging data to an archive table (with all the concomitant workload on the server)?

    Jason Wolfkill

