Archiving...INSERT & DELETE

  • ChrisM@Work (9/12/2012)


    ChrisM@home (9/11/2012)


    ...

    If there's no index on it, then that's your problem. Each delete in SPTLog will cause a scan of the FK column in SPTLogArchive to ensure that the key doesn't exist there - if it does, SQL Server will throw an error. They won't exist, because you are moving rows from one table to the other - but you will still get table scans for every delete. The circular reference will ensure that keys only exist in one table at any one time, but it's an expensive process, especially without indexes on the FK columns.
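
    If that diagnosis were right, the implied fix would be an index on the referencing column. A minimal sketch - assuming the FK column is LogID, as in the setup script further down, and with a hypothetical index name:

    -- Hypothetical index name; supports the FK existence check that fires
    -- against SPTLogArchive for every row deleted from SPTLog
    CREATE NONCLUSTERED INDEX IX_SPTLogArchive_LogID
    ON [dbo].[SPTLogArchive] (LogID);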

    This doesn't quite stack up and, in any case, if there were an FK relationship between the tables, Lynn's code would have raised the following error:

    "Msg 332, Level 16, State 1, Line 41

    The target table 'dbo.SPTLogArchive' of the OUTPUT INTO clause cannot be on either side of a (primary key, foreign key) relationship. Found reference constraint 'FK_SPTLogArchive_SPTLog'."
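
    For context, the single-statement move that trips Msg 332 looks something like this - a speculative reconstruction, since Lynn's actual code isn't shown here, reusing the @dCutOffDate variable declared in the workaround below:

    -- Speculative reconstruction: DELETE ... OUTPUT INTO fails with Msg 332
    -- when the OUTPUT INTO target sits on either side of an FK with the source
    DELETE FROM [dbo].[SPTLog]
    OUTPUT deleted.LogID, deleted.LogTime, GETDATE()
    INTO [dbo].[SPTLogArchive] ([LogID], [LogTime], [ArchiveTime])
    WHERE [LogTime] < @dCutOffDate;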

    However, you've stated that there is an FK relationship somewhere, so here's a code block to work around it, modified from your original code:

    DECLARE @dCutOffDate DATE;
    SET @dCutOffDate = '2008-02-20'; -- or whatever you want your value to be

    -- Disable FK checking on both tables while the rows are moved
    ALTER TABLE [dbo].[SPTLog] NOCHECK CONSTRAINT ALL;
    ALTER TABLE [dbo].[SPTLogArchive] NOCHECK CONSTRAINT ALL;

    BEGIN TRY
        BEGIN TRANSACTION;

        INSERT INTO [dbo].[SPTLogArchive]
            ([LogID]
            ,[LogTime]
            ,[ArchiveTime])
        SELECT
            [LogID]
            ,[LogTime]
            ,GETDATE()
        FROM [dbo].[SPTLog]
        WHERE [LogTime] < @dCutOffDate;

        DELETE FROM [dbo].[SPTLog]
        WHERE LogTime < @dCutOffDate;

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        -- Guard the rollback; consider re-raising the error here too,
        -- otherwise failures are silently swallowed
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
    END CATCH

    -- Re-enable the constraints; note that the two statements differ (see below)
    ALTER TABLE [dbo].[SPTLog] WITH CHECK CHECK CONSTRAINT ALL;
    ALTER TABLE [dbo].[SPTLogArchive] WITH NOCHECK CHECK CONSTRAINT ALL;

    The code disables the FK constraints before the transaction and enables them again afterwards. Notice that the two enable statements

    ALTER TABLE [dbo].[SPTLog] WITH CHECK CHECK CONSTRAINT ALL;

    ALTER TABLE [dbo].[SPTLogArchive] WITH NOCHECK CHECK CONSTRAINT ALL;

    differ. The second statement (for SPTLogArchive) uses the clause “WITH NOCHECK”, which tells SQL Server not to validate the existing rows against the constraint when re-enabling it. If it did validate them, the statement would fail, because the archived rows now reference LogID values that no longer exist in SPTLog.
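
    One side effect worth noting: a constraint re-enabled WITH NOCHECK is left marked untrusted, so the optimizer can't use it to simplify plans. You can check its status in the catalog with a small query along these lines:

    -- A constraint re-enabled WITH NOCHECK shows is_not_trusted = 1
    SELECT name, is_disabled, is_not_trusted
    FROM sys.foreign_keys
    WHERE name = 'FK_SPTLogArchive_SPTLog';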

    This batch works just fine, moving about 40,000 of 3 million rows from SPTLog to SPTLogArchive in about a second on my local instance. I've not incorporated "chunking" into this, where subsets of the data are moved (INSERT and DELETE of the same set of keys) within separate transactions to mitigate resource waits - a rough sketch of that idea follows.
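
    Here's a minimal sketch of that chunking idea - untested, with an arbitrary batch size, and assuming the constraints are disabled and @dCutOffDate is declared as in the block above:

    DECLARE @BatchSize INT = 10000;           -- arbitrary; tune to taste
    DECLARE @Batch TABLE (LogID INT PRIMARY KEY);

    WHILE 1 = 1
    BEGIN
        BEGIN TRANSACTION;

        -- Archive one batch, capturing the moved keys in a table variable
        INSERT INTO [dbo].[SPTLogArchive] ([LogID], [LogTime], [ArchiveTime])
        OUTPUT inserted.LogID INTO @Batch (LogID)
        SELECT TOP (@BatchSize) [LogID], [LogTime], GETDATE()
        FROM [dbo].[SPTLog]
        WHERE [LogTime] < @dCutOffDate
        ORDER BY [LogID];

        IF @@ROWCOUNT = 0
        BEGIN
            ROLLBACK TRANSACTION;             -- nothing left to move
            BREAK;
        END;

        -- Delete exactly the keys that were just archived
        DELETE s
        FROM [dbo].[SPTLog] AS s
        INNER JOIN @Batch AS b ON b.LogID = s.LogID;

        COMMIT TRANSACTION;

        DELETE FROM @Batch;                   -- reset for the next batch
    END;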

    Here’s the sample setup I used for the test:

    USE Sandbox
    GO
    DROP TABLE SPTLogArchive
    DROP TABLE SPTLog
    GO

    ----------------------------------
    /* SPTLog */
    ----------------------------------
    CREATE TABLE [dbo].[SPTLog] (
        LogID INT IDENTITY (1,1) NOT NULL CONSTRAINT PK_SPTLog PRIMARY KEY CLUSTERED,
        LogTime DATE NOT NULL)

    -- INCLUDE ([LogID]) is redundant - as the cluster key, LogID is already
    -- carried in every nonclustered index on the table
    CREATE NONCLUSTERED INDEX [IX_index_SPTLog_LogTime] ON [dbo].[SPTLog]
        ([LogTime] ASC) INCLUDE ([LogID])
        WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
              IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF,
              ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ----------------------------------
    /* SPTLogArchive */
    ----------------------------------
    CREATE TABLE [dbo].[SPTLogArchive] (
        SPTLogArchiveID INT IDENTITY (1,1) NOT NULL CONSTRAINT PK_SPTLogArchive PRIMARY KEY CLUSTERED,
        LogID INT NULL CONSTRAINT FK_SPTLogArchive_SPTLog REFERENCES SPTLog (LogID),
        LogTime DATE NOT NULL,
        ArchiveTime DATE NOT NULL)

    CREATE NONCLUSTERED INDEX [IX_SPTLogArchive_LogTime] ON [dbo].[SPTLogArchive]
        ([LogTime] ASC) INCLUDE ([LogID])
        WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
              IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF,
              ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ----------------------------------
    /* Load sample data into SPTLog */
    ----------------------------------
    SET IDENTITY_INSERT [dbo].[SPTLog] ON

    -- Cross-join sys.columns to itself to generate 3 million sequential rows,
    -- with LogTime values spaced one minute apart, counting back from now
    INSERT INTO [dbo].[SPTLog] (LogID, LogTime)
    SELECT
        NewLogID = ROW_NUMBER() OVER(ORDER BY rn DESC),
        NewDate = DATEADD(minute, 0-rn, GETDATE())
    FROM (
        SELECT TOP 3000000 rn = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
        FROM sys.columns a, sys.columns b, sys.columns c
    ) d

    SET IDENTITY_INSERT [dbo].[SPTLog] OFF

    ALTER INDEX PK_SPTLog ON SPTLog REBUILD
    GO

    DON’T use this in a production environment. What we need to see now is the full DDL for both tables.

  • oscarooko (9/13/2012)


    Folks, thanks for the help. I have solved the issue and, more importantly, I learned a few new tricks from you. Mine was mostly a case of poor design. In addition to deleting in batches, we added an extra column, bDeleted (BIT), to indicate what needs to be deleted from the archive table. The column stores a "1" or "0" and has an index.

    I tested the process and it deleted about 50k rows in just under 3 minutes. That may not be optimal performance, but it's way better than deleting 50k rows in over an hour.

    Again, thanks for all your input!

  • oscarooko (9/13/2012)


    ...

    Folks, thanks for the help. I have solved the issue and, more importantly, I learned a few new tricks from you. Mine was mostly a case of poor design. In addition to deleting in batches, we added an extra column, bDeleted (BIT), to indicate what needs to be deleted from the archive table. The column stores a "1" or "0" and has an index.

    I tested the process and it deleted about 50k rows in just under 3 minutes. That may not be optimal performance, but it's way better than deleting 50k rows in over an hour.

    Again, thanks for all your input!

    Thanks for the feedback, oscarooko. That's a tremendous improvement, and it's fair to say you've achieved it without direct help, only hints and suggestions. It would be interesting to see your final solution - I'm curious as hell to see why it's still taking 3 minutes when it might take a lot less.
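
    In the meantime, here's a guess at what the flag-and-batch delete might look like - purely speculative, since the actual code hasn't been posted, and the batch size is arbitrary:

    -- Speculative sketch of the bDeleted approach described above
    WHILE 1 = 1
    BEGIN
        DELETE TOP (10000)                    -- arbitrary batch size
        FROM [dbo].[SPTLogArchive]
        WHERE bDeleted = 1;

        IF @@ROWCOUNT = 0 BREAK;              -- nothing left flagged
    END;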

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw


  • Clustered index on [LogTime] should solve the problem.
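
    A sketch of what that suggestion implies - the primary key would have to become nonclustered to free up the cluster for LogTime, so this is a design change to test carefully (index name is hypothetical):

    -- Sketch only: cluster on LogTime so that date-range deletes and the
    -- archive scan touch a contiguous range of pages
    CREATE TABLE [dbo].[SPTLog] (
        LogID INT IDENTITY (1,1) NOT NULL CONSTRAINT PK_SPTLog PRIMARY KEY NONCLUSTERED,
        LogTime DATE NOT NULL)

    CREATE CLUSTERED INDEX [CIX_SPTLog_LogTime] ON [dbo].[SPTLog] (LogTime)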

