I wanted to delete records from a big table.

  • Ed7

    SSCrazy Eights

    Points: 8352

    Hello,

     I have a table with 200 millions records and the table has
         p_Index]  int  IDENTITY(1,1) NOT NULL,
          SessionID] [varchar](1000) NULL,
        p_Line [nvarchar](max) NOT NULL,
        Timestamp [datetime] NOT NULL,
        FixedColumnsInPipeline] [int] NOT NULL,
        pType] [varchar](50) NULL

    I'm not sure how to delete the huge records on this table and I wanted to keep the recent year.

    Please help.

    Thanks in advance for your helps.
    -Edwin

  • Lempster

    SSCoach

    Points: 15476

    Of the 200 million rows, how many are in the current year, i.e. how many do you want to keep?

  • Jonathan AC Roberts

    SSCoach

    Points: 16845

    I would write a script to delete the rows in small defined batches, something like this:

    DECLARE @Batchsize INT
    SET @Batchsize = 5000; /* Change to whatever is a suitable number of rows to delete in the inner loop */

    IF OBJECT_ID('tempdb..#RowsToDelete', 'U') IS NOT NULL
        DROP TABLE #RowsToDelete;

    IF OBJECT_ID('tempdb..#RowsDeleted', 'U') IS NOT NULL
        DROP TABLE #RowsDeleted;

    CREATE TABLE #RowsDeleted(p_Index INT NOT NULL PRIMARY KEY CLUSTERED );

    /* Store all the p_Index that are over 1 year old in a temporary table */
    SELECT t.p_Index
    INTO #RowsToDelete
    FROM myTable t
    WHERE t.Timestamp < DATEADD(yy, -1, GETDATE());

    ALTER TABLE #RowsToDelete
      ADD CONSTRAINT PK_#RowsToDelete PRIMARY KEY CLUSTERED(p_Index);

    DECLARE @RowCount INT;
        SET @RowCount = @Batchsize;
    /* Delete in batches */
    WHILE @RowCount = @Batchsize BEGIN
        DELETE TOP (@Batchsize) t
        OUTPUT deleted.p_Index INTO #RowsDeleted
          FROM myTable t
         WHERE EXISTS(SELECT * FROM #RowsToDelete r WHERE r.p_Index = t.p_Index);

        SET @RowCount = @@ROWCOUNT;

        DELETE rtd
          FROM #RowsToDelete rtd
         WHERE EXISTS (SELECT * FROM #RowsDeleted rd WHERE rd.p_Index = rtd.p_Index);

        TRUNCATE TABLE #RowsDeleted;

    END;
    GO
    DROP TABLE #RowsDeleted;
    GO
    DROP TABLE #RowsToDelete;
    GO

  • Lempster

    SSCoach

    Points: 15476

    Jonathan AC Roberts - Monday, February 11, 2019 10:56 AM

    I would write a script to delete the rows in small defined batches, something like this:

    DECLARE @Batchsize INT
    SET @Batchsize = 5000; /* Change to whatever is a suitable number of rows to delete in the inner loop */

    IF OBJECT_ID('tempdb..#RowsToDelete', 'U') IS NOT NULL
        DROP TABLE #RowsToDelete;

    IF OBJECT_ID('tempdb..#RowsDeleted', 'U') IS NOT NULL
        DROP TABLE #RowsDeleted;

    CREATE TABLE #RowsDeleted(p_Index INT NOT NULL PRIMARY KEY CLUSTERED );

    /* Store all the p_Index that are over 1 year old in a temporary table */
    SELECT t.p_Index
    INTO #RowsToDelete
    FROM myTable t
    WHERE t.Timestamp < DATEADD(yy, -1, GETDATE());

    ALTER TABLE #RowsToDelete
      ADD CONSTRAINT PK_#RowsToDelete PRIMARY KEY CLUSTERED(p_Index);

    DECLARE @RowCount INT;
        SET @RowCount = @Batchsize;
    /* Delete in batches */
    WHILE @RowCount <> @Batchsize BEGIN
        DELETE TOP (@Batchsize) t
        OUTPUT deleted.p_Index INTO #RowsDeleted
          FROM myTable t
         WHERE EXISTS(SELECT * FROM #RowsToDelete r WHERE r.p_Index = t.p_Index);

        SET @RowCount = @@ROWCOUNT;

        DELETE rtd
          FROM #RowsToDelete rtd
         WHERE EXISTS (SELECT * FROM #RowsDeleted rd WHERE rd.p_Index = rtd.p_Index);

        TRUNCATE TABLE #RowsDeleted;

    END;
    GO
    DROP TABLE #RowsDeleted;
    GO
    DROP TABLE #RowsToDelete;
    GO

    I agree with this approach, but i wanted to know how many rows the OP wanted to keep because my thinking was, if it's a small proportion of the total number of rows, it might be quicker to create a new table, insert the rows to keep into the new table, then truncate the original table and repopulate it with the data inserted into the new table.
    Would need to use IDENTITY_INSERT, but not difficult.
    Regards
    Lempster

  • Jonathan AC Roberts

    SSCoach

    Points: 16845

    Lempster - Tuesday, February 12, 2019 7:27 AM

    Jonathan AC Roberts - Monday, February 11, 2019 10:56 AM

    I would write a script to delete the rows in small defined batches, something like this:

    DECLARE @Batchsize INT
    SET @Batchsize = 5000; /* Change to whatever is a suitable number of rows to delete in the inner loop */

    IF OBJECT_ID('tempdb..#RowsToDelete', 'U') IS NOT NULL
        DROP TABLE #RowsToDelete;

    IF OBJECT_ID('tempdb..#RowsDeleted', 'U') IS NOT NULL
        DROP TABLE #RowsDeleted;

    CREATE TABLE #RowsDeleted(p_Index INT NOT NULL PRIMARY KEY CLUSTERED );

    /* Store all the p_Index that are over 1 year old in a temporary table */
    SELECT t.p_Index
    INTO #RowsToDelete
    FROM myTable t
    WHERE t.Timestamp < DATEADD(yy, -1, GETDATE());

    ALTER TABLE #RowsToDelete
      ADD CONSTRAINT PK_#RowsToDelete PRIMARY KEY CLUSTERED(p_Index);

    DECLARE @RowCount INT;
        SET @RowCount = @Batchsize;
    /* Delete in batches */
    WHILE @RowCount <> @Batchsize BEGIN
        DELETE TOP (@Batchsize) t
        OUTPUT deleted.p_Index INTO #RowsDeleted
          FROM myTable t
         WHERE EXISTS(SELECT * FROM #RowsToDelete r WHERE r.p_Index = t.p_Index);

        SET @RowCount = @@ROWCOUNT;

        DELETE rtd
          FROM #RowsToDelete rtd
         WHERE EXISTS (SELECT * FROM #RowsDeleted rd WHERE rd.p_Index = rtd.p_Index);

        TRUNCATE TABLE #RowsDeleted;

    END;
    GO
    DROP TABLE #RowsDeleted;
    GO
    DROP TABLE #RowsToDelete;
    GO

    I agree with this approach, but i wanted to know how many rows the OP wanted to keep because my thinking was, if it's a small proportion of the total number of rows, it might be quicker to create a new table, insert the rows to keep into the new table, then truncate the original table and repopulate it with the data inserted into the new table.
    Would need to use IDENTITY_INSERT, but not difficult.
    Regards
    Lempster

    It might be but the way I read it was it might be put into a regular job to keep only 1 year's data on the table.
    In which case a script that could be run on a regular basis would be better. Have to wait and see what the OP had in mind.

  • Ed7

    SSCrazy Eights

    Points: 8352

    I'm not hundred percent sure on the historian data that the business wanted to keep.   My best guest is one year.
    Also, I did some research and some suggested to:
    .1. Create a table and insert  the historian data into a new table.
    .2. Truncated the old table
    .3. Rename the new table to old table.

  • Ed7

    SSCrazy Eights

    Points: 8352

    I'm reviewing Jonathan T-sql scripts and update shortly with questions.

    Many thanks for Jonathan AC Robert and everyone helps.  I'm very appreciated.

  • Jonathan AC Roberts

    SSCoach

    Points: 16845

    Jonathan AC Roberts - Monday, February 11, 2019 10:56 AM

    I would write a script to delete the rows in small defined batches, something like this:

    DECLARE @Batchsize INT
    SET @Batchsize = 5000; /* Change to whatever is a suitable number of rows to delete in the inner loop */

    IF OBJECT_ID('tempdb..#RowsToDelete', 'U') IS NOT NULL
        DROP TABLE #RowsToDelete;

    IF OBJECT_ID('tempdb..#RowsDeleted', 'U') IS NOT NULL
        DROP TABLE #RowsDeleted;

    CREATE TABLE #RowsDeleted(p_Index INT NOT NULL PRIMARY KEY CLUSTERED );

    /* Store all the p_Index that are over 1 year old in a temporary table */
    SELECT t.p_Index
    INTO #RowsToDelete
    FROM myTable t
    WHERE t.Timestamp < DATEADD(yy, -1, GETDATE());

    ALTER TABLE #RowsToDelete
      ADD CONSTRAINT PK_#RowsToDelete PRIMARY KEY CLUSTERED(p_Index);

    DECLARE @RowCount INT;
        SET @RowCount = @Batchsize;
    /* Delete in batches */
    WHILE @RowCount = @Batchsize BEGIN
        DELETE TOP (@Batchsize) t
        OUTPUT deleted.p_Index INTO #RowsDeleted
          FROM myTable t
         WHERE EXISTS(SELECT * FROM #RowsToDelete r WHERE r.p_Index = t.p_Index);

        SET @RowCount = @@ROWCOUNT;

        DELETE rtd
          FROM #RowsToDelete rtd
         WHERE EXISTS (SELECT * FROM #RowsDeleted rd WHERE rd.p_Index = rtd.p_Index);

        TRUNCATE TABLE #RowsDeleted;

    END;
    GO
    DROP TABLE #RowsDeleted;
    GO
    DROP TABLE #RowsToDelete;
    GO

    Sorry, correction needed to the script. It should read:
    WHILE @RowCount = @Batchsize BEGIN
    I had <> instead of =

  • Ed7

    SSCrazy Eights

    Points: 8352

    Thank you, Jonanthan.  Very appreciated.

  • Ed7

    SSCrazy Eights

    Points: 8352

    I'm not hundred percent sure after I modified  your scripts and executed.  It returned zero records.

    DECLARE @Batchsize INT
    SET @Batchsize = 50; /* Change to whatever is a suitable number of rows to delete in the inner loop */

    IF OBJECT_ID('tempdb..#RowsToDelete', 'U') IS NOT NULL    DROP TABLE #RowsToDelete;

    IF OBJECT_ID('tempdb..#RowsToDelete', 'U') IS NOT NULL
        DROP TABLE #RowsToDelete;

    IF OBJECT_ID('tempdb..#RowsDeleted', 'U') IS NOT NULL    DROP TABLE #RowsDeleted;

    IF OBJECT_ID('tempdb..#RowsDeleted', 'U') IS NOT NULL
        DROP TABLE #RowsDeleted;

    CREATE TABLE #RowsDeleted(Pipeline_Index INT NOT NULL PRIMARY KEY CLUSTERED );

    CREATE TABLE #RowsDeleted(Pipeline_Index INT NOT NULL PRIMARY KEY CLUSTERED );

    /* Store all the Pipeline_Index that are over 1 year old in a temporary table */SELECT t.Pipeline_Index  INTO #RowsToDelete  FROM dbo.Pipeline_s t WHERE t.Timestamp < DATEADD(yy, -1, GETDATE());

    /* Store all the Pipeline_Index that are over 1 year old in a temporary table */

    SELECT t.Pipeline_Index
      INTO #RowsToDelete
      FROM dbo.Pipeline_s t
     WHERE t.Timestamp < DATEADD(yy, -1, GETDATE());

    ALTER TABLE #RowsToDelete  ADD CONSTRAINT PK_#RowsToDelete PRIMARY KEY CLUSTERED(Pipeline_Index);

    ALTER TABLE #RowsToDelete
      ADD CONSTRAINT PK_#RowsToDelete PRIMARY KEY CLUSTERED(Pipeline_Index);

    DECLARE @RowCount INT;    SET @RowCount = @Batchsize;/* Delete in batches */WHILE @RowCount = @Batchsize BEGIN    DELETE TOP (@Batchsize) t    OUTPUT deleted.Pipeline_Index INTO #RowsDeleted      FROM dbo.Pipeline_s t     WHERE EXISTS(SELECT * FROM #RowsToDelete r WHERE r.Pipeline_Index = t.Pipeline_Index);

    DECLARE @RowCount INT;
        SET @RowCount = @Batchsize;
    /* Delete in batches */
    WHILE @RowCount = @Batchsize
    BEGIN
        DELETE TOP (@Batchsize) t
        OUTPUT deleted.Pipeline_Index INTO #RowsDeleted
          FROM dbo.Pipeline_s t
         WHERE EXISTS(SELECT * FROM #RowsToDelete r WHERE r.Pipeline_Index = t.Pipeline_Index);

        SET @RowCount = @@ROWCOUNT;

        SET @RowCount = @@ROWCOUNT;

        DELETE rtd      FROM #RowsToDelete rtd     WHERE EXISTS (SELECT * FROM #RowsDeleted rd WHERE rd.Pipeline_Index = rtd.Pipeline_Index);

        DELETE rtd
          FROM #RowsToDelete rtd
         WHERE EXISTS (SELECT * FROM #RowsDeleted rd WHERE rd.Pipeline_Index = rtd.Pipeline_Index);

        TRUNCATE TABLE #RowsDeleted;

        TRUNCATE TABLE #RowsDeleted;

    END;GODROP TABLE #RowsDeleted;GO

    END;
    GO
    DROP TABLE #RowsDeleted;
    GO

  • Jonathan AC Roberts

    SSCoach

    Points: 16845

    Ed7 - Tuesday, February 12, 2019 5:10 PM

    I'm not hundred percent sure after I modified  your scripts and executed.  It returned zero records.

    DECLARE @Batchsize INT
    SET @Batchsize = 50; /* Change to whatever is a suitable number of rows to delete in the inner loop */

    IF OBJECT_ID('tempdb..#RowsToDelete', 'U') IS NOT NULL    DROP TABLE #RowsToDelete;

    IF OBJECT_ID('tempdb..#RowsToDelete', 'U') IS NOT NULL
        DROP TABLE #RowsToDelete;

    IF OBJECT_ID('tempdb..#RowsDeleted', 'U') IS NOT NULL    DROP TABLE #RowsDeleted;

    IF OBJECT_ID('tempdb..#RowsDeleted', 'U') IS NOT NULL
        DROP TABLE #RowsDeleted;

    CREATE TABLE #RowsDeleted(Pipeline_Index INT NOT NULL PRIMARY KEY CLUSTERED );

    CREATE TABLE #RowsDeleted(Pipeline_Index INT NOT NULL PRIMARY KEY CLUSTERED );

    /* Store all the Pipeline_Index that are over 1 year old in a temporary table */SELECT t.Pipeline_Index  INTO #RowsToDelete  FROM dbo.Pipeline_s t WHERE t.Timestamp < DATEADD(yy, -1, GETDATE());

    /* Store all the Pipeline_Index that are over 1 year old in a temporary table */

    SELECT t.Pipeline_Index
      INTO #RowsToDelete
      FROM dbo.Pipeline_s t
     WHERE t.Timestamp < DATEADD(yy, -1, GETDATE());

    ALTER TABLE #RowsToDelete  ADD CONSTRAINT PK_#RowsToDelete PRIMARY KEY CLUSTERED(Pipeline_Index);

    ALTER TABLE #RowsToDelete
      ADD CONSTRAINT PK_#RowsToDelete PRIMARY KEY CLUSTERED(Pipeline_Index);

    DECLARE @RowCount INT;    SET @RowCount = @Batchsize;/* Delete in batches */WHILE @RowCount = @Batchsize BEGIN    DELETE TOP (@Batchsize) t    OUTPUT deleted.Pipeline_Index INTO #RowsDeleted      FROM dbo.Pipeline_s t     WHERE EXISTS(SELECT * FROM #RowsToDelete r WHERE r.Pipeline_Index = t.Pipeline_Index);

    DECLARE @RowCount INT;
        SET @RowCount = @Batchsize;
    /* Delete in batches */
    WHILE @RowCount = @Batchsize
    BEGIN
        DELETE TOP (@Batchsize) t
        OUTPUT deleted.Pipeline_Index INTO #RowsDeleted
          FROM dbo.Pipeline_s t
         WHERE EXISTS(SELECT * FROM #RowsToDelete r WHERE r.Pipeline_Index = t.Pipeline_Index);

        SET @RowCount = @@ROWCOUNT;

        SET @RowCount = @@ROWCOUNT;

        DELETE rtd      FROM #RowsToDelete rtd     WHERE EXISTS (SELECT * FROM #RowsDeleted rd WHERE rd.Pipeline_Index = rtd.Pipeline_Index);

        DELETE rtd
          FROM #RowsToDelete rtd
         WHERE EXISTS (SELECT * FROM #RowsDeleted rd WHERE rd.Pipeline_Index = rtd.Pipeline_Index);

        TRUNCATE TABLE #RowsDeleted;

        TRUNCATE TABLE #RowsDeleted;

    END;GODROP TABLE #RowsDeleted;GO

    END;
    GO
    DROP TABLE #RowsDeleted;
    GO

    I don't think the code you have pasted in can be the code you have. A lot of the lines are doubled-up in your code.
    Can you paste in your code without the doubled up lines and check it has copied correctly?

  • Ed7

    SSCrazy Eights

    Points: 8352

    My apologies for very late reply.
    I modified and updated your scripts and it works and returned records at a time.
    Then, I setup and create SQL Agent job to run your scripts.
    Steps by steps to create SQL Agent job:
    1. Created a SQL Agent job = _DelTableA
    2  Created a step1 - DelTableA
    3 Created a schedule to run daily at 1am

    Here are the scripts
    ==============
    DECLARE @Batchsize INT
    SET @Batchsize = 5; /* Change to whatever is a suitable number of rows to delete in the inner loop */

    IF OBJECT_ID('tempdb..#RowsToDelete', 'U') IS NOT NULL
        DROP TABLE #RowsToDelete;
    IF OBJECT_ID('tempdb..#RowsDeleted', 'U') IS NOT NULL
        DROP TABLE #RowsDeleted;
    CREATE TABLE #RowsDeleted (
     Pipeline_Index INT NOT NULL PRIMARY KEY CLUSTERED)--, /* Store all the Pipeline_Index that are over 1 year old in a temporary table */
    SELECT  Pipeline_Index
      INTO #RowsToDelete
      FROM dbo.Pipeline_s t
     WHERE [Timestamp] > (GETDATE()-7)ALTER TABLE #RowsToDelete
      ADD CONSTRAINT PK_#RowsToDelete PRIMARY KEY CLUSTERED(Pipeline_Index);
    DECLARE @RowCount INT;
        SET @RowCount = @Batchsize;
    /* Delete in batches */
    WHILE @RowCount = @Batchsize
    BEGIN
        DELETE TOP (@Batchsize) t
        OUTPUT deleted.Pipeline_Index INTO #RowsDeleted
          FROM dbo.Pipeline_s t
         WHERE EXISTS(SELECT * FROM #RowsToDelete r WHERE r.Pipeline_Index = t.Pipeline_Index);
        SET @RowCount = @@ROWCOUNT;
        DELETE rtd
          FROM #RowsToDelete rtd
         WHERE EXISTS (SELECT * FROM #RowsDeleted rd WHERE rd.Pipeline_Index = rtd.Pipeline_Index);
        TRUNCATE TABLE #RowsDeleted;
    END;
    GO
    DROP TABLE #RowsDeleted;
    GO
    DROP TABLE #RowsToDelete;
    GO

    Questions:
    What is mean the batchsize = 50?   Would you please explain and clarify?

    Many thanks,
    Edwin

  • Jonathan AC Roberts

    SSCoach

    Points: 16845

    @Batchsize is the number of rows it deletes each time it goes round the inner loop. You need to keep this value small enough so it doesn't take too long in each iteration of the loop. This is so other queries can access the table.
    I'm a bit concerned about your query:
    SELECT Pipeline_Index
      INTO #RowsToDelete
      FROM dbo.Pipeline_s t
     WHERE [Timestamp] > (GETDATE() - 7);

    This will delete rows that are newer than one week old, not rows over a year old!
    I think you meant to have something like:
    SELECT Pipeline_Index
      INTO #RowsToDelete
      FROM dbo.Pipeline_s t
     WHERE [Timestamp] < DATEADD(yy,-1,GETDATE);

Viewing 13 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply