Truncate vs Delete for Bulk Removal of (some, not all) records

  • EDIT: the table is not 1 GB, it's 1 TB 🙁

     

    Hi

    I have a table with 351,495,000 records (about 1 TB in size). I need to remove a bunch of historical data, approx. 219,000,000 records.

    Which would be quicker: a delete based on criteria (an int column, biggest value 4 digits), or copying the 219 million records to a copy of the table, truncating the original and then copying them back?

    No identity columns involved and I'll be setting recovery to SIMPLE.

    Table has a clustered composite key that includes the aforementioned int column.

    Many thanks

    Simon

     

     

  • Another option: copy the rows you wish to keep to a copy of the current table, then swap that table with the current one.

    Also, how did you get J22 into an INT column?
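    For the copy-and-swap idea, something along these lines (the table names and the "keep" criterion on the int column are placeholders, not your real ones):

    -- 1. Copy the rows you want to keep into a new table.
    --    Under SIMPLE recovery, SELECT ... INTO is minimally logged.
    SELECT *
    INTO   dbo.BigTable_keep
    FROM   dbo.BigTable
    WHERE  [int] >= 2000;          -- whatever identifies the rows to keep

    -- 2. Recreate the clustered and nonclustered indexes on dbo.BigTable_keep here.

    -- 3. Swap the tables in a short transaction.
    BEGIN TRAN;
        EXEC sp_rename 'dbo.BigTable',      'BigTable_old';
        EXEC sp_rename 'dbo.BigTable_keep', 'BigTable';
    COMMIT;

    -- 4. Once everything checks out, drop dbo.BigTable_old to reclaim the space.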


    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Apologies, cut-and-paste error. Example value: 2859, i.e. they're all 4-digit numbers.

     

    Would that be quicker, do you think, even with having to recreate the clustered (& 7 nonclustered) indexes?

     

     

  • simon_s wrote:

    Apologies, cut-and-paste error. Example value: 2859, i.e. they're all 4-digit numbers.

    Would that be quicker, do you think, even with having to recreate the clustered (& 7 nonclustered) indexes?

    I would not guarantee that it would be quicker. But in my experience, deletions of millions of rows of data are often painfully slow and cause blocking. Let's wait for some other comments.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Is the existing clustered composite key unique?  And explicitly declared to SQL as unique?

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • Field names replaced with datatypes. It's not unique, nor explicitly declared as unique:

    CREATE CLUSTERED INDEX [cl_table] ON [dbo].[table]
    (
    [int] ASC,
    [varchar10] ASC,
    [dateTime] ASC
    )
    GO


     

     

  • Testing is your friend. Try Phil's suggestion. It's likely going to be faster just based on the simple idea of data movement. Moving 100 million rows through the copy may, and I emphasize may, be faster than deleting 200 million rows, although indexes can be a factor here as well.

    ----------------------------------------------------
    The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore Roosevelt
    The Scary DBA
    Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
    Product Evangelist for Red Gate Software

  • Well, if the datetime in the clus index is the current datetime, you're OK. Otherwise, not good, since when copying the "good" rows you wouldn't easily be able to tell that you got them all without copying any row twice.

    Typically it's best to copy in clus key order, in batches, but if the key is not always ascending, that will be far more difficult to do accurately.

    Do you have any column/set of columns that are guaranteed to be unique in the table?
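
    If not, one way to sidestep the uniqueness issue is to batch on the leading clustered key column alone, so each batch is "every row for one [int] value" and nothing can be missed or copied twice. Rough sketch only; the object names and the cut-off value are made up:

    DECLARE @k INT, @maxK INT;

    SELECT @k = MIN([int]), @maxK = MAX([int])
    FROM   dbo.BigTable
    WHERE  [int] >= 2000;            -- illustrative "keep" criterion

    WHILE @k <= @maxK
    BEGIN
        -- Each pass copies every row for one leading-key value,
        -- a contiguous slice of the clustered index.
        INSERT INTO dbo.BigTable_keep WITH (TABLOCK)
        SELECT *
        FROM   dbo.BigTable
        WHERE  [int] = @k;

        SET @k += 1;                 -- 4-digit key values, so a few thousand batches at most
    END;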

     

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • Unfortunately I don't have the space to test (create a duplicate db/table); the whole rationale for this exercise is that disk space is full 🙁, nor can I take it offline. I considered copying a small subset to another server, but that might negate the point of the testing.

  • You can't restore to another server & try to test it there? I'd argue it's important to get something like this right because messing it up could be painful.

     

    ----------------------------------------------------
    The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore Roosevelt
    The Scary DBA
    Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
    Product Evangelist for Red Gate Software

  • Then follow the standard method and delete from the table in batches in clus key order.

    In order to do that, you might first have to go thru the table and store the clus key values of all the rows to be deleted.  Hopefully all rows with the same clus key value are either all deleted or all kept.

    Loop thru the key-values-to-be-deleted table, deleting say TOP (5000) rows each time (or however many rows work out best to delete per query).  Generally I'll add a small delay after every n deletes, for example, wait 1/4 or 1/2 sec after every 50K or 100K deletes.

    If the db is not in simple mode, you'll want to do very frequent log backups as you're deleting to make sure the log doesn't need to grow to log the deletes, esp. since disk space is an issue for you.
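
    A minimal sketch of that loop (table name and the "historical" criterion are placeholders), deleting by the leading clustered key column so each batch hits a contiguous range:

    DECLARE @batch INT = 5000,           -- tune the batch size to what works best
            @rows  INT = 1,
            @loops INT = 0;

    WHILE @rows > 0
    BEGIN
        DELETE TOP (@batch)
        FROM  dbo.BigTable
        WHERE [int] < 2000;              -- the historical rows to remove

        SET @rows  = @@ROWCOUNT;
        SET @loops += 1;

        CHECKPOINT;                      -- SIMPLE recovery: lets the log space be reused
                                         -- (in FULL recovery, take frequent log backups instead)

        IF @loops % 20 = 0               -- roughly every 100K rows deleted: short pause
            WAITFOR DELAY '00:00:00.500';
    END;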

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • simon_s wrote:

    Unfortunately I don't have the space to test (create a duplicate db/table); the whole rationale for this exercise is that disk space is full 🙁, nor can I take it offline. I considered copying a small subset to another server, but that might negate the point of the testing.

    To be honest, disk space is relatively cheap. I would petition management for a substantial addition of new disk. You do need some headroom, both for future growth and to be able to do things as a DBA.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

