I need to delete 20 million rows from a table...

  • I have a table with @ 20 Million rows of data I want to truncate or delete. I don't need any of the data in it anymore at all. I also need to do this on disk space challenged servers. Each row is comprised of 71 characters divided as 35, 35 and 1 in the included columns. Would it be best to:

    - DELETE in batches (don't think so myself)

    - TRUNCATE the table (concerned how many pages get logged in this scenario)

    - Possibly? SCRIPT, DROP and RECREATE the offending table?

    The table itself is not foreign keyed anywhere, nor does it participate in an indexed view. It is not replicated anywhere, although it might be at some point. And yes, I know I need to get after the devs to create a process that limits growth here going forward -- that is not in my province to dictate however. I can merely "suggest" and hope which sucks, but I digress.

    I am an accidental, but long time and fairly capable DBA here where I work. In other words, I ask other experts before I leap on faith alone lol...

    What say you experts out there?

  • If you want to remove all the data in the table, TRUNCATE TABLE.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • TRUNCATE will be faster and be minimally logged (it will only log the page deallocations in the transaction log), if the data isn't needed this is the route to take.

    ______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

  • TRUNCATE, but be cautious if you want any auto-increment PK columns to not reseed back to the original seed value of the table.

    If that is the case, i would capture what I want the new initial seed to be, then reset it after the truncate.

  • MyDoggieJessie (3/12/2014)


    TRUNCATE will be faster and be minimally logged.

    Fully logged. It's efficiently logged, but it is fully logged, not minimally logged. Minimally logged has a specific definition.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (3/12/2014)


    MyDoggieJessie (3/12/2014)


    TRUNCATE will be faster and be minimally logged.

    Fully logged. It's efficiently logged, but it is fully logged, not minimally logged. Minimally logged has a specific definition.

    See, it's the logging that concerns me here. I have not a lot of spare disk sapce to have a log grow by leaps and bounds. I realize that DELETE will log each row delete and that will be a problem for sure. As I understand TRUNCATE, it only logs the deletion of PAGES, but I am unsure how many pages I have. How does one go about calculating PAGES give the number of rows and the size of each record? Sorry if it's a dumb question, but any help there would be appreciated.

  • Truncate logs a lot less than delete. It doesn't log the full page images as it doesn't need to. It logs the same amount as a drop table does, just the deallocation information for each page so that it can roll back the truncate if necessary and roll it forward if the server restarts unexpectedly.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • You should be able to snag the total pages used from the DMV below:SELECT used_page_count, row_count

    FROM sys.dm_db_partition_stats

    WHERE index_id = 1

    ______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

  • Truncate the table. Thats the best option. Logging is minimized with truncate. But however if you see a need to roll back, use delete. DELETE LOGS ALL NECESSARY DATA FOR ROLLBACK and occupies space. If you dont see a necessity for roll back, hit the truncate.

  • karthik.bj (3/12/2014)


    But however if you see a need to roll back, use delete. DELETE LOGS ALL NECESSARY DATA FOR ROLLBACK and occupies space.

    So does truncate. Truncate can be rolled back exactly as delete can.

    It logs less because it operates at the page level not the row level, but truncate also logs all data necessary for a rollback and can be rolled back if run within a transaction (same as delete).

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (3/12/2014)


    karthik.bj (3/12/2014)


    But however if you see a need to roll back, use delete. DELETE LOGS ALL NECESSARY DATA FOR ROLLBACK and occupies space.

    So does truncate. Truncate can be rolled back exactly as delete can.

    It logs less because it operates at the page level not the row level, but truncate also logs all data necessary for a rollback and can be rolled back if run within a transaction (same as delete).

    So for my own edification, if the sys.objects record for this table lists 296,000 pages in use, then TRUNCATE would log 296,000 PAGE DELETE actions?

  • About that. There will be other pages (allocation pages) which get deallocated as well, there will be some metadata changes which get logged, but close to that.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (3/12/2014)


    About that. There will be other pages (allocation pages) which get deallocated as well, there will be some metadata changes which get logged, but close to that.

    Well, thanks for all the helpful insights!

    So I screwed up my courage and hit fire.

    1/2 second later it was all over. 😀

    Shrinking my files down to a manageable level which should solve my disk space crunch a bit.

  • GilaMonster (3/12/2014)


    MyDoggieJessie (3/12/2014)


    TRUNCATE will be faster and be minimally logged.

    Fully logged. It's efficiently logged, but it is fully logged, not minimally logged. Minimally logged has a specific definition.

    I do wish MS would realize there's 3 levels of logging instead of just 2. It would solve the problem of having to explain this all the time.

    Fully Logged

    Paritially Logged (Page deallocation only), can be rolled back)

    Minimially Logged

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Siberian Khatru (3/12/2014)


    Shrinking my files down to a manageable level which should solve my disk space crunch a bit.

    Shrinking your files will also make a narly trainwreck out of your indexes. Shrinking cause nearly 100% fragmentation. If you do a shrink to recover disk space, you should also rebuild all of your indexes which, by the way, will cause your file to grow by slightly larger than your largest index which is usually the largest clustered index of your largest table.

    If you don't need the space for anything else, just rebuild your indexes and call it a day.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 47 total)

You must be logged in to reply to this topic. Login to reply