Guidance Appreciated - Safely deleting a large number of rows and resizing a DB

  • Hey all,

    Bit of background — I’ve somehow fallen into the classic “overnight DBA” role over the past few years at work, handling general maintenance, failover testing, etc. I wouldn’t call myself a deep SQL internals expert, so I’d really appreciate some guidance from those who know more about the guts of SQL Server 2022.

    I’m currently managing a 3-node HA cluster in production (two synchronous-commit replicas with automatic failover, plus one asynchronous-commit replica with manual failover). We’ve recently had a client request that their old data be removed - we’re talking about 3TB of data from a 6TB database.

    What’s the safest and most effective way to go about deleting that data and shrinking the database so we can move to a smaller disk? I get that proactive disk sizing is a key part of DBA best practice, but in this case we’re planning to scale down from an 8TB disk to a 5TB one, which should give us another ~2 years of growth post-deletion.

    • Is there a recommended process for this? Delete the data, rebuild indexes/tables, shrink the DB?
    • Would I need to temporarily remove the DB from the Availability Group during the resize, then re-add and resync it?

    Any tips or tricks from those who’ve done something similar would be hugely appreciated!

  • I'm not going to comment on whether to remove the DB from the availability group or not - others with more experience on that will likely comment. But if you have a good downtime window, that may be the better option.

    Regarding shrinking the database: if you know you are going to fill that space again within the next 2 years, and you don't need the currently allocated disk space for other purposes, I would not shrink it and would leave it as is.
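
    If you do end up having to reclaim the space to move to the smaller disk, the shrink itself is just DBCC SHRINKFILE. A minimal sketch, assuming a single data file with the logical name YourDB_Data (a hypothetical name - check sys.database_files for the real one):

        -- shrink the data file to roughly 4TB, leaving headroom on the 5TB disk
        -- (target size is given in MB)
        DBCC SHRINKFILE (N'YourDB_Data', 4194304);

    Bear in mind a shrink on a file this size takes a long time and fragments indexes heavily, so plan index rebuilds afterwards - which is part of why I'd avoid it if you can keep the disk.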

    Regarding the deletion itself: it depends on table sizes, but for high volumes, and where possible, I advise the approach below (a rough T-SQL sketch follows each list).

    A question that affects what you do:

    Do you need to keep a copy of the data being deleted? If so, do this in advance of the main deletion (see the sketch just below).
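
    A minimal sketch of that advance copy, assuming a hypothetical table dbo.ClientOrders with a ClientID column identifying the client being removed, and ArchiveDB as a separate archive database:

        -- copy the rows that are about to be deleted somewhere safe first
        SELECT *
        INTO   ArchiveDB.dbo.ClientOrders_Removed   -- new table created by SELECT ... INTO
        FROM   dbo.ClientOrders
        WHERE  ClientID = 42;                       -- 42 = the client whose data is being removed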

     

    1. Disable/drop any foreign keys referencing the table (make sure to keep scripts to recreate them!).
    2. Create a table (xx_staging) with the exact same structure as the main table (including any indexes/partition schemes/functions).
    3. Switch out the main table to xx_staging (this is a metadata-only operation taking milliseconds).
    4. Drop all indexes on the main table (perhaps leaving the clustered index; you must keep it if it is a clustered columnstore).
    5. Insert the data to keep from xx_staging back into the main table.
    6. Recreate the dropped indexes on the main table.
    7. Recreate any foreign keys if/as required.
    8. Drop table xx_staging.
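
    A rough T-SQL outline of steps 3-5 and 8, assuming a hypothetical table dbo.BigTable with a ClientID column and a nonclustered index IX_BigTable_ClientID (script out your real indexes/foreign keys before starting; if the table has an IDENTITY column you will need an explicit column list and SET IDENTITY_INSERT for step 5):

        -- 3. switch the full table into the empty staging copy (metadata-only, near-instant)
        ALTER TABLE dbo.BigTable SWITCH TO dbo.BigTable_staging;

        -- 4. drop nonclustered indexes on the now-empty main table (repeat per index)
        DROP INDEX IX_BigTable_ClientID ON dbo.BigTable;

        -- 5. copy back only the rows you are keeping
        INSERT INTO dbo.BigTable
        SELECT *
        FROM   dbo.BigTable_staging
        WHERE  ClientID <> 42;          -- 42 = the client being removed

        -- 8. once indexes and foreign keys are recreated, drop the staging copy
        DROP TABLE dbo.BigTable_staging;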

    Start with the smaller tables, so that by the time you get to the biggest tables you are likely to have enough free space in the data file to hold a copy of the full big table plus the data to keep.

    If this process would be likely to grow your data file even further, I would instead create a new database/filegroup to hold the copy of the data being kept and change the steps above slightly.

    If a temporary staging database is used (again, a sketch follows the list):

    1. Disable/drop any foreign keys referencing the table (make sure to keep scripts to recreate them!).
    2. In the staging database, create a table (xx_staging) with the exact same structure as the main table - indexes are not required.
    3. Insert the data to keep from the main table into xx_staging.
    4. Truncate the main table.
    5. Drop all indexes on the main table (perhaps leaving the clustered index; you must keep it if it is a clustered columnstore).
    6. Insert the data to keep from xx_staging back into the main table.
    7. Recreate the dropped indexes on the main table.
    8. Recreate any foreign keys if/as required.
    9. Drop table xx_staging.
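
    And the equivalent rough outline for the staging-database variant, reusing the hypothetical dbo.BigTable / ClientID names with ScratchDB standing in for the temporary database (the same IDENTITY caveat applies to the inserts):

        -- 3. copy the rows to keep out to the staging database
        SELECT *
        INTO   ScratchDB.dbo.BigTable_staging
        FROM   dbo.BigTable
        WHERE  ClientID <> 42;

        -- 4. empty the main table (only possible once referencing foreign keys are gone)
        TRUNCATE TABLE dbo.BigTable;

        -- 6. after dropping the nonclustered indexes, load the kept rows back
        INSERT INTO dbo.BigTable
        SELECT *
        FROM   ScratchDB.dbo.BigTable_staging;

        -- 9. once indexes and foreign keys are recreated, drop the staging table
        DROP TABLE ScratchDB.dbo.BigTable_staging;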

     
