Urgent: How to shrink this huge DB?

  • I inherited a badly designed DB: a 25 GB mdf plus a 99 GB ldf.

    As far as I know, most of the data is redundant and can be removed; the question is how. Every time I tried to run a DELETE, it timed out, and it seemed to make the ldf even bigger.

    The server has run out of space due to this DB.

    Thank you very much

  • I did the following to empty the ldf: 1. Set the recovery model to SIMPLE; 2. DBCC SHRINKDATABASE('database_name')

    This shrank the ldf from 99 GB down to 1 MB.
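
    For anyone following along, the commands were roughly these (the database and log file names are placeholders):

    ALTER DATABASE database_name SET RECOVERY SIMPLE;
    DBCC SHRINKDATABASE ('database_name');
    -- DBCC SHRINKFILE ('database_name_log', 1) would have shrunk only the log
    -- file and left the data file, and its indexes, untouched.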

    I then tried to delete redundant records from one of the problematic tables, which has 25,877,065 records; apparently my command is wrong:

    delete table where id > 20877065

    The ldf file immediately grew dramatically, back into the GB range, and the query is still running. I don't know whether it will fail again by running out of space. Should I keep waiting, or should I terminate it?

    And how should I remove those redundant records?

    Thanks.

  • There is an implicit transaction for every DELETE/INSERT/UPDATE statement, so I have much better success doing the following:

    set rowcount 10000;  -- cap every subsequent DML statement at 10,000 rows
    -- do the delete here
    set rowcount 0;      -- reset the cap

    Deleting a whole table's worth of rows in one shot requires too many resources (tempdb, logging, memory, etc.). Just delete large chunks if you have a wimpy server; a sketch of the pattern follows.
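
    A minimal runnable sketch of this pattern, assuming the hypothetical dbo.table and cutoff id from the earlier post. (Note that SET ROWCOUNT limiting DELETE/INSERT/UPDATE is deprecated; the DELETE TOP (N) form shown in the next reply is preferred on newer versions.)

    set rowcount 10000;
    while 1 = 1
    begin
        delete dbo.table where id > 20877065; -- removes at most 10,000 rows
        if @@rowcount = 0 break;              -- nothing left to delete
    end
    set rowcount 0; -- always reset, or later statements stay capped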

  • Don't try to delete all the rows at once. Even in simple recovery mode, all the activity must be logged for the duration of the statement. Drop batches of rows until there are no more remaining. The optimum batch size depends on row size, server configuration, etc., and may be arrived at through testing.

    WHILE 1 = 1
    BEGIN
        DELETE TOP (10000) FROM dbo.table WHERE id > 20877065;
        IF @@ROWCOUNT = 0 BREAK;
    END
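
    One refinement worth testing (my addition, not part of the original reply): in SIMPLE recovery, a CHECKPOINT between batches lets the log space used by completed batches be reused, which keeps the ldf from growing:

    WHILE 1 = 1
    BEGIN
        DELETE TOP (10000) FROM dbo.table WHERE id > 20877065;
        IF @@ROWCOUNT = 0 BREAK;
        CHECKPOINT; -- allow log truncation between batches (SIMPLE recovery)
    END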

    An alternate technique is to copy the data you want to keep into a temp table, truncate the main table, and reload it. The TRUNCATE operation is minimally logged, although both record-copying operations will be fully logged. If much more data is being deleted than kept, this is the better approach. It assumes there are no foreign key complications, but you may want to drop foreign keys on this table and recreate them after the delete no matter which version you use.

    SELECT * INTO #table FROM dbo.table WHERE id <= 20877065;

    IF @@ROWCOUNT > {expected row count}
    BEGIN
        TRUNCATE TABLE dbo.table;

        INSERT INTO dbo.table WITH (TABLOCKX) SELECT * FROM #table;

        IF @@ROWCOUNT > {expected row count}
            DROP TABLE #table;
    END

    Other considerations:

    The DELETE TOP (N) version works best if "WHERE id > 20877065" references a clustered primary key. If, for example, you're keeping the first million rows of a fifty-million-row table and id is not the clustered key, each iteration will probably have to scan through those first million rows before finding ten thousand rows to delete; repeat that 4,900 times. If id is the clustered primary key, it will find the rows to delete very quickly.

    If the table is a heap (no clustered index or key), then the TRUNCATE TABLE version is emphatically the best choice.
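
    A quick way to check which case applies (a generic catalog query of mine, not from the original post; dbo.table is the placeholder name used throughout):

    SELECT i.index_id, i.type_desc
    FROM sys.indexes AS i
    WHERE i.object_id = OBJECT_ID(N'dbo.table')
      AND i.index_id IN (0, 1); -- 0 = HEAP, 1 = CLUSTERED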

  • I have been able to resolve this issue (a T-SQL sketch follows the list):

    1. Select the records worth keeping into a new table

    2. Truncate the original table, as most of its records are redundant

    3. Drop the original table completely

    4. Rename the new table to the old name
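
    In T-SQL the steps look roughly like this (the table names and cutoff are placeholders based on the figures earlier in the thread):

    SELECT * INTO dbo.table_new FROM dbo.table WHERE id <= 20877065; -- keep the good rows
    TRUNCATE TABLE dbo.table;                -- release the space quickly
    DROP TABLE dbo.table;                    -- remove the original table
    EXEC sp_rename 'dbo.table_new', 'table'; -- give the new table the old name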

    The remaining question is:

    Since I still need to replicate data from a linked server, what would be the best way to select ONLY the records from the linked server that are NOT already in the existing table?

    The table has a unique ID.

  • Thanks for the replies above.

  • halifaxdal (10/1/2013)

    I did the following to empty the ldf: 1. Set the recovery model to SIMPLE; 2. DBCC SHRINKDATABASE('database_name') ... And how should I remove those redundant records?

    Heh... you move too fast. Shrinking the database just fragmented every index you have to 99%. If you have point-in-time backups running, you also just broke the log chain by putting the DB into the SIMPLE recovery model. If you don't have point-in-time backups running, you should seriously consider them even if it's "just" a Dev database.

    If you're doing deletes and you didn't take a backup just prior to doing the deletes, you may hate yourself in the morning when someone says you deleted too much or you deleted the wrong data.
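
    A minimal example of such a safety backup (the database name and path are placeholders, not from the thread):

    BACKUP DATABASE database_name
        TO DISK = N'D:\Backups\database_name_pre_delete.bak'
        WITH INIT, STATS = 10; -- INIT overwrites the file; STATS reports progress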

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Here are two versions, column lists removed for clarity. Don't rely on "SELECT *" to work across servers.

    -- V1: Copy the entire remote table, keep only what you need
    INSERT INTO dbo.table
    SELECT * FROM linksrv.db.dbo.table
    WHERE id NOT IN (SELECT id FROM dbo.table)

    -- V2: Get the remote server to read the existing IDs and only send the records you want
    -- Requires linked servers both ways, with working two-hop security
    INSERT INTO dbo.table
    SELECT * FROM OPENQUERY(sourceserver,
        'SELECT * FROM db.dbo.table
         WHERE id NOT IN (SELECT id FROM targetserver.db.dbo.table WHERE id IS NOT NULL)')
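
    One caveat worth adding (my note, using the same placeholder names): if the local id column is nullable, a single NULL in the NOT IN subquery makes V1 return no rows at all, so NOT EXISTS is the safer spelling:

    INSERT INTO dbo.table
    SELECT s.*
    FROM linksrv.db.dbo.table AS s
    WHERE NOT EXISTS (SELECT 1 FROM dbo.table AS t WHERE t.id = s.id);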

  • Jeff Moden (10/1/2013)

    Heh... you move too fast. Shrinking the database just fragmented every index you have to 99%. ... If you're doing deletes and you didn't take a backup just prior, you may hate yourself in the morning.

    Thanks for the advice. I am comfortable with how I repaired the DB, as it is not an important DB and is used by only a handful of users.

  • Scott Coleman (10/1/2013)

    Here are two versions, column lists removed for clarity. Don't rely on "SELECT *" to work across servers.

    -- V1: Copy the entire remote table, keep only what you need
    INSERT INTO dbo.table
    SELECT * FROM linksrv.db.dbo.table
    WHERE id NOT IN (SELECT id FROM dbo.table)

    This is what I used to do.

    "SELECT id FROM dbo.table" will return 200k records from linked server and the local table will contain more than 200k once the local table gets populated, so I am afraid if the efficiency will become an issue in the future.

  • halifaxdal (10/1/2013)

    This is what I used to do.

    The subquery "SELECT id FROM dbo.table" already returns 200k records, and the local table will grow well past 200k as it keeps getting populated, so I am afraid efficiency will become an issue in the future.

    Yes, at some point that will start to run slowly. Compared to the common alternative of figuring out which records to keep and moving only those across the wire, you will likely find that copying the entire table is faster, since the two datasets live on separate servers.

    An alternative would be to create an SSIS package to migrate just the appropriate data; that can be done more efficiently in some cases.

    But for the sake of figuring out which data to keep and "migrate" to the other server, it usually works best to have all of the data on the same server so the comparisons, joins, and evaluation can be done there.
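
    A rough sketch of that staging idea (the temp table name is my own placeholder): copy the remote rows over once, then do the comparison entirely locally:

    -- One pass across the wire into a local staging table:
    SELECT * INTO #staging FROM linksrv.db.dbo.table;

    -- Compare locally and insert only the missing rows:
    INSERT INTO dbo.table
    SELECT s.*
    FROM #staging AS s
    WHERE NOT EXISTS (SELECT 1 FROM dbo.table AS t WHERE t.id = s.id);

    DROP TABLE #staging;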

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events

  • halifaxdal (10/1/2013)

    Thanks for the advice. I am comfortable with how I repaired the DB, as it is not an important DB and is used by only a handful of users.

    Heh... I've gotta disagree... if it's worth having, it's worth having right. All databases are important or they can be detached and dropped. 😉

    --Jeff Moden
