Urgent: How to shrink this huge DB?

Author
Message
halifaxdal
SSCommitted (2K reputation)
Group: General Forum Members
Points: 1985 Visits: 1744
I inherited a badly designed DB: a 25GB mdf plus a 99GB ldf.
As far as I know, most of the data is redundant and can be removed. The question is how? Every time I tried to run a DELETE it timed out, and it may have made the ldf even bigger.

The server has run out of space because of this DB.

Thank you very much
halifaxdal
SSCommitted (2K reputation)
Group: General Forum Members
Points: 1985 Visits: 1744
I did the following to empty the ldf: 1. Set the recovery model to SIMPLE; 2. DBCC SHRINKDATABASE('database_name')

This shrank the ldf from 99GB down to 1MB.
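For the record, the commands behind those two steps look roughly like the sketch below (the database and logical log file names are placeholders). Note that DBCC SHRINKFILE against just the log file is generally preferable to DBCC SHRINKDATABASE, which also shrinks the data file and fragments its indexes:

```sql
-- Names below are examples; substitute your own database and logical file names
ALTER DATABASE MyDatabase SET RECOVERY SIMPLE;  -- breaks the log backup chain!

-- Shrink only the log file, leaving the data file (and its indexes) alone
DBCC SHRINKFILE (MyDatabase_log, 1024);         -- target size in MB
```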

I then tried to delete redundant records from one of the problematic tables, which has 25,877,065 records. Apparently my command is wrong:
delete table where id > 20877065
The ldf file immediately grows dramatically, back into the GB range, and the query is still running. I don't know whether it will fail again by running out of space. Should I keep waiting, or should I kill it?

And how should I remove those redundant records?

Thanks.
Bill Talada
Ten Centuries (1.2K reputation)
Group: General Forum Members
Points: 1190 Visits: 2001
Every delete/insert/update statement runs in an implicit transaction. Because of that, I have much better success doing the following:

set rowcount 10000;
-- do delete here;
set rowcount 0;



Deleting a full table in one shot requires too many resources (tempdb, logging, memory, etc.). Just delete it in large chunks if you have a wimpy server.
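Filled in with the table and predicate from the earlier posts, the pattern above might look like the sketch below. Note that Microsoft has deprecated the effect of SET ROWCOUNT on DELETE/INSERT/UPDATE statements, so on newer SQL Server versions a DELETE TOP (N) loop is the safer choice:

```sql
-- Batched delete using SET ROWCOUNT (deprecated for DML on newer SQL Server versions)
SET ROWCOUNT 10000;          -- each DELETE below affects at most 10,000 rows

WHILE 1 = 1
BEGIN
    DELETE FROM dbo.table WHERE id > 20877065;
    IF @@ROWCOUNT = 0 BREAK; -- nothing left to delete
END

SET ROWCOUNT 0;              -- reset so later statements are unaffected
```

In SIMPLE recovery, the log space used by each small batch can be reused once its transaction commits, which is what keeps the ldf from ballooning.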
Scott Coleman
Hall of Fame (3.8K reputation)
Group: General Forum Members
Points: 3842 Visits: 1465
Don't try to delete all the rows at once. Even in simple recovery mode, all the activity must be logged for the duration of the statement. Drop batches of rows until there are no more remaining. The optimum batch size depends on row size, server configuration, etc., and may be arrived at through testing.


WHILE 1 = 1
BEGIN
    DELETE TOP (10000) FROM dbo.table WHERE id > 20877065 ;
    IF @@ROWCOUNT = 0 BREAK ;
END



An alternate technique is to copy the data you want to keep into a temp table, then truncate the main table and reload it. The TRUNCATE operation is minimally logged, although both record-copying operations will be logged. If much more data is being deleted than being kept, this is a better approach. This assumes there are no foreign key complications, but you may want to drop foreign keys on this table and recreate them after the delete no matter which version you use.


SELECT * INTO #table FROM dbo.table WHERE id <= 20877065 ;
IF @@ROWCOUNT > {expected row count}
BEGIN
    TRUNCATE TABLE dbo.table ;
    INSERT INTO dbo.table WITH (TABLOCKX) SELECT * FROM #table ;
    IF @@ROWCOUNT > {expected row count}
        DROP TABLE #table ;
END



Other considerations:
The DELETE TOP (N) version works best if "WHERE id > 20877065" is referencing a clustered primary key. If for example you're keeping the first million rows of a fifty-million row table, and id is not the clustered key, it will probably have to table scan through those first million rows every iteration before finding ten thousand rows to delete. Repeat 4900 times. If id is the clustered primary key, it will be able to find the rows to delete very quickly.

If the table is a heap (no clustered index or key), then the TRUNCATE TABLE version is emphatically the best choice.



halifaxdal
SSCommitted (2K reputation)
Group: General Forum Members
Points: 1985 Visits: 1744
I have been able to resolve this issue:
1. Select enough records into a new table
2. Truncate the original table, as most of the records are redundant
3. Drop the original table completely
4. Rename the new table back to the old name

The remaining question is:

As I still need to replicate data from a linked server, I want to know the best way to select ONLY the records from the linked server that are NOT already in the existing table.

The table has a unique ID.
halifaxdal
SSCommitted (2K reputation)
Group: General Forum Members
Points: 1985 Visits: 1744
Thanks for the replies above.
Jeff Moden
SSC Guru (87K reputation)
Group: General Forum Members
Points: 87578 Visits: 41117
halifaxdal (10/1/2013)
I did the following to empty the ldf: 1. Set the recovery model to SIMPLE; 2. DBCC SHRINKDATABASE('database_name')

This shrank the ldf from 99GB down to 1MB.

I then tried to delete redundant records from one of the problematic tables, which has 25,877,065 records. Apparently my command is wrong:
delete table where id > 20877065
The ldf file immediately grows dramatically, back into the GB range, and the query is still running. I don't know whether it will fail again by running out of space. Should I keep waiting, or should I kill it?

And how should I remove those redundant records?

Thanks.


Heh... you move too fast. Shrinking the database just fragmented every index you have to 99%. If you have point-in-time backups running, you also just broke the log chain by putting the DB into the SIMPLE recovery model. If you don't have point-in-time backups running, you should seriously consider them even if it's "just" a Dev database.

If you're doing deletes and you didn't take a backup just prior to doing the deletes, you may hate yourself in the morning when someone says you deleted too much or you deleted the wrong data.
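A minimal safety-net backup before the deletes could look like this (the database name and path are hypothetical):

```sql
-- Full backup before destructive changes; names and paths are examples
BACKUP DATABASE MyDatabase
TO DISK = N'D:\Backups\MyDatabase_PreDelete.bak'
WITH INIT, CHECKSUM, STATS = 10;
```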

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think it's expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Scott Coleman
Hall of Fame (3.8K reputation)
Group: General Forum Members
Points: 3842 Visits: 1465
Here are two versions, column lists removed for clarity. Don't rely on "SELECT *" to work across servers.


-- V1: Copy the entire remote table, keep only what you need
INSERT INTO dbo.table
SELECT * FROM linksrv.db.dbo.table
WHERE id NOT IN (SELECT id FROM dbo.table) ;

-- V2: Have the remote server read the existing IDs and send only the records you want
-- Requires linked servers both ways, with working two-hop security
INSERT INTO dbo.table
SELECT * FROM OPENQUERY(sourceserver,
    'SELECT * FROM db.dbo.table
     WHERE id NOT IN (SELECT id FROM targetserver.db.dbo.table WHERE id IS NOT NULL)') ;





halifaxdal
SSCommitted (2K reputation)
Group: General Forum Members
Points: 1985 Visits: 1744
Jeff Moden (10/1/2013)
halifaxdal (10/1/2013)
I did the following to empty the ldf: 1. Set the recovery model to SIMPLE; 2. DBCC SHRINKDATABASE('database_name')

This shrank the ldf from 99GB down to 1MB.

I then tried to delete redundant records from one of the problematic tables, which has 25,877,065 records. Apparently my command is wrong:
delete table where id > 20877065
The ldf file immediately grows dramatically, back into the GB range, and the query is still running. I don't know whether it will fail again by running out of space. Should I keep waiting, or should I kill it?

And how should I remove those redundant records?

Thanks.


Heh... you move too fast. Shrinking the database just fragmented every index you have to 99%. If you have point-in-time backups running, you also just broke the log chain by putting the DB into the SIMPLE recovery model. If you don't have point-in-time backups running, you should seriously consider them even if it's "just" a Dev database.

If you're doing deletes and you didn't take a backup just prior to doing the deletes, you may hate yourself in the morning when someone says you deleted too much or you deleted the wrong data.


Thanks for the advice. I'm comfortable with how I repaired the DB, as it is not an important DB and is used by only a handful of users.
halifaxdal
SSCommitted (2K reputation)
Group: General Forum Members
Points: 1985 Visits: 1744
Scott Coleman (10/1/2013)
Here are two versions, column lists removed for clarity. Don't rely on "SELECT *" to work across servers.


-- V1: Copy the entire remote table, keep only what you need
INSERT INTO dbo.table
SELECT * FROM linksrv.db.dbo.table
WHERE id NOT IN (SELECT id FROM dbo.table)




This is what I used to do.

"SELECT id FROM dbo.table" already returns 200k records, and the local table will grow well past 200k as it gets populated, so I'm afraid efficiency will become an issue in the future.
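One common way to keep this scalable as the local table grows (a sketch, reusing the hypothetical names from Scott's V1) is to replace NOT IN with NOT EXISTS, which correlates on the id column, can use its index, and is not tripped up by NULLs:

```sql
-- Insert only rows whose id is not already present locally
-- (assumes dbo.table.id is indexed, ideally the primary key)
INSERT INTO dbo.table
SELECT r.*
FROM linksrv.db.dbo.table AS r
WHERE NOT EXISTS (SELECT 1 FROM dbo.table AS l WHERE l.id = r.id) ;
```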