Urgent: How to shrink this huge DB?

Author
Message
halifaxdal
SSCommitted (2K reputation)
Group: General Forum Members
Points: 1985 Visits: 1744
I inherited a badly designed DB: a 25GB mdf plus a 99GB ldf.
As far as I know, most of the data is redundant and can be removed. The question is how? Every time I tried to run a DELETE it timed out, and it may have made the ldf even bigger.

The server has run out of space because of this DB.

Thank you very much
halifaxdal
SSCommitted (2K reputation)
Group: General Forum Members
Points: 1985 Visits: 1744
I did the following to empty the ldf: 1. Set the recovery model to SIMPLE; 2. DBCC SHRINKDATABASE('database_name')

This shrank the ldf from 99GB down to 1MB.
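For the record, the commands behind those two steps look roughly like the sketch below (the database and logical log file names are placeholders). Note that DBCC SHRINKFILE against just the log file is generally preferable to DBCC SHRINKDATABASE, which also shrinks the data file and fragments its indexes:

```sql
-- Names below are examples; substitute your own database and logical file names
ALTER DATABASE MyDatabase SET RECOVERY SIMPLE;  -- breaks the log backup chain!

-- Shrink only the log file, leaving the data file (and its indexes) alone
DBCC SHRINKFILE (MyDatabase_log, 1024);         -- target size in MB
```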

I then tried to delete redundant records from one of the problematic tables, which has 25,877,065 records. Apparently my command is wrong:
delete table where id > 20877065
The ldf file immediately grows dramatically, back into the GB range, and the query is still running. I don't know whether it will fail again by running out of space. Should I keep waiting, or should I kill it?

And how should I remove those redundant records?

Thanks.
Bill Talada
Ten Centuries (1.2K reputation)
Group: General Forum Members
Points: 1190 Visits: 2001
Every delete/insert/update statement runs in an implicit transaction. Because of that, I have much better success doing the following:

set rowcount 10000;
-- do delete here;
set rowcount 0;



Deleting a full table in one shot requires too many resources (tempdb, logging, memory, etc.). Just delete it in large chunks if you have a wimpy server.
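Filled in with the table and predicate from the earlier posts, the pattern above might look like the sketch below. Note that Microsoft has deprecated the effect of SET ROWCOUNT on DELETE/INSERT/UPDATE statements, so on newer SQL Server versions a DELETE TOP (N) loop is the safer choice:

```sql
-- Batched delete using SET ROWCOUNT (deprecated for DML on newer SQL Server versions)
SET ROWCOUNT 10000;          -- each DELETE below affects at most 10,000 rows

WHILE 1 = 1
BEGIN
    DELETE FROM dbo.table WHERE id > 20877065;
    IF @@ROWCOUNT = 0 BREAK; -- nothing left to delete
END

SET ROWCOUNT 0;              -- reset so later statements are unaffected
```

In SIMPLE recovery, the log space used by each small batch can be reused once its transaction commits, which is what keeps the ldf from ballooning.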
Scott Coleman
Hall of Fame (3.8K reputation)
Group: General Forum Members
Points: 3842 Visits: 1465
Don't try to delete all the rows at once. Even in simple recovery mode, all the activity must be logged for the duration of the statement. Drop batches of rows until there are no more remaining. The optimum batch size depends on row size, server configuration, etc., and may be arrived at through testing.


WHILE 1 = 1
BEGIN
    DELETE TOP (10000) FROM dbo.table WHERE id > 20877065 ;
    IF @@ROWCOUNT = 0 BREAK ;
END



An alternate technique is to copy the data you want to keep into a temp table, then truncate the main table and reload it. The TRUNCATE operation is minimally logged, although both record-copying operations will be logged. If much more data is being deleted than being kept, this is a better approach. This assumes there are no foreign key complications, but you may want to drop foreign keys on this table and recreate them after the delete no matter which version you use.


SELECT * INTO #table FROM dbo.table WHERE id <= 20877065 ;
IF @@ROWCOUNT > {expected row count}
BEGIN
    TRUNCATE TABLE dbo.table ;
    INSERT INTO dbo.table WITH (TABLOCKX) SELECT * FROM #table ;
    IF @@ROWCOUNT > {expected row count}
        DROP TABLE #table ;
END



Other considerations:
The DELETE TOP (N) version works best if "WHERE id > 20877065" is referencing a clustered primary key. If for example you're keeping the first million rows of a fifty-million row table, and id is not the clustered key, it will probably have to table scan through those first million rows every iteration before finding ten thousand rows to delete. Repeat 4900 times. If id is the clustered primary key, it will be able to find the rows to delete very quickly.

If the table is a heap (no clustered index or key), then the TRUNCATE TABLE version is emphatically the best choice.



halifaxdal
SSCommitted (2K reputation)
Group: General Forum Members
Points: 1985 Visits: 1744
I have been able to resolve this issue:
1. Select enough records into a new table
2. Truncate the original table, as most of the records are redundant
3. Drop the original table completely
4. Rename the new table back to the old name

The remaining question is:

As I still need to replicate data from a linked server, I want to know the best way to select ONLY the records from the linked server that are NOT already in the existing table.

The table has a unique ID.
halifaxdal
SSCommitted (2K reputation)
Group: General Forum Members
Points: 1985 Visits: 1744
Thanks for the replies above.
Jeff Moden
SSC Guru (87K reputation)
Group: General Forum Members
Points: 87578 Visits: 41117
halifaxdal (10/1/2013)
I did the following to empty the ldf: 1. Set the recovery model to SIMPLE; 2. DBCC SHRINKDATABASE('database_name')

This shrank the ldf from 99GB down to 1MB.

I then tried to delete redundant records from one of the problematic tables, which has 25,877,065 records. Apparently my command is wrong:
delete table where id > 20877065
The ldf file immediately grows dramatically, back into the GB range, and the query is still running. I don't know whether it will fail again by running out of space. Should I keep waiting, or should I kill it?

And how should I remove those redundant records?

Thanks.


Heh... you move too fast. Shrinking the database just fragmented every index you have to 99%. If you have point-in-time backups running, you also just broke the log chain by putting the DB into the SIMPLE recovery model. If you don't have point-in-time backups running, you should seriously consider them even if it's "just" a Dev database.

If you're doing deletes and you didn't take a backup just prior to doing the deletes, you may hate yourself in the morning when someone says you deleted too much or you deleted the wrong data.
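A minimal safety-net backup before the deletes could look like this (the database name and path are hypothetical):

```sql
-- Full backup before destructive changes; names and paths are examples
BACKUP DATABASE MyDatabase
TO DISK = N'D:\Backups\MyDatabase_PreDelete.bak'
WITH INIT, CHECKSUM, STATS = 10;
```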

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think it's expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Scott Coleman
Hall of Fame (3.8K reputation)
Group: General Forum Members
Points: 3842 Visits: 1465
Here are two versions, column lists removed for clarity. Don't rely on "SELECT *" to work across servers.


-- V1: Copy the entire remote table, keep only what you need
INSERT INTO dbo.table
SELECT * FROM linksrv.db.dbo.table
WHERE id NOT IN (SELECT id FROM dbo.table) ;

-- V2: Have the remote server read the existing IDs and send only the records you want
-- Requires linked servers both ways, with working two-hop security
INSERT INTO dbo.table
SELECT * FROM OPENQUERY(sourceserver,
    'SELECT * FROM db.dbo.table
     WHERE id NOT IN (SELECT id FROM targetserver.db.dbo.table WHERE id IS NOT NULL)') ;





halifaxdal
SSCommitted (2K reputation)
Group: General Forum Members
Points: 1985 Visits: 1744
Jeff Moden (10/1/2013)
halifaxdal (10/1/2013)
I did the following to empty the ldf: 1. Set the recovery model to SIMPLE; 2. DBCC SHRINKDATABASE('database_name')

This shrank the ldf from 99GB down to 1MB.

I then tried to delete redundant records from one of the problematic tables, which has 25,877,065 records. Apparently my command is wrong:
delete table where id > 20877065
The ldf file immediately grows dramatically, back into the GB range, and the query is still running. I don't know whether it will fail again by running out of space. Should I keep waiting, or should I kill it?

And how should I remove those redundant records?

Thanks.


Heh... you move too fast. Shrinking the database just fragmented every index you have to 99%. If you have point-in-time backups running, you also just broke the log chain by putting the DB into the SIMPLE recovery model. If you don't have point-in-time backups running, you should seriously consider them even if it's "just" a Dev database.

If you're doing deletes and you didn't take a backup just prior to doing the deletes, you may hate yourself in the morning when someone says you deleted too much or you deleted the wrong data.


Thanks for the advice. I'm comfortable with how I repaired the DB, as it is not an important DB and is used by only a handful of users.
halifaxdal
SSCommitted (2K reputation)
Group: General Forum Members
Points: 1985 Visits: 1744
Scott Coleman (10/1/2013)
Here are two versions, column lists removed for clarity. Don't rely on "SELECT *" to work across servers.


-- V1: Copy the entire remote table, keep only what you need
INSERT INTO dbo.table
SELECT * FROM linksrv.db.dbo.table
WHERE id NOT IN (SELECT id FROM dbo.table)




This is what I used to do.

"SELECT id FROM dbo.table" already returns 200k records, and the local table will grow well past 200k as it gets populated, so I'm afraid efficiency will become an issue in the future.
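One common way to keep this scalable as the local table grows (a sketch, reusing the hypothetical names from Scott's V1) is to replace NOT IN with NOT EXISTS, which correlates on the id column, can use its index, and is not tripped up by NULLs:

```sql
-- Insert only rows whose id is not already present locally
-- (assumes dbo.table.id is indexed, ideally the primary key)
INSERT INTO dbo.table
SELECT r.*
FROM linksrv.db.dbo.table AS r
WHERE NOT EXISTS (SELECT 1 FROM dbo.table AS l WHERE l.id = r.id) ;
```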