
Urgent: How to shrink this huge DB?
Posted Tuesday, October 01, 2013 12:19 PM


Right there with Babe


Group: General Forum Members
Last Login: 2 days ago @ 8:22 AM
Points: 759, Visits: 1,353
I inherited a badly designed DB: 25GB mdf + 99GB ldf.
As far as I know, most of the data is redundant and can be removed; the question is how. Every time I tried to run a delete, it timed out, and it may have made the ldf even bigger.

The server has run out of space due to this DB.

Thank you very much
Post #1500542
Posted Tuesday, October 01, 2013 12:35 PM


Right there with Babe


Group: General Forum Members
Last Login: 2 days ago @ 8:22 AM
Points: 759, Visits: 1,353
I did the following to empty the ldf: 1. Set the recovery model to simple; 2. dbcc shrinkdatabase('database_name')

This shrank the ldf from 99GB down to 1MB.
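
Roughly, in commands (the database name is a placeholder):

-- A sketch of the two steps above, assuming the database is named YourDB
ALTER DATABASE YourDB SET RECOVERY SIMPLE;
DBCC SHRINKDATABASE ('YourDB');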

I then tried to delete redundant records in one of the problematic tables, which has 25,877,065 records; apparently my command is wrong:
delete table where id > 20877065
The ldf file immediately grew to the GB level and the query is still running. I don't know if it will fail again by running out of space. Should I keep waiting, or should I terminate it?

And how do I remove those redundant records?

Thanks.
Post #1500550
Posted Tuesday, October 01, 2013 12:55 PM


SSC-Enthusiastic


Group: General Forum Members
Last Login: Today @ 5:47 AM
Points: 110, Visits: 718
There is an implicit transaction for every delete/insert/update command. Therefore I have much better success doing the following:
set rowcount 10000;  -- limit each data-modification statement to 10,000 rows
delete from dbo.table where id > 20877065;  -- the delete from the earlier post; repeat until @@ROWCOUNT = 0
set rowcount 0;  -- restore the default

Deleting a full table in one shot requires too many resources (tempdb, logging, memory, etc.). Just delete it in large chunks if you have a wimpy server.
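
For example, wrapped in a loop (a sketch; the table name and predicate are taken from the earlier post):

SET ROWCOUNT 10000;                            -- cap each DELETE at 10,000 rows
WHILE 1 = 1
BEGIN
    DELETE FROM dbo.table WHERE id > 20877065; -- removes at most 10,000 rows per pass
    IF @@ROWCOUNT = 0 BREAK;                   -- stop once nothing is left to delete
END;
SET ROWCOUNT 0;                                -- restore the default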
Post #1500556
Posted Tuesday, October 01, 2013 1:26 PM
SSCrazy


Group: General Forum Members
Last Login: Yesterday @ 12:24 PM
Points: 2,835, Visits: 1,124
Don't try to delete all the rows at once. Even in simple recovery mode, all the activity must be logged for the duration of the statement. Delete batches of rows until none remain. The optimum batch size depends on row size, server configuration, etc., and may be arrived at through testing.

WHILE 1=1
BEGIN
    DELETE TOP (10000) FROM dbo.table WHERE id > 20877065 ;
    IF @@ROWCOUNT = 0 BREAK ;
END

An alternate technique is to copy the data you want to keep into a temp table, then truncate the main table and reload it. The TRUNCATE operation is minimally logged, although both record-copying operations will be logged. If much more data is being deleted than being kept, this is a better approach. This assumes there are no foreign key complications, but you may want to drop foreign keys on this table and recreate them after the delete no matter which version you use.

SELECT * INTO #table FROM dbo.table WHERE id <= 20877065 ;  -- copy the rows to keep
IF @@ROWCOUNT > {expected row count}                        -- sanity check before destroying data
BEGIN
    TRUNCATE TABLE dbo.table ;
    INSERT INTO dbo.table WITH(TABLOCKX) SELECT * FROM #table ;
    IF @@ROWCOUNT > {expected row count}                    -- confirm the reload before cleanup
        DROP TABLE #table ;
END

Other considerations:
The DELETE TOP (N) version works best if "WHERE id > 20877065" references a clustered primary key. If, for example, you're keeping the first million rows of a fifty-million-row table and id is not the clustered key, each iteration will probably have to scan through those first million rows before finding ten thousand rows to delete. Repeat 4900 times. If id is the clustered primary key, the rows to delete can be found very quickly.
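
If id has no such index, one option (a sketch; the index name is illustrative, and building the index on a large table is itself expensive) is to add one before the batched delete:

-- Lets each DELETE TOP (10000) seek directly to the rows it will remove
CREATE CLUSTERED INDEX IX_table_id ON dbo.table (id);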

If the table is a heap (no clustered index or key), then the TRUNCATE TABLE version is emphatically the best choice.



Post #1500567
Posted Tuesday, October 01, 2013 1:29 PM


Right there with Babe


Group: General Forum Members
Last Login: 2 days ago @ 8:22 AM
Points: 759, Visits: 1,353
I have been able to resolve this issue (a code sketch follows below):
1. Select the records to keep into a new table
2. Truncate the original table, as most of the records are redundant
3. Drop the original table completely
4. Rename the new table back to the old name
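
In code, the steps look roughly like this (the _keep table name and the cutoff value are illustrative):

SELECT * INTO dbo.table_keep FROM dbo.table WHERE id <= 20877065; -- 1. keep the needed records
TRUNCATE TABLE dbo.table;                                         -- 2. empty the original
DROP TABLE dbo.table;                                             -- 3. remove the original
EXEC sp_rename 'dbo.table_keep', 'table';                         -- 4. rename back to the old name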

The remaining question is:

As I still need to replicate data from a linked server, I want to know the best way to select ONLY the records from the linked server that are NOT already in the existing table.

The table has a unique ID.
Post #1500570
Posted Tuesday, October 01, 2013 1:31 PM


Right there with Babe


Group: General Forum Members
Last Login: 2 days ago @ 8:22 AM
Points: 759, Visits: 1,353
Thanks for the replies above.
Post #1500572
Posted Tuesday, October 01, 2013 2:13 PM


SSC-Dedicated


Group: General Forum Members
Last Login: Today @ 9:48 AM
Points: 35,950, Visits: 30,232
halifaxdal (10/1/2013)
I did the following to empty the ldf: 1. Set the recovery model to simple; 2. dbcc shrinkdatabase('database_name')

This shrank the ldf from 99GB down to 1MB.

I then tried to delete redundant records in one of the problematic tables, which has 25,877,065 records; apparently my command is wrong:
delete table where id > 20877065
The ldf file immediately grew to the GB level and the query is still running. I don't know if it will fail again by running out of space. Should I keep waiting, or should I terminate it?

And how do I remove those redundant records?

Thanks.


Heh... you move too fast. Shrinking the database just fragmented every index you have to 99%. If you have point-in-time backups running, you also just broke the log chain by putting the DB into the SIMPLE recovery model. If you don't have point-in-time backups running, you should seriously consider them, even if it's "just" a Dev database.

If you're doing deletes and you didn't take a backup just prior to doing them, you may hate yourself in the morning when someone says you deleted too much or deleted the wrong data.
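
For reference, a rough sketch of the usual cleanup after a shrink (database, file path, and table names are placeholders):

ALTER INDEX ALL ON dbo.table REBUILD;                      -- repair the fragmentation the shrink caused
ALTER DATABASE YourDB SET RECOVERY FULL;                   -- if point-in-time recovery is wanted
BACKUP DATABASE YourDB TO DISK = 'D:\Backups\YourDB.bak';  -- a full backup restarts the log chain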


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

"Change is inevitable. Change for the better is not." -- 04 August 2013
(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1500592
Posted Tuesday, October 01, 2013 2:18 PM
SSCrazy


Group: General Forum Members
Last Login: Yesterday @ 12:24 PM
Points: 2,835, Visits: 1,124
Here are two versions, column lists removed for clarity. Don't rely on "SELECT *" to work across servers.

-- V1: Copy the entire remote table, keep only what you need
INSERT INTO dbo.table
SELECT * FROM linksrv.db.dbo.table
WHERE id NOT IN (SELECT id FROM dbo.table)

-- V2: Get the remote table to read the existing IDs, and only send the records you want
-- Requires linked servers both ways, with working two-hop security
INSERT INTO dbo.table
SELECT * FROM OPENQUERY(sourceserver,
'SELECT * FROM db.dbo.table
WHERE id NOT IN (SELECT id FROM targetserver.db.dbo.table WHERE id IS NOT NULL)')
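
One caution on V1: NOT IN returns no rows at all if any id in the subquery is NULL (V2 guards against this with IS NOT NULL). A NULL-safe variant of V1, using the same assumed names:

INSERT INTO dbo.table
SELECT r.*
FROM linksrv.db.dbo.table AS r
WHERE NOT EXISTS (SELECT 1 FROM dbo.table AS t WHERE t.id = r.id);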




Post #1500594
Posted Tuesday, October 01, 2013 2:32 PM


Right there with Babe


Group: General Forum Members
Last Login: 2 days ago @ 8:22 AM
Points: 759, Visits: 1,353
Jeff Moden (10/1/2013)
halifaxdal (10/1/2013)
I did the following to empty the ldf: 1. Set the recovery model to simple; 2. dbcc shrinkdatabase('database_name')

This shrank the ldf from 99GB down to 1MB.

I then tried to delete redundant records in one of the problematic tables, which has 25,877,065 records; apparently my command is wrong:
delete table where id > 20877065
The ldf file immediately grew to the GB level and the query is still running. I don't know if it will fail again by running out of space. Should I keep waiting, or should I terminate it?

And how do I remove those redundant records?

Thanks.


Heh... you move too fast. Shrinking the database just fragmented every index you have to 99%. If you have point-in-time backups running, you also just broke the log chain by putting the DB into the SIMPLE recovery model. If you don't have point-in-time backups running, you should seriously consider them, even if it's "just" a Dev database.

If you're doing deletes and you didn't take a backup just prior to doing them, you may hate yourself in the morning when someone says you deleted too much or deleted the wrong data.


Thanks for the advice. I am comfortable with how I repaired the DB, as it is not an important DB and is used by only a handful of users.
Post #1500597
Posted Tuesday, October 01, 2013 2:40 PM


Right there with Babe


Group: General Forum Members
Last Login: 2 days ago @ 8:22 AM
Points: 759, Visits: 1,353
Scott Coleman (10/1/2013)
Here are two versions, column lists removed for clarity. Don't rely on "SELECT *" to work across servers.

-- V1: Copy the entire remote table, keep only what you need
INSERT INTO dbo.table
SELECT * FROM linksrv.db.dbo.table
WHERE id NOT IN (SELECT id FROM dbo.table)




This is what I used to do.

"SELECT id FROM dbo.table" will return 200k records from linked server and the local table will contain more than 200k once the local table gets populated, so I am afraid if the efficiency will become an issue in the future.
Post #1500600