Shrinking a large DB

Question

Shrinking a large DB

MiguelSQL

Hall of Fame

Points: 3112
More actions
September 15, 2016 at 8:05 pm

#313340

Hello friends,
I need to reduce a DB from nearly 1 TB down to 200 GB.
Why? and before I get laugh out of this site... ( it's not my idea!)
Customer has this big database, and they want to provide it to developers so they can do their work.
So they ran some scripts to truncate a bunch of tables (hence the space saving) and then run anonymization scripts.
They end up with a large DB mostly empty, and instead of wasting space multiple times, it'll be nice to shrink it.
All of this done on a copy, obviously.
So I wrote a script that takes the size, shrink by 10 GB, takes the size again, shrink by 10 GB, etc, until it reaches the target. http://www.cookingsql.com/2016/01/shrink-a-data-file/
I'm testing this on a 500 GB that has only 50 GB worth of data and here is what I see...
<it does it like 10 times... then>
Start dbcc shrinkfile ( CIR2_Data, 265453 ) WITH NO_INFOMSGS at 2016-09-15 15:53:30.617
Done at 2016-09-15 15:53:34.933
Start dbcc shrinkfile ( CIR2_Data, 255453 ) WITH NO_INFOMSGS at 2016-09-15 15:53:34.953
Done at 2016-09-15 15:53:38.177
Start dbcc shrinkfile ( CIR2_Data, 245453 ) WITH NO_INFOMSGS at 2016-09-15 15:53:38.200
Done at 2016-09-15 16:27:20.847
Start dbcc shrinkfile ( CIR2_Data, 235453 ) WITH NO_INFOMSGS at 2016-09-15 16:27:20.857
Done at 2016-09-15 16:50:14.507
Start dbcc shrinkfile ( CIR2_Data, 225453 ) WITH NO_INFOMSGS at 2016-09-15 16:50:14.527
Done at 2016-09-15 17:23:31.083
Start dbcc shrinkfile ( CIR2_Data, 215453 ) WITH NO_INFOMSGS at 2016-09-15 17:23:31.107
Done at 2016-09-15 17:44:30.120
Start dbcc shrinkfile ( CIR2_Data, 205453 ) WITH NO_INFOMSGS at 2016-09-15 17:44:30.140
Done at 2016-09-15 18:02:04.763
<not a complete picture>
Look at the time stamps.
At first it goes real nice, but then it start to take forever. It ran for 5 hrs before I stopped it and it dropped to 185 GB (72% full).
I know it's heavy IO, the SPID was mostly waiting on Sos_scheduler_yield.
Any better ideas on how to do this?
We would need to do this several times a year (bad idea... I know)
Any way short of migrating tables to a different filegroup and then dropping it? (that would end up with a different database than Prod for whinny developers)
PS: I call this Project George Constanza.
SQL 2008r2 Standard Edition. DB recovery mode was Simple

Viewing 12 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 1

I don't think you need to shrink it to share it with a bunch of developers efficiently. Go get a copy of Redgate's SQL Clone.

http://www.red-gate.com/products/dba/sql-clone/

https://www.simple-talk.com/sql/sql-tools/clone-sweet-clone-database-provisioning-made-easy/

Tell Richard I sent you! 😎

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Jeff Moden SSC Guru Points: 1004683 More actions · Answer 2

TheSQLGuru (9/15/2016)
I don't think you need to shrink it to share it with a bunch of developers efficiently. Go get a copy of Redgate's SQL Clone.
http://www.red-gate.com/products/dba/sql-clone/
https://www.simple-talk.com/sql/sql-tools/clone-sweet-clone-database-provisioning-made-easy/
Tell Richard I sent you! 😎

Did you find any caveats, unexpected behavior, or limitations when you used it?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004683 More actions · Answer 3

MiguelSQL (9/15/2016)
We would need to do this several times a year (bad idea... I know)

I'm not sure why anyone would think that's a bad idea unless there were some PII or other sensitive information in the database, encrypted or not.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Grant Fritchey SSC Guru Points: 399340 More actions · Answer 4

SQL Clone will certainly help, but you'd still want to get the allocated space of the database down prior to using SQL Clone.

If shrink isn't going to work, and it is the best option, then you need to look at migrating the data and structures from the existing database into a second database. That second database can be set to the right size. The only trick would be to batch the operations such that you don't bloat the log.

I'm with Jeff, this is a good idea. Setting up an automated mechanism for getting a working database to development teams is fundamental. Just make sure the data is clean. If you're working with any kind of sensitive data, there are actually prison terms associated with leaking that data with knowledge (and you now have that knowledge, so...).

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 5

Jeff Moden (9/15/2016)
TheSQLGuru (9/15/2016)
I don't think you need to shrink it to share it with a bunch of developers efficiently. Go get a copy of Redgate's SQL Clone.
http://www.red-gate.com/products/dba/sql-clone/
https://www.simple-talk.com/sql/sql-tools/clone-sweet-clone-database-provisioning-made-easy/
Tell Richard I sent you! 😎
Did you find any caveats, unexpected behavior, or limitations when you used it?

The product still is not RTM yet, but I have had interactions with the PM and it is pretty straight-forward. It was literally made for the OP's use case. I have sadly not yet had the time to get it implemented in a client for testing purposes. :crying:

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 6

I will disagree with others about reducing the data being a good thing. Obfuscation sure, but ONLY where REQUIRED.

The OP stated this database was to be promulgated to developers so they could do their work. They should ABSOLUTELY do development/QA/Testing/PERFORMANCE TESTING using full-size client data. If more companies did that there would (probably) be something less than an infinite amount of work out there for me. 😎 :w00t: :hehe:

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

MiguelSQL Hall of Fame Points: 3112 More actions · Answer 7

Thank you for all your help.

I said it was a bad idea because developers should generate their own test data, rater than use a copy of Prod that needs to be anonymized.

Testing should be done with anonymized data.

But there are far more copies of DEV than TEST... so it's a matter of space and time savings for the developers.

I'll take a look at SQL Clone and see if it's something we can do. I'm the new guy so I still need to find out what the new company can do.

Did someone say that prior using SQL Clone we still need to reduce the size?

Any tips on how to reduce the size more efficiently?

Would rebuilding the Index (compacting them) help?

What do you think of the script I wrote? Make sense to reduce size in chunks?

Thank you

Grant Fritchey SSC Guru Points: 399340 More actions · Answer 8

MiguelSQL (9/16/2016)
Thank you for all your help.
I said it was a bad idea because developers should generate their own test data, rater than use a copy of Prod that needs to be anonymized.
Testing should be done with anonymized data.

I'm going to, with respect, disagree. Generated data won't have the same distributions, statistics, quirks, that production data has. A clean copy of the production database is one of the best tools you can provide for your development team.

But there are far more copies of DEV than TEST... so it's a matter of space and time savings for the developers.
I'll take a look at SQL Clone and see if it's something we can do. I'm the new guy so I still need to find out what the new company can do.
Did someone say that prior using SQL Clone we still need to reduce the size?

I did. I work for Redgate and know a little about how SQL Clone works. Actually you don't have to reduce the database size. That was bad typing on my part. You need to do the cleanup prior to setting up the clone. If part of the clean-up is deleting data, then you might want to go ahead and shrink it before creating the clone, but that's not necessary. A big part of how SQL Clone works and why we're doing it is to avoid disk space issues.

Any tips on how to reduce the size more efficiently?
Would rebuilding the Index (compacting them) help?
What do you think of the script I wrote? Make sense to reduce size in chunks?
Thank you

I've never had to do such radical shrinks before, so no, I don't have any good advice there. Sorry.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 9

Grant already addressed the data distribution/development stuff.

I would think there is the potential for iterative shrinks to actually take a lot longer. Shrink works by moving pages and I don't know internally how "smart" or capable it is about that. It could wind up doing work multiple times for some pages. I thus also don't know if rebuilding indexes first would help. If they wound up "magically" getting put on disk from start-of-file-on, then it would be a great thing because you could simply do that and truncate free space in file and poof - as small as possible. I am 99.998% certain it doesn't work this way.

Speaking of moving pages, that means potentially ~100% fragmentation of every object in the database when you are complete with shrink. You should explicitly rebuild every index after the shrink, before you back it up and turn it over to devs.

SQL Clone should reduce the total space needed for X devs to have a "copy" of this one database on their system. The reduction would be approximately DBSize * (DevCount-1).

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

MiguelSQL Hall of Fame Points: 3112 More actions · Answer 10

well... I'm not a developer, so if you all say that this makes sense, who am I to argue w/ you

I'll request getting SQLClone and see how we do.

Thanks

Jeff Moden SSC Guru Points: 1004683 More actions · Answer 11

TheSQLGuru (9/16/2016)
I will disagree with others about reducing the data being a good thing. Obfuscation sure, but ONLY where REQUIRED.
The OP stated this database was to be promulgated to developers so they could do their work. They should ABSOLUTELY do development/QA/Testing/PERFORMANCE TESTING using full-size client data. If more companies did that there would (probably) be something less than an infinite amount of work out there for me. 😎 :w00t: :hehe:

Plus 1000. Couldn't agree more.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)