Shrinking a large DB

  • Hello friends,

    I need to reduce a DB from nearly 1 TB down to 200 GB.

    Why? and before I get laugh out of this site... ( it's not my idea!)

    Customer has this big database, and they want to provide it to developers so they can do their work.

    So they ran some scripts to truncate a bunch of tables (hence the space saving) and then run anonymization scripts.

    They end up with a large DB mostly empty, and instead of wasting space multiple times, it'll be nice to shrink it.

    All of this done on a copy, obviously.

    So I wrote a script that takes the size, shrink by 10 GB, takes the size again, shrink by 10 GB, etc, until it reaches the target. http://www.cookingsql.com/2016/01/shrink-a-data-file/

    I'm testing this on a 500 GB that has only 50 GB worth of data and here is what I see...

    <it does it like 10 times... then>

    Start dbcc shrinkfile ( CIR2_Data, 265453 ) WITH NO_INFOMSGS at 2016-09-15 15:53:30.617

    Done at 2016-09-15 15:53:34.933

    Start dbcc shrinkfile ( CIR2_Data, 255453 ) WITH NO_INFOMSGS at 2016-09-15 15:53:34.953

    Done at 2016-09-15 15:53:38.177

    Start dbcc shrinkfile ( CIR2_Data, 245453 ) WITH NO_INFOMSGS at 2016-09-15 15:53:38.200

    Done at 2016-09-15 16:27:20.847

    Start dbcc shrinkfile ( CIR2_Data, 235453 ) WITH NO_INFOMSGS at 2016-09-15 16:27:20.857

    Done at 2016-09-15 16:50:14.507

    Start dbcc shrinkfile ( CIR2_Data, 225453 ) WITH NO_INFOMSGS at 2016-09-15 16:50:14.527

    Done at 2016-09-15 17:23:31.083

    Start dbcc shrinkfile ( CIR2_Data, 215453 ) WITH NO_INFOMSGS at 2016-09-15 17:23:31.107

    Done at 2016-09-15 17:44:30.120

    Start dbcc shrinkfile ( CIR2_Data, 205453 ) WITH NO_INFOMSGS at 2016-09-15 17:44:30.140

    Done at 2016-09-15 18:02:04.763

    <not a complete picture>

    Look at the time stamps.

    At first it goes real nice, but then it start to take forever. It ran for 5 hrs before I stopped it and it dropped to 185 GB (72% full).

    I know it's heavy IO, the SPID was mostly waiting on Sos_scheduler_yield.

    Any better ideas on how to do this?

    We would need to do this several times a year (bad idea... I know)

    Any way short of migrating tables to a different filegroup and then dropping it? (that would end up with a different database than Prod for whinny developers)

    PS: I call this Project George Constanza.

    SQL 2008r2 Standard Edition. DB recovery mode was Simple

  • I don't think you need to shrink it to share it with a bunch of developers efficiently. Go get a copy of Redgate's SQL Clone.

    http://www.red-gate.com/products/dba/sql-clone/

    https://www.simple-talk.com/sql/sql-tools/clone-sweet-clone-database-provisioning-made-easy/

    Tell Richard I sent you! 😎

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (9/15/2016)


    I don't think you need to shrink it to share it with a bunch of developers efficiently. Go get a copy of Redgate's SQL Clone.

    http://www.red-gate.com/products/dba/sql-clone/

    https://www.simple-talk.com/sql/sql-tools/clone-sweet-clone-database-provisioning-made-easy/

    Tell Richard I sent you! 😎

    Did you find any caveats, unexpected behavior, or limitations when you used it?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • MiguelSQL (9/15/2016)


    We would need to do this several times a year (bad idea... I know)

    I'm not sure why anyone would think that's a bad idea unless there were some PII or other sensitive information in the database, encrypted or not.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • SQL Clone will certainly help, but you'd still want to get the allocated space of the database down prior to using SQL Clone.

    If shrink isn't going to work, and it is the best option, then you need to look at migrating the data and structures from the existing database into a second database. That second database can be set to the right size. The only trick would be to batch the operations such that you don't bloat the log.

    I'm with Jeff, this is a good idea. Setting up an automated mechanism for getting a working database to development teams is fundamental. Just make sure the data is clean. If you're working with any kind of sensitive data, there are actually prison terms associated with leaking that data with knowledge (and you now have that knowledge, so...).

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Jeff Moden (9/15/2016)


    TheSQLGuru (9/15/2016)


    I don't think you need to shrink it to share it with a bunch of developers efficiently. Go get a copy of Redgate's SQL Clone.

    http://www.red-gate.com/products/dba/sql-clone/

    https://www.simple-talk.com/sql/sql-tools/clone-sweet-clone-database-provisioning-made-easy/

    Tell Richard I sent you! 😎

    Did you find any caveats, unexpected behavior, or limitations when you used it?

    The product still is not RTM yet, but I have had interactions with the PM and it is pretty straight-forward. It was literally made for the OP's use case. I have sadly not yet had the time to get it implemented in a client for testing purposes. :crying:

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • I will disagree with others about reducing the data being a good thing. Obfuscation sure, but ONLY where REQUIRED.

    The OP stated this database was to be promulgated to developers so they could do their work. They should ABSOLUTELY do development/QA/Testing/PERFORMANCE TESTING using full-size client data. If more companies did that there would (probably) be something less than an infinite amount of work out there for me. 😎 :w00t: :hehe:

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Thank you for all your help.

    I said it was a bad idea because developers should generate their own test data, rater than use a copy of Prod that needs to be anonymized.

    Testing should be done with anonymized data.

    But there are far more copies of DEV than TEST... so it's a matter of space and time savings for the developers.

    I'll take a look at SQL Clone and see if it's something we can do. I'm the new guy so I still need to find out what the new company can do.

    Did someone say that prior using SQL Clone we still need to reduce the size?

    Any tips on how to reduce the size more efficiently?

    Would rebuilding the Index (compacting them) help?

    What do you think of the script I wrote? Make sense to reduce size in chunks?

    Thank you

  • MiguelSQL (9/16/2016)


    Thank you for all your help.

    I said it was a bad idea because developers should generate their own test data, rater than use a copy of Prod that needs to be anonymized.

    Testing should be done with anonymized data.

    I'm going to, with respect, disagree. Generated data won't have the same distributions, statistics, quirks, that production data has. A clean copy of the production database is one of the best tools you can provide for your development team.

    But there are far more copies of DEV than TEST... so it's a matter of space and time savings for the developers.

    I'll take a look at SQL Clone and see if it's something we can do. I'm the new guy so I still need to find out what the new company can do.

    Did someone say that prior using SQL Clone we still need to reduce the size?

    I did. I work for Redgate and know a little about how SQL Clone works. Actually you don't have to reduce the database size. That was bad typing on my part. You need to do the cleanup prior to setting up the clone. If part of the clean-up is deleting data, then you might want to go ahead and shrink it before creating the clone, but that's not necessary. A big part of how SQL Clone works and why we're doing it is to avoid disk space issues.

    Any tips on how to reduce the size more efficiently?

    Would rebuilding the Index (compacting them) help?

    What do you think of the script I wrote? Make sense to reduce size in chunks?

    Thank you

    I've never had to do such radical shrinks before, so no, I don't have any good advice there. Sorry.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant already addressed the data distribution/development stuff.

    I would think there is the potential for iterative shrinks to actually take a lot longer. Shrink works by moving pages and I don't know internally how "smart" or capable it is about that. It could wind up doing work multiple times for some pages. I thus also don't know if rebuilding indexes first would help. If they wound up "magically" getting put on disk from start-of-file-on, then it would be a great thing because you could simply do that and truncate free space in file and poof - as small as possible. I am 99.998% certain it doesn't work this way.

    Speaking of moving pages, that means potentially ~100% fragmentation of every object in the database when you are complete with shrink. You should explicitly rebuild every index after the shrink, before you back it up and turn it over to devs.

    SQL Clone should reduce the total space needed for X devs to have a "copy" of this one database on their system. The reduction would be approximately DBSize * (DevCount-1).

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • well... I'm not a developer, so if you all say that this makes sense, who am I to argue w/ you

    I'll request getting SQLClone and see how we do.

    Thanks

  • TheSQLGuru (9/16/2016)


    I will disagree with others about reducing the data being a good thing. Obfuscation sure, but ONLY where REQUIRED.

    The OP stated this database was to be promulgated to developers so they could do their work. They should ABSOLUTELY do development/QA/Testing/PERFORMANCE TESTING using full-size client data. If more companies did that there would (probably) be something less than an infinite amount of work out there for me. 😎 :w00t: :hehe:

    Plus 1000. Couldn't agree more.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply