Database copies (cross environment) and sizing strategies for 2019

  • I'm sure this has been asked many, many times, and I'm hoping a few proven solutions exist that I can be pointed towards...

    A company I work for needs production copies restored down to DEV/TEST environments constantly; so much so that the entire process is automated via tickets. This all works quite well; however, the PROD databases are much bigger than what's needed for testing in DEV/TEST, so herein lies the problem: the sheer amount of data in the lower environments.

    Other than purging data via scripting, is there a better way to remove the data once the DB has been restored? I thought partitioning might be an option, but I'm not sure of the effort level involved or whether it'd work. I have asked if we could just "copy" only the data that was needed, but was told this was attempted and failed miserably.
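    For reference, the purge scripting boils down to a batched-delete pattern along these lines; the table, column, and retention window here are made up purely for illustration:

        -- Hypothetical purge sketch: dbo.OrderHistory, OrderDate, and the
        -- one-year cutoff are placeholders, not our actual schema.
        -- Deleting in batches keeps the transaction log from ballooning.
        DECLARE @rows int = 1;
        WHILE @rows > 0
        BEGIN
            DELETE TOP (50000)
            FROM dbo.OrderHistory
            WHERE OrderDate < DATEADD(YEAR, -1, GETDATE());

            SET @rows = @@ROWCOUNT;
            CHECKPOINT;  -- in SIMPLE recovery, lets log space be reused
        END;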

    Has anyone come up with a viable solution for "trimmed down copies"?

    ______________________________________________________________________________
    Never argue with an idiot; they'll drag you down to their level and beat you with experience.

  • It's always a pain. Have you considered investing in something like SQL Clone from Redgate?

    Redgate also has tools to help you modify PII so that it's no longer PII... it'll help your company keep from reading about itself in the morning news. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • No matter what, you'll have to do data cleansing, whether with scripts or tools; that much is necessary. Generally, after data cleansing, I rebuild the indexes and then shrink the database, to help ensure that my non-production copy is as small as possible. Then I back that up and use it elsewhere.
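    As a rough sketch of that sequence (the database, table, and backup path below are placeholders, not anything specific):

        -- Post-cleansing steps; DevCopy, dbo.SomeTable, and the backup
        -- path are hypothetical names.
        USE DevCopy;

        -- Rebuild indexes after the cleansing deletes (one table shown;
        -- in practice you'd loop over every table).
        ALTER INDEX ALL ON dbo.SomeTable REBUILD;

        -- Shrink the files now that the deleted space can be reclaimed.
        DBCC SHRINKDATABASE (DevCopy);

        -- Back up the trimmed copy for reuse in other environments.
        BACKUP DATABASE DevCopy
            TO DISK = N'D:\Backups\DevCopy_trimmed.bak'
            WITH COMPRESSION, INIT;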

    And yeah, my company (full disclosure), Redgate Software, makes a GREAT tool to help with this: SQL Clone. It basically takes advantage of disk virtualization technologies to create an image, about the size of a backup. From the image you create clones, which are generally (are you sitting down?) a few megabytes in size, regardless of the actual size of the image (yes, terabytes to megabytes). The clone's initial size depends on things like open transactions at the moment the image was made; generally speaking, I'd make the image from the backup of my cleaned database, which means no running transactions, so it's even smaller.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thanks for the response - it's great to see the old names replying all the time 🙂

    I have considered this, and we've actually used it and an alternative (SQLSafe Virtual Backup), but with mixed results. In these applications, the underlying logic mounts a backup from the physical backup location and keeps track of any "changes" locally. While this technically does work, there is a significant performance hit because these copies are used to run massive volumes of updates/deletes... and the more this occurs on the "copy", the worse the performance appears to get.

    I've not used Clone in a few years, so perhaps I should give it another shot, hoping they've made some updates to the internal algorithms that make their magic work?

    ______________________________________________________________________________
    Never argue with an idiot; they'll drag you down to their level and beat you with experience.

  • Probably not. Data modifications have to be persisted somewhere, and that somewhere is local. As the amount of data you modify grows, more and more data is stored locally. You should only see a minor hit to performance overall, though, while storage goes through the roof; the performance hit you describe is a bit surprising.

    We have another technology coming out shortly, based on Kubernetes, that works in a very different way. However, again, if you modify massive amounts of data, that has to be stored somewhere.
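    If you want to see the same copy-on-write idea in native SQL Server terms, database snapshots behave similarly: the snapshot file starts out nearly empty (a sparse file) and grows only as pages in the source database are modified. A minimal illustration, with placeholder names and paths:

        -- ProdCopy and the .ss path are hypothetical. The snapshot costs
        -- almost nothing at creation; each modified page in ProdCopy is
        -- copied into the sparse file before being overwritten.
        CREATE DATABASE ProdCopy_Snap
        ON ( NAME = ProdCopy,  -- logical file name of the source database
             FILENAME = N'D:\Snaps\ProdCopy_Snap.ss' )
        AS SNAPSHOT OF ProdCopy;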

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning
