Sandbox ideas - minimize disk use

  • I have been given the task of setting up an independent development platform. I have been sifting through a few options and am currently testing a possible solution built on Red Gate SQL Virtual Restore. Unfortunately, I am not fully satisfied with my research and testing results so far, so I am looking for input on what anyone else has done for this scenario.

    What I would like as an end result:

    Each developer connects to a single server instance as the one source of data. Every transaction, command, and object alteration a developer makes is independent and in no way affects other developers. Each developer can roll back to the monthly full restore as quickly as possible, without needing to completely restore all databases. Management can be done easily from one centralized location.

    The theory of how this might be accomplished, separate from the end goal:

    The database sits in a "snapshot" mode that logs each host's or IP's transactions to an individual log. This snapshot mode would trace transactions so that a single database could serve as the base, and each connecting user would carry some identifier or flag that determines HOW each group of transactions is separated. One instance could separate based on user login (for application testing) and another based on host or IP (or another use of login credentials). The snapshot would remain unaffected, each connection would see only its own work, and at any time a developer could "flush" their snapshot cache back to the original.
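
    SQL Server does not ship a per-connection "snapshot mode" like this, but the identifiers the idea relies on are readily available. As a minimal sketch (standard DMVs only, nothing product-specific), this is how each connection could be classified by login, host, or IP if such a separation scheme were ever built:

        -- Sketch only: classify each user connection by login, host name, or IP.
        -- sys.dm_exec_sessions and sys.dm_exec_connections are standard DMVs.
        SELECT  s.session_id,
                s.login_name,           -- separate by login (application testing)
                s.host_name,            -- separate by workstation/host
                c.client_net_address    -- separate by IP
        FROM    sys.dm_exec_sessions AS s
        JOIN    sys.dm_exec_connections AS c
                ON c.session_id = s.session_id
        WHERE   s.is_user_process = 1;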

    Deployment spread:

    Look at it in terms of 5 developers: fastest restore time for the initial database, fastest rollback of mass changes, and multiple databases that connect to one another (the web database accesses a common database and also an orders database to get orders for login information, so a single database connection would not be sufficient). Putting a terabyte of data on each computer would not be the best solution, and maintaining independent restores on every computer would be cumbersome.

    What I have to work with currently:

    I am looking at a centralized server with one source of data, to ease control of restores. I have a sandbox server that restores monthly. After it restores, it runs a bulk update to a detached sandbox data warehouse, independent of the other restores. After that, I would be able to snapshot all the databases to have a quick rollback point.

    What I have come up with as the closest solution to what I am looking to obtain (though I am just not feeling good about it):

    On the sandbox server there would be an instance for each developer, plus one for the data warehouse testing (my separate need) and one for the architect or lead developer. Each developer would then have their own independent copy of each of the 15 or so databases that wrap into one another for the various components of the applications that consume them. All transactions and changes would be independent of one another. I could do the monthly restore to all instances, then take a snapshot on each instance so a job could easily roll a developer's databases back to the restore point as quickly as possible.

    To save space on restores, I would have a central backup pushed to the machine as is already done and, instead of full restores, use the Red Gate Virtual Restore tool to create the databases, snapshot the virtual databases (if possible; still testing), and complete the operations in much less time than a full restore while also saving space. Each developer could then alter the connection strings on their own workstation to point to their personal instance. Everything would have a single maintenance plan, be as quick and flexible as possible, and keep developers' changes as far away from one another as possible. Everyone gets their own clean sandbox, easy to roll back and deploy to from a good starting point, on a fast* and dedicated* SQL test box.
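
    For the "job to roll back to the restore point" piece, a minimal T-SQL sketch of the standard database snapshot approach would be something like this (database and file names are hypothetical):

        -- Right after the monthly restore, create a rollback point.
        -- NAME must match the logical data file name of the source database.
        CREATE DATABASE OrdersDB_CleanPoint
        ON ( NAME = OrdersDB_Data,
             FILENAME = 'D:\Snapshots\OrdersDB_CleanPoint.ss' )
        AS SNAPSHOT OF OrdersDB;

        -- The rollback job is then a single revert from that snapshot.
        -- Any other snapshots of the database must be dropped and users
        -- disconnected before the revert will run.
        RESTORE DATABASE OrdersDB
        FROM DATABASE_SNAPSHOT = 'OrdersDB_CleanPoint';

    Whether the same revert works against a database mounted through SQL Virtual Restore is exactly the "if possible; still testing" part.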

    Making sense? I hope so, because I hate typing long-winded explanations. I am hoping I have found the best approach, but hoping even more that there is an easier way to the end result that I missed along the way.

  • Here's how I would approach it if I had a large database and wanted to meet your requirements.

    Instances:

    TestServ1 -- infrastructure, not used by developers.

    DevServ1, DevServ2, etc. -- 1 instance per developer. Could be local on their machines or multiple instances on 1 beefy SQL Server.

    Process for the DBA (automated of course):

    1. Restore production DB backup to TestServ1

    2. Do any cleanup required to sanitize the db and scrub personal info (credit cards, SSNs, etc.)

    3. If possible, prune the database size by deleting data (e.g. all transactions over 1 month old, all transactions outside of the NW region, etc.); developers rarely need all of the data.

    4. Take a backup of the now small/clean db.

    5. Repeat steps 1-4 monthly (a rough T-SQL sketch of steps 1-4 follows).
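
    Purely as an illustration of the monthly job described in steps 1-4 (every database, table, column, and path name below is hypothetical):

        -- 1. Restore the production backup onto TestServ1.
        RESTORE DATABASE OrdersDB
        FROM DISK = 'D:\Backups\OrdersDB_Prod.bak'
        WITH MOVE 'OrdersDB_Data' TO 'D:\Data\OrdersDB.mdf',
             MOVE 'OrdersDB_Log'  TO 'D:\Logs\OrdersDB.ldf',
             REPLACE;

        -- 2. Scrub personal information (hypothetical table and columns).
        UPDATE OrdersDB.dbo.Customers
        SET    CreditCardNumber = NULL,
               SSN              = NULL;

        -- 3. Prune data developers rarely need.
        DELETE FROM OrdersDB.dbo.OrderTransactions
        WHERE  TransactionDate < DATEADD(MONTH, -1, GETDATE());

        -- 4. Back up the smaller, sanitized copy for the developer instances
        --    (COMPRESSION only if the edition supports it).
        BACKUP DATABASE OrdersDB
        TO DISK = 'D:\Backups\OrdersDB_Dev.bak'
        WITH INIT, COMPRESSION;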

    Process for Developer 1:

    1. Restore the backup from step 4 above to DevServ1

    2. Immediately create a database snapshot of the db.

    3. program/test/program/test/insert/delete/update/test/etc.

    4. As needed, create a new database snapshot.

    5. As needed, restore to a previous database snapshot.

    6. As needed, go to step 1 and start with a fresh copy.

    You could expose a set of sp's in TestServ1 to the developer that would handle multiple databases, restores, snapshots, logging, etc.
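
    As a rough sketch of one such wrapper (procedure name, snapshot path, and naming convention are all made up here, not anything from this thread), a snapshot-creation sp could look like the following; the revert and restore procedures would follow the same pattern:

        -- Hypothetical wrapper: creates a snapshot of a developer's copy of a database.
        CREATE PROCEDURE dbo.usp_CreateDevSnapshot
            @DatabaseName sysname,
            @SnapshotPath nvarchar(260) = N'D:\Snapshots\'
        AS
        BEGIN
            SET NOCOUNT ON;

            DECLARE @SnapshotName sysname = @DatabaseName + N'_snap';
            DECLARE @sql nvarchar(max) = N'';

            -- Build the snapshot file list from the source database's data files.
            SELECT @sql = @sql +
                   N'( NAME = ' + QUOTENAME(name) +
                   N', FILENAME = ''' + @SnapshotPath + name + N'.ss'' ),'
            FROM sys.master_files
            WHERE database_id = DB_ID(@DatabaseName)
              AND type_desc = 'ROWS';

            SET @sql = N'CREATE DATABASE ' + QUOTENAME(@SnapshotName) +
                       N' ON ' + LEFT(@sql, LEN(@sql) - 1) +
                       N' AS SNAPSHOT OF ' + QUOTENAME(@DatabaseName) + N';';

            EXEC sys.sp_executesql @sql;
        END;

        -- Example call (hypothetical database name):
        -- EXEC dbo.usp_CreateDevSnapshot @DatabaseName = N'OrdersDB';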

  • The one wrinkle that I might add would be to set up these servers as virtual machines and then copy them out to the developers' workstations, but I have no idea how Virtual Restore would work in that situation.

    Other than that, I think you're on the track that I've been advocating for SQL Virtual Restore since I joined Red Gate back in the spring. What issues are you hitting with it specifically?

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • What kind of development are they doing?

    If it's anything like my shop, all our developers really need to use a common database for the common application we develop.

    For example, if one developer adds a table, then updates the application's dataset and load and checks it into TFS/SourceSafe/SVN/whatever, then unless the other developers get hold of the database scripts and run them on their own instances, their applications will end up failing at runtime or during data load because of missing tables.

    With each developer slapping in whatever database changes they see fit for their slice of the application, they will continually break each other's work and have to stop and sync 5 database schemas back together.

    Lowell


    --Help us help you! If you post a question, make sure you include a CREATE TABLE... statement and an INSERT INTO... statement for that table to give the volunteers here representative data. With your description of the problem, we can provide a tested, verifiable solution to your question! Asking the question the right way gets you a tested answer the fastest way possible!

  • SpringTownDBA (11/30/2011)


    Here's how I would approach it if I had a large database and wanted to meet your requirements.

    ....

    I see this as a good approach and one that we had reviewed. Since we have so many databases interwoven, the data would need a lot of scrubbing across the databases to clean everything properly. I want to avoid developer confusion if they hit bad data, and I see benefit in having a production data set to work with so developers can test items seen in the past. This is still not ruled out as an option, but it is #2 in line because of licensing, having to maintain each development machine outside centralized management, and the amount of time spent scrubbing all the interwoven data. I do like the idea for sure.

    I am not hitting any issues with Virtual Restore specifically. I am just setting up and playing with the trial "now" (delayed a week due to the unforeseen). I have weighed several approaches used or seen over the years. Really, I am just looking for some friendly opinions on the matter, such as the above.

    Development structure: web, VB, .NET migrated tools, EDI, and several other 3rd party applications, as well as SQL Agent jobs scrubbing for records to process. Each developer might have work that co-mingles with another's data, BUT the majority of projects are small development "tweaks". With so many little tweaks, testing them with data manipulation can be helpful at times. Being able to reset after a large data manipulation batch would be beneficial, but one developer resetting the database might conflict with another developer in the middle of their work. There is also a deployment strategy being put in as part of this. We could test deployments in batch form, easily reset everything with the snapshot, and do the whole thing again so we know it will deploy easily.

    Side note (we have a workaround):

    We use table functions with a SELECT * in them (for report structuring). If one developer makes a table column change, it can break everyone else using the instance they are on with a multitude of bugs. For anyone still reading who is not familiar with the SELECT * issues: stored procedure plans automatically invalidate and recompile when a table's schema or statistics change, such as when a column is added. Views and functions do not refresh their metadata automatically; they just cause problems that are sometimes hard to spot.
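
    For anyone who wants to see it, here is a small hypothetical repro of that stale-metadata behaviour, along with the manual refresh step, which is where the sp_refreshsqlmodule bug mentioned in the next paragraph comes in:

        -- Hypothetical objects, for illustration only.
        CREATE TABLE dbo.Orders (OrderID int, Amount money);
        GO
        CREATE FUNCTION dbo.fn_Orders()
        RETURNS TABLE
        AS RETURN (SELECT * FROM dbo.Orders);  -- column list is captured at create time
        GO
        ALTER TABLE dbo.Orders ADD Region varchar(10);
        GO
        SELECT * FROM dbo.fn_Orders();         -- still returns only OrderID and Amount

        -- The function's metadata has to be refreshed by hand:
        EXEC sys.sp_refreshsqlmodule N'dbo.fn_Orders';
        GO
        SELECT * FROM dbo.fn_Orders();         -- now includes Region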

    I came in thinking I could provide an easy solution by querying dependencies to refresh the needed objects, but when I did, I hit a bug that had the potential to cause some damage if the procedure was used (http://connect.microsoft.com/SQLServer/feedback/details/644572/sp-refreshsqlmodule-changes-object-definition).

    One of many reasons to have independent working sets that can easily be reset without affecting all developers.

    The closest solution that gets the easy reset, independent working data sets, central management, low cost (we need a CPU-based license for the central stage server anyway, and multiple instances are allowed, in certain ways), smallest footprint, etc., is the long-winded method above. The second is the one posted by SpringTownDBA.

    It just seems sloppy, and does not have as small a footprint as I would optimally like. *sounding like a kid yelling "more candy, more candy" without paying for it*

    Thanks for the input, all! Sorry to make you read through such a load. I have had my eye on virtual restore for a few months and finally get to play with it soon.

  • Having done the method of trying to maintain "clean" database copies that are then moved around for restore purposes, and having had four (4) times the amount of disk storage for development that we had for production because of the sprawl of systems everywhere, being able to link multiple databases to a single backup doesn't seem inelegant. It seems very elegant, but you'll still have to do a bunch of work. TANSTAAFL always applies.

    Now, how are you managing source control integration?

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • The "sloppy" part is the centralization location having spun up so many instances and the need for instance regulation. Having one backup file to create several points that can connect to it that effect only the single connection, I agree, is very nice.

    If I have to go past whatever number of instances I can currently host EFFECTIVELY on one box, the solution is not as easily scalable. I could set up a shared storage location for the backup, with a separate instance to manage jobs to child "sandpits" that would each consume from the shared location to create however many instances were needed. Another sloppy part is that if a new developer comes in, I need to create and set up a new instance, and if one were to leave, I would have to face removing that instance via the uninstallation procedure (yes, it only removes that instance, but I still cringe when I do it). If there were just one instance, controlled automatically, where each person used the same database but had their own version of a transaction log that could roll back to the virtual database it was restored from, it would be a smaller footprint and less to manage. The problem then is how to control the separation of that transaction logging, whether by IP, host, user-defined flag, or something else.

    I am possibly overthinking the need to scale the solution to a large extreme. I like the virtual restore, but I want more, of course. I am curious whether there is something out there I missed that would give me easier management and a smaller footprint than what the SQL engine itself provides. The closest I could come to the easiest-to-manage platform would be using virtual backup restores on multiple instances, all tied to one location that would be easy to manage centrally, possibly even using a CMS (Central Management Server) to control the initial restores.

    Version control is stage 2 of the environment. I have been looking at the Red Gate tools, but that is not in scope yet. I still need to see if I can find a way to work around needing multiple instances, even if they reside on one host. If not, I need to test that Virtual Restore will minimize the footprint enough with all the instances, and then price out licensing for the new environment. Then we will look at integration with a Subversion mechanism. I have my eye on the Red Gate tool, but am not there yet.

    How I hope that at some point I can tune down the DATALENGTH(response), but my head is currently swimming in ideas while trying to get things as simple as possible.

  • matt.newman (11/30/2011)


    The "sloppy" part is the centralization location having spun up so many instances and the need for instance regulation. Having one backup file to create several points that can connect to it that effect only the single connection, I agree, is very nice.

    If I have to go past whatever number of instances I can currently host EFFECTIVELY on one box, the solution is not as easily scalable. I could set up a shared storage location for the backup, with a separate instance to manage jobs to child "sandpits" that would each consume from the shared location to create however many instances were needed. Another sloppy part is that if a new developer comes in, I need to create and set up a new instance, and if one were to leave, I would have to face removing that instance via the uninstallation procedure (yes, it only removes that instance, but I still cringe when I do it). If there were just one instance, controlled automatically, where each person used the same database but had their own version of a transaction log that could roll back to the virtual database it was restored from, it would be a smaller footprint and less to manage. The problem then is how to control the separation of that transaction logging, whether by IP, host, user-defined flag, or something else.

    We were doing it with 3 instances (2000, 2005, 2008) inside a single virtual machine, and it was the virtual machines that we spun up over & over (hundreds of them). It worked pretty well (except for the horrific disk space requirements, oh, and deployments of deltas for every environment were a nightmare). But, again, I haven't tested that approach with Virtual Restore. It might be problematic.

    I am possibly overthinking the need to scale the solution to a large extreme. I like the virtual restore, but I want more, of course. I am curious whether there is something out there I missed that would give me easier management and a smaller footprint than what the SQL engine itself provides. The closest I could come to the easiest-to-manage platform would be using virtual backup restores on multiple instances, all tied to one location that would be easy to manage centrally, possibly even using a CMS (Central Management Server) to control the initial restores.

    You could try using SAN replication, duplicating the drives over & over... not sure it'll be better, worse, whatever, but certainly more expensive.

    Version control is stage 2 of the environment. I have been looking at the Red Gate tools, but that is not in scope yet. I still need to see if I can find a way to work around needing multiple instances, even if they reside on one host. If not, I need to test that Virtual Restore will minimize the footprint enough with all the instances, and then price out licensing for the new environment. Then we will look at integration with a Subversion mechanism. I have my eye on the Red Gate tool, but am not there yet.

    How I hope that at some point I can tune down the DATALENGTH(response), but my head is currently swimming in ideas while trying to get things as simple as possible.

    Sounds like you're solving tough problems. They're not going to fit into 140 characters, so you can't just post the questions on Twitter.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • We use TFS 2010 Lab Manager to help out with this. We had to build a separate Hyper-V server environment to support it, but it was worth it.

    We were able to use this to automate the creation and restore of all server/test system images needed for development and testing.

    Now, some of the systems are nothing you could consider fast, but they are faster than trying to run and maintain 6 separate instances of SQL Server when only two developers need separate instances. 😎

  • We had been looking at Team Foundation for a few other reasons and went down the Subversion path.

    I will definitely see what I missed in my review of 2008 versus 2010, and see whether I would push for this before we get too far into Subversion. Thanks for the heads up!
